Open
Description
When running skill-seekers scrape on Windows (Chinese locale), the tool fails with a GBK codec error because several file operations don't specify encoding='utf-8'.
Error Message
Error: 'gbk' codec can't decode byte 0xac in position 206: illegal multibyte sequence
Environment
- OS: Windows 10/11 (Chinese locale)
- Python: 3.14
- skill-seekers: 2.1.1
Root Cause
On Chinese-locale Windows, Python's open() falls back to the locale's default encoding (GBK, code page 936) when no encoding argument is passed. The following file operations in doc_scraper.py don't specify UTF-8 encoding:
load_config() function (~line 1390):
with open(config_path, 'r') as f: # Missing encoding='utf-8'
check_existing_data() function (~line 1474):
with open(f"{data_dir}/summary.json", 'r') as f: # Missing encoding='utf-8'
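For anyone who wants to reproduce this without a Chinese-locale machine, here is a minimal sketch. It passes encoding="gbk" explicitly as a stand-in for the Windows default, and the helper names (load_json, demo) and the sample config content are made up for illustration; they are not from doc_scraper.py.

```python
import json
import os
import tempfile


def load_json(path, encoding=None):
    """Read a JSON file. encoding=None means "use the platform default",
    which is what the affected open() calls do."""
    with open(path, "r", encoding=encoding) as f:
        return json.load(f)


def demo():
    # Write a small UTF-8 JSON file containing a non-ASCII character.
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump({"name": "中"}, f, ensure_ascii=False)

    try:
        # Simulate the Chinese-locale default by decoding as GBK.
        try:
            load_json(path, encoding="gbk")
            gbk_failed = False
        except UnicodeDecodeError as exc:
            gbk_failed = True
            print(f"GBK read failed: {exc}")

        # The same file reads fine once UTF-8 is requested explicitly.
        utf8_result = load_json(path, encoding="utf-8")
    finally:
        os.remove(path)

    return gbk_failed, utf8_result


if __name__ == "__main__":
    print(demo())
```

The GBK read fails because the UTF-8 bytes of the character do not line up with valid GBK byte pairs, which is the same class of failure as the error message above.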
Suggested Fix
Add encoding='utf-8' to all file open operations:
# load_config()
with open(config_path, 'r', encoding='utf-8') as f:
    config = json.load(f)
# check_existing_data()
with open(f"{data_dir}/summary.json", 'r', encoding='utf-8') as f:
    summary = json.load(f)
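Since only two call sites are listed here and there may be more, a rough way to find the rest is to scan the source for open() calls that lack an encoding. The sketch below uses the standard ast module; find_open_without_encoding is a hypothetical helper, it only looks at calls to the bare name open (not Path.open, io.open, or aliases), and it skips binary-mode opens only when the mode is a string literal.

```python
import ast


def find_open_without_encoding(source: str) -> list[int]:
    """Return line numbers of text-mode open(...) calls with no encoding."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        # Only the bare builtin name, e.g. open(path, 'r').
        if not (isinstance(node.func, ast.Name) and node.func.id == "open"):
            continue

        # Work out the mode if it is given as a literal.
        mode = None
        if len(node.args) >= 2 and isinstance(node.args[1], ast.Constant):
            mode = node.args[1].value
        for kw in node.keywords:
            if kw.arg == "mode" and isinstance(kw.value, ast.Constant):
                mode = kw.value.value
        if isinstance(mode, str) and "b" in mode:
            continue  # binary mode takes no encoding

        # encoding is the 4th positional parameter of open().
        has_encoding = len(node.args) >= 4 or any(
            kw.arg == "encoding" for kw in node.keywords
        )
        if not has_encoding:
            hits.append(node.lineno)
    return sorted(hits)


if __name__ == "__main__":
    sample = (
        "with open(p, 'r') as f:\n"
        "    pass\n"
        "with open(p, 'r', encoding='utf-8') as g:\n"
        "    pass\n"
        "open(p, 'rb')\n"
    )
    print(find_open_without_encoding(sample))
```

Python 3.10+ can also report these at runtime: running with -X warn_default_encoding (or setting PYTHONWARNDEFAULTENCODING=1) emits an EncodingWarning whenever the locale default is used implicitly.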
Additional Notes
This issue affects all Windows users with non-English locales (Chinese, Japanese, Korean, etc.) where the system default encoding is not UTF-8.
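To check whether a given machine is affected, something like the following should be enough. default_text_encoding is a hypothetical helper name; it just reports the encoding Python would use for open() when none is passed.

```python
import locale


def default_text_encoding() -> str:
    """Encoding used by open() when the encoding argument is omitted."""
    # With do_setlocale=False this does not modify the process locale.
    return locale.getpreferredencoding(False)


if __name__ == "__main__":
    enc = default_text_encoding()
    print(f"Default text encoding: {enc}")
    if enc.lower().replace("-", "") != "utf8":
        print("open() without encoding= will not read UTF-8 files reliably here.")
```

As a temporary workaround until the fix lands, enabling Python's UTF-8 mode (setting the PYTHONUTF8=1 environment variable or running with -X utf8) makes UTF-8 the default for open().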