chore: revert README to previous version #4
Conversation
Pull Request Overview
This pull request adds a new Python script that fetches the latest posts from the MySQL Taobao monthly archives and updates README.md with new database-related articles. The script parses the HTML content, categorizes articles by database, and appends new entries to the README as markdown table rows; a rough sketch of this flow follows the list below.
- Adds a complete web scraping script for the MySQL monthly archives
- Implements HTML parsing to extract articles by database category
- Updates README.md with the newly found article entries
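A minimal, hedged sketch of that flow, under stated assumptions: the `fetch` and `format_rows` helper names do not appear in the diff and are invented here for illustration, and `BASE_URL` is inferred from the fetch error quoted under Testing.

```python
import urllib.request

# Inferred from the RuntimeError in the Testing notes; not confirmed in the diff.
BASE_URL = "https://mysql.taobao.org/monthly"

def fetch(url, timeout=10):
    """Fetch one archive page, bypassing environment proxies as the script does."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")

def format_rows(items):
    """Render (category, title, link) tuples as the README's two-column table rows."""
    return [f"| {typ} | [{title}]({link}) |" for typ, title, link in items]
```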
```python
handler = urllib.request.ProxyHandler({})  # disable env proxies
opener = urllib.request.build_opener(handler)
try:
    with opener.open(req, timeout=10) as resp:  # pragma: no cover - network I/O
```
Copilot AI · Aug 15, 2025
The hardcoded timeout value of 10 seconds should be configurable. Consider adding a TIMEOUT environment variable or making it a module-level constant to allow customization for different network conditions.
Suggested change:
```diff
-with opener.open(req, timeout=10) as resp:  # pragma: no cover - network I/O
+with opener.open(req, timeout=TIMEOUT) as resp:  # pragma: no cover - network I/O
```
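A minimal sketch of the constant this comment asks for; the `TIMEOUT` environment-variable name and the fallback behavior are assumptions, not part of the script under review.

```python
import os

# Hypothetical module-level constant: honor an optional TIMEOUT env var
# (seconds), falling back to the currently hardcoded 10 when unset or invalid.
try:
    TIMEOUT = float(os.environ.get("TIMEOUT", "10"))
except ValueError:
    TIMEOUT = 10.0
```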
```python
        # skip categories not present
        continue
    lines = [f"| {typ} | [{title}]({link}) |" for typ, title, link in items]
    content = content.replace(table_header, table_header + "\n" + "\n".join(lines))
```
Copilot AI · Aug 15, 2025
Using string replace for content insertion is fragile and could cause incorrect replacements if the table header appears multiple times in the README. Consider using regex with word boundaries or a more precise matching approach to ensure only the intended table is updated.
Suggested change:
```diff
-content = content.replace(table_header, table_header + "\n" + "\n".join(lines))
+# Use regex to match the table header and following table rows
+pattern = re.compile(
+    re.escape(table_header) + r"(?:\n(?:\|.*\|))*",
+    re.MULTILINE
+)
+replacement = table_header + "\n" + "\n".join(lines)
+content, count = pattern.subn(replacement, content, count=1)
+if count == 0:
+    # fallback: skip if not matched (should not happen due to earlier check)
+    continue
```
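A self-contained illustration of the `count=1` anchoring the suggestion relies on; the header text and rows here are hypothetical stand-ins, and the pattern is simplified to match only the header line so that existing rows survive the substitution.

```python
import re

# Hypothetical header and rows, used only to demonstrate count=1 anchoring.
table_header = "| Category | Article |"
content = (
    f"Archive A\n{table_header}\n| MySQL | [old](https://example.com/a) |\n"
    f"\nArchive B\n{table_header}\n"
)
lines = ["| MySQL | [new](https://example.com/b) |"]

# Substitute at most once, so a header repeated elsewhere in the README is
# left untouched; a bare str.replace would rewrite every occurrence.
pattern = re.compile(re.escape(table_header))
content, count = pattern.subn(table_header + "\n" + "\n".join(lines), content, count=1)
assert count == 1 and "example.com/b" in content
```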
```python
updates = defaultdict(list)
for y, m in months:
    url = f"{BASE_URL}/{y:04d}/{m:02d}/"
    if url in existing:
```
Copilot AI · Aug 15, 2025
Using string containment check on the entire README content for each URL is inefficient. Consider extracting existing URLs once at the beginning using regex and storing them in a set for O(1) lookup instead of O(n) string searches.
Suggested change:
```diff
-if url in existing:
+# Extract all URLs from the README content and store in a set for O(1) lookup
+existing_urls = set(re.findall(r"https?://[^\s\)]+", existing))
+updates = defaultdict(list)
+for y, m in months:
+    url = f"{BASE_URL}/{y:04d}/{m:02d}/"
+    if url in existing_urls:
```
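A small sketch exercising the suggested one-pass extraction in isolation: one O(len(README)) scan up front, then O(1) membership tests per month. The README snippet and month list are hypothetical, and `BASE_URL` is inferred from the fetch error in the Testing notes below.

```python
import re
from collections import defaultdict

BASE_URL = "https://mysql.taobao.org/monthly"  # inferred from the Testing log

existing = (
    "See [July 2025](https://mysql.taobao.org/monthly/2025/07/) "
    "for earlier issues."
)
# Collect every URL once, instead of rescanning the whole README per candidate.
existing_urls = set(re.findall(r"https?://[^\s\)]+", existing))

months = [(2025, 7), (2025, 8)]
updates = defaultdict(list)
for y, m in months:
    url = f"{BASE_URL}/{y:04d}/{m:02d}/"
    if url in existing_urls:
        continue  # already listed in the README
    updates[(y, m)] = []  # placeholder; the real script collects articles here
```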
Summary
Testing
- `python scripts/update_readme.py` (fails: `RuntimeError: failed to fetch https://mysql.taobao.org/monthly/: <urlopen error [Errno 101] Network is unreachable>`)

https://chatgpt.com/codex/tasks/task_e_689ed80fd5808330b5cd142dc02a4729