
Conversation

@tangwz (Owner) commented on Aug 15, 2025

Summary

  • roll back README.md to its prior content

Testing


https://chatgpt.com/codex/tasks/task_e_689ed80fd5808330b5cd142dc02a4729

@tangwz requested a review from Copilot on August 15, 2025 at 08:04

Copilot AI left a comment


Pull Request Overview

This pull request adds a new Python script that fetches the latest posts from the MySQL Taobao monthly archives and updates README.md with new database-related articles. The script parses the HTML, groups articles by database category, and appends new entries to the README as markdown table rows.

  • Adds a complete web scraping script for MySQL monthly archives
  • Implements HTML parsing to extract articles by database categories
  • Provides README.md update functionality with new article entries
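For orientation, here is a minimal sketch of the fetch step the overview describes (`fetch_page` is a hypothetical name; `BASE_URL` and the proxy-free opener are taken from the review hunks below):

```python
import urllib.request

BASE_URL = "http://mysql.taobao.org/monthly"  # archive root, assumed from the hunks below

def fetch_page(url: str) -> str:
    """Fetch one monthly archive page, bypassing proxy settings from the environment."""
    req = urllib.request.Request(url)
    handler = urllib.request.ProxyHandler({})  # disable env proxies
    opener = urllib.request.build_opener(handler)
    with opener.open(req, timeout=10) as resp:  # network I/O
        return resp.read().decode("utf-8")
```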


```python
handler = urllib.request.ProxyHandler({})  # disable env proxies
opener = urllib.request.build_opener(handler)
try:
    with opener.open(req, timeout=10) as resp:  # pragma: no cover - network I/O
```

Copilot AI Aug 15, 2025


The hardcoded timeout value of 10 seconds should be configurable. Consider adding a TIMEOUT environment variable or making it a module-level constant to allow customization for different network conditions.

Suggested change

```diff
-    with opener.open(req, timeout=10) as resp:  # pragma: no cover - network I/O
+    with opener.open(req, timeout=TIMEOUT) as resp:  # pragma: no cover - network I/O
```

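One way to realize the suggestion, as a sketch (the `FETCH_TIMEOUT` environment variable name is an assumption, not part of the script):

```python
import os

# Module-level constant; overridable via the environment for slow networks
TIMEOUT = float(os.environ.get("FETCH_TIMEOUT", "10"))
```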
```python
        # skip categories not present
        continue
    lines = [f"| {typ} | [{title}]({link}) |" for typ, title, link in items]
    content = content.replace(table_header, table_header + "\n" + "\n".join(lines))
```

Copilot AI Aug 15, 2025


Using string replace for content insertion is fragile and could cause incorrect replacements if the table header appears multiple times in the README. Consider using regex with word boundaries or a more precise matching approach to ensure only the intended table is updated.

Suggested change

```diff
-    content = content.replace(table_header, table_header + "\n" + "\n".join(lines))
+    # Match the table header plus any table rows that immediately follow it
+    pattern = re.compile(re.escape(table_header) + r"((?:\n\|.*\|)*)")
+    # Insert the new rows directly after the header, keeping the existing rows;
+    # a callable replacement avoids re escape-sequence handling in the row text
+    content, count = pattern.subn(
+        lambda m: table_header + "\n" + "\n".join(lines) + m.group(1),
+        content,
+        count=1,
+    )
+    if count == 0:
+        # fallback: skip if not matched (should not happen due to earlier check)
+        continue
```

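A quick sanity check of the regex path above (a standalone sketch; the `table_header` value and row contents are hypothetical):

```python
import re

table_header = "| Category | Article |"
content = table_header + "\n| MySQL | [old](http://example.com/old) |"
lines = ["| MySQL | [new](http://example.com/new) |"]

pattern = re.compile(re.escape(table_header) + r"((?:\n\|.*\|)*)")
content, count = pattern.subn(
    lambda m: table_header + "\n" + "\n".join(lines) + m.group(1),
    content,
    count=1,
)
assert count == 1
# The new row lands right after the header; the old row is preserved below it.
print(content)
```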
```python
    updates = defaultdict(list)
    for y, m in months:
        url = f"{BASE_URL}/{y:04d}/{m:02d}/"
        if url in existing:
```

Copilot AI Aug 15, 2025


Using string containment check on the entire README content for each URL is inefficient. Consider extracting existing URLs once at the beginning using regex and storing them in a set for O(1) lookup instead of O(n) string searches.

Suggested change

```diff
+    # Extract all URLs from the README content once and store them in a set
+    # for O(1) membership tests
+    existing_urls = set(re.findall(r"https?://[^\s\)]+", existing))
     updates = defaultdict(list)
     for y, m in months:
         url = f"{BASE_URL}/{y:04d}/{m:02d}/"
-        if url in existing:
+        if url in existing_urls:
```

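To illustrate the lookup-cost difference (a standalone sketch; the URL pattern mirrors the suggestion above):

```python
import re

existing = "see [July](http://mysql.taobao.org/monthly/2025/07/) for details"
# One O(n) scan up front...
existing_urls = set(re.findall(r"https?://[^\s\)]+", existing))

# ...then each monthly URL is an O(1) average-case set membership test,
# instead of an O(n) substring search over the whole README per URL.
url = "http://mysql.taobao.org/monthly/2025/07/"
print(url in existing_urls)  # True
```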