
Conversation

@tangwz (Owner) commented on Aug 15, 2025

Summary

  • roll back README.md to its prior content

Testing


https://chatgpt.com/codex/tasks/task_e_689ed80fd5808330b5cd142dc02a4729

@tangwz requested a review from Copilot on August 15, 2025 at 08:04

Copilot AI left a comment


Pull Request Overview

This pull request adds a new Python script that fetches the latest posts from the MySQL Taobao monthly archives and updates README.md with new database-related articles. The script parses the HTML, groups articles by database category, and appends new entries to the README as markdown table rows.

  • Adds a complete web scraping script for MySQL monthly archives
  • Implements HTML parsing to extract articles by database categories
  • Provides README.md update functionality with new article entries
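For orientation, here is a minimal sketch of the fetch step the overview describes (`fetch_page` is a hypothetical name; `BASE_URL` and the proxy-free opener are taken from the review hunks below):

```python
import urllib.request

BASE_URL = "http://mysql.taobao.org/monthly"  # archive root, assumed from the hunks below

def fetch_page(url: str) -> str:
    """Fetch one monthly archive page, bypassing proxy settings from the environment."""
    req = urllib.request.Request(url)
    handler = urllib.request.ProxyHandler({})  # disable env proxies
    opener = urllib.request.build_opener(handler)
    with opener.open(req, timeout=10) as resp:  # network I/O
        return resp.read().decode("utf-8")
```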


```python
handler = urllib.request.ProxyHandler({})  # disable env proxies
opener = urllib.request.build_opener(handler)
try:
    with opener.open(req, timeout=10) as resp:  # pragma: no cover - network I/O
```

Copilot AI Aug 15, 2025


The hardcoded timeout value of 10 seconds should be configurable. Consider adding a TIMEOUT environment variable or making it a module-level constant to allow customization for different network conditions.

Suggested change

```diff
-    with opener.open(req, timeout=10) as resp:  # pragma: no cover - network I/O
+    with opener.open(req, timeout=TIMEOUT) as resp:  # pragma: no cover - network I/O
```

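One way to realize the suggestion, as a sketch (the `FETCH_TIMEOUT` environment variable name is an assumption, not part of the script):

```python
import os

# Module-level constant; overridable via the environment for slow networks
TIMEOUT = float(os.environ.get("FETCH_TIMEOUT", "10"))
```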
```python
        # skip categories not present
        continue
    lines = [f"| {typ} | [{title}]({link}) |" for typ, title, link in items]
    content = content.replace(table_header, table_header + "\n" + "\n".join(lines))
```

Copilot AI Aug 15, 2025


Using string replace for content insertion is fragile and could cause incorrect replacements if the table header appears multiple times in the README. Consider using regex with word boundaries or a more precise matching approach to ensure only the intended table is updated.

Suggested change

```diff
-    content = content.replace(table_header, table_header + "\n" + "\n".join(lines))
+    # Match the table header plus any table rows that immediately follow it
+    pattern = re.compile(re.escape(table_header) + r"((?:\n\|.*\|)*)")
+    # Insert the new rows directly after the header, keeping the existing rows;
+    # a callable replacement avoids re escape-sequence handling in the row text
+    content, count = pattern.subn(
+        lambda m: table_header + "\n" + "\n".join(lines) + m.group(1),
+        content,
+        count=1,
+    )
+    if count == 0:
+        # fallback: skip if not matched (should not happen due to earlier check)
+        continue
```

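A quick sanity check of the regex path above (a standalone sketch; the `table_header` value and row contents are hypothetical):

```python
import re

table_header = "| Category | Article |"
content = table_header + "\n| MySQL | [old](http://example.com/old) |"
lines = ["| MySQL | [new](http://example.com/new) |"]

pattern = re.compile(re.escape(table_header) + r"((?:\n\|.*\|)*)")
content, count = pattern.subn(
    lambda m: table_header + "\n" + "\n".join(lines) + m.group(1),
    content,
    count=1,
)
assert count == 1
# The new row lands right after the header; the old row is preserved below it.
print(content)
```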
```python
    updates = defaultdict(list)
    for y, m in months:
        url = f"{BASE_URL}/{y:04d}/{m:02d}/"
        if url in existing:
```

Copilot AI Aug 15, 2025


Using string containment check on the entire README content for each URL is inefficient. Consider extracting existing URLs once at the beginning using regex and storing them in a set for O(1) lookup instead of O(n) string searches.

Suggested change

```diff
+    # Extract all URLs from the README content once and store them in a set
+    # for O(1) membership tests
+    existing_urls = set(re.findall(r"https?://[^\s\)]+", existing))
     updates = defaultdict(list)
     for y, m in months:
         url = f"{BASE_URL}/{y:04d}/{m:02d}/"
-        if url in existing:
+        if url in existing_urls:
```

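To illustrate the lookup-cost difference (a standalone sketch; the URL pattern mirrors the suggestion above):

```python
import re

existing = "see [July](http://mysql.taobao.org/monthly/2025/07/) for details"
# One O(n) scan up front...
existing_urls = set(re.findall(r"https?://[^\s\)]+", existing))

# ...then each monthly URL is an O(1) average-case set membership test,
# instead of an O(n) substring search over the whole README per URL.
url = "http://mysql.taobao.org/monthly/2025/07/"
print(url in existing_urls)  # True
```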