From 924d736cfba309a7147a4f105c98ae963da044bb Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Wed, 4 Feb 2026 23:18:02 +0000 Subject: [PATCH 1/4] Initial plan From 3b24e1512b3ee01672807702ec83ee6b07540a6f Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Wed, 4 Feb 2026 23:20:22 +0000 Subject: [PATCH 2/4] Add Issue Duplication Detector documentation Co-authored-by: Mossaka <5447827+Mossaka@users.noreply.github.com> --- README.md | 1 + docs/issue-duplication-detector.md | 100 +++++++++++++++++++++++ workflows/issue-duplication-detector.md | 102 ++++++++++++++++++++++++ 3 files changed, 203 insertions(+) create mode 100644 docs/issue-duplication-detector.md create mode 100644 workflows/issue-duplication-detector.md diff --git a/README.md b/README.md index 86e8ee7..07af881 100644 --- a/README.md +++ b/README.md @@ -7,6 +7,7 @@ A sample family of reusable [GitHub Agentic Workflows](https://github.github.com ### Depth Triage & Analysis Workflows - [🏷️ Issue Triage](docs/issue-triage.md) - Triage issues and pull requests +- [🔍 Issue Duplication Detector](docs/issue-duplication-detector.md) - Detect duplicate issues and suggest next steps - [🏥 CI Doctor](docs/ci-doctor.md) - Monitor CI workflows and investigate failures automatically - [🔍 Repo Ask](docs/repo-ask.md) - Intelligent research assistant for repository questions and analysis - [🔍 Daily Accessibility Review](docs/daily-accessibility-review.md) - Review application accessibility by automatically running and using the application diff --git a/docs/issue-duplication-detector.md b/docs/issue-duplication-detector.md new file mode 100644 index 0000000..0dbabc5 --- /dev/null +++ b/docs/issue-duplication-detector.md @@ -0,0 +1,100 @@ +# 🔍 Issue Duplication Detector + +> For an overview of all available workflows, see the [main README](../README.md). 
+ +The [issue duplication detector workflow](../workflows/issue-duplication-detector.md?plain=1) runs every 5 minutes to detect duplicate issues in the repository and suggest next steps. + +## Installation + +```bash +# Install the 'gh aw' extension +gh extension install github/gh-aw + +# Add the Issue Duplication Detector workflow to your repository +gh aw add githubnext/agentics/issue-duplication-detector +``` + +This walks you through adding the workflow to your repository. + +You must also [choose a coding agent](https://github.github.com/gh-aw/reference/engines/) and add an API key secret for the agent to your repository. + +You can manually trigger this workflow using `gh aw run issue-duplication-detector` or wait for it to run automatically on its 5-minute schedule. + +**Mandatory Checklist** + +* [ ] If in a fork, enable GitHub Actions and Issues in the fork settings + +## Configuration + +This workflow requires no configuration and works out of the box. The workflow uses intelligent semantic analysis to detect duplicate issues by comparing titles, descriptions, and content. + +### How It Works + +The workflow operates on a 5-minute batch schedule: + +1. **Searches for recent issues**: Queries for issues created or updated in the last 10 minutes +2. **Analyzes each issue**: Extracts key information from the issue title and body +3. **Searches for duplicates**: Uses GitHub search with keywords to find similar existing issues +4. **Compares semantically**: Analyzes whether issues describe the same underlying problem or request +5. 
**Posts helpful comments**: If duplicates are found, adds a polite comment with: + - Links to potential duplicate issues + - Explanation of why they appear to be duplicates + - Suggested next steps for the issue author + +### Batch Processing & Cost Control + +- Runs every 5 minutes to batch-process multiple issues in a single workflow run +- Only comments when high-confidence duplicates are found +- Maximum 10 comments per run to prevent excessive API usage +- 15-minute timeout ensures predictable runtime costs + +After editing run `gh aw compile` to update the workflow and commit all changes to the default branch. + +## What it reads from GitHub + +- Recently created or updated issues (last 10 minutes) +- Full issue details including title, body, and metadata +- Repository issue history for duplicate detection +- Both open and closed issues for comprehensive analysis + +## What it creates + +- Adds comments to issues that appear to be duplicates +- Comments include links to potential duplicates and suggested next steps +- Requires `issues: write` permission + +## What web searches it performs + +- Does not perform web searches; operates entirely within GitHub data + +## Human in the loop + +- Review duplicate detection comments for accuracy +- Verify that flagged issues are truly duplicates +- Close duplicate issues or provide clarification if the detection was incorrect +- Add any missing context to the original issue if the duplicate has valuable additional information +- Monitor false positives and disable the workflow if accuracy is not acceptable + +## Activity duration + +- By default this workflow will trigger for at most 30 days, after which it will stop triggering. +- This allows you to experiment with the workflow for a limited time before deciding whether to keep it active. + +## Example Output + +When a duplicate is detected, the workflow posts a comment like: + +```markdown +👋 Hi! 
It looks like this issue might be a duplicate of existing issue(s): + +- #123 - Add support for custom templates + +Both issues describe the need for customizable templates in the project configuration. + +**Suggested next steps:** +- Review issue #123 to see if it addresses your concern +- If this issue has additional context not covered in #123, consider adding it there +- If they are indeed the same, this issue can be closed as a duplicate + +Let us know if you think this assessment is incorrect! +``` diff --git a/workflows/issue-duplication-detector.md b/workflows/issue-duplication-detector.md new file mode 100644 index 0000000..39bb4de --- /dev/null +++ b/workflows/issue-duplication-detector.md @@ -0,0 +1,102 @@ +--- +description: Detect duplicate issues and suggest next steps (batched every 5 minutes) +on: + schedule: + - cron: "*/5 * * * *" # Every 5 minutes + workflow_dispatch: + +permissions: read-all + +tools: + github: + toolsets: [default] + bash: + - "*" + +safe-outputs: + add-comment: + max: 10 # Allow multiple comments in batch mode + +timeout-minutes: 15 +--- + +# Issue Duplication Detector + +You are an AI agent that detects duplicate issues in the repository `${{ github.repository }}`. + +## Your Task + +Analyze recently created or updated issues to determine if they are duplicates of existing issues. This workflow runs every 5 minutes to batch-process issues, providing cost control and natural request batching. + +## Instructions + +1. **Find recent issues to check**: + - Use GitHub tools to search for issues in this repository that were created or updated in the last 10 minutes + - Query: `repo:${{ github.repository }} is:issue updated:>=$(date -u -d '10 minutes ago' +%Y-%m-%dT%H:%M:%SZ)` + - This captures any issues that might have been created or edited since the last run + - If no recent issues are found, exit successfully without further action + +2. 
**For each recent issue found**: + - Fetch the full issue details using GitHub tools + - Note the issue number, title, and body content + +3. **Search for duplicate issues**: + - For each recent issue, use GitHub tools to search for similar existing issues + - Search using keywords from the issue's title and body + - Look for issues that describe the same problem, feature request, or topic + - Consider both open and closed issues (closed issues might have been resolved) + - Focus on semantic similarity, not just exact keyword matches + - Exclude the current issue itself from the duplicate search + +4. **Analyze and compare**: + - Review the content of potentially duplicate issues + - Determine if they are truly duplicates or just similar topics + - A duplicate means the same underlying problem, request, or discussion + - Consider that different wording might describe the same issue + +5. **For issues with duplicates found**: + - Use the `output.add-comment` safe output to post a comment on the issue + - In your comment: + - Politely inform that this appears to be a duplicate + - List the duplicate issue(s) with their numbers and titles using markdown links (e.g., "This appears to be a duplicate of #123") + - Provide a brief explanation of why they are duplicates + - Suggest next steps, such as: + - Reviewing the existing issue(s) to see if they already address the concern + - Adding any new information to the existing issue if this one has additional context + - Closing this issue as a duplicate if appropriate + - Keep the tone helpful and constructive + +6. 
**For issues with no duplicates**: + - Do not add any comment + - The issue is unique and can proceed normally + +## Important Guidelines + +- **Batch processing**: Process multiple issues in a single run when available +- **Read-only analysis**: You are only analyzing and commenting, not modifying issues +- **Be thorough**: Search comprehensively to avoid false negatives +- **Be accurate**: Only flag clear duplicates to avoid false positives +- **Be helpful**: Provide clear reasoning and actionable suggestions +- **Use safe-outputs**: Always use `output.add-comment` for commenting, never try to use GitHub write APIs directly +- **Cost control**: The 5-minute batching window provides a natural upper bound on costs + +## Example Comment Format + +When you find duplicates, structure your comment like this: + +```markdown +👋 Hi! It looks like this issue might be a duplicate of existing issue(s): + +- #123 - [Title of duplicate issue] + +Both issues describe [brief explanation of the common problem/request]. + +**Suggested next steps:** +- Review issue #123 to see if it addresses your concern +- If this issue has additional context not covered in #123, consider adding it there +- If they are indeed the same, this issue can be closed as a duplicate + +Let us know if you think this assessment is incorrect! +``` + +Remember: Only comment if you have high confidence that duplicates exist. 
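Reviewer note: step 1 of the instructions above asks the agent to build a search query with an ISO 8601 cutoff. The construction can be sketched in shell as follows. This is a minimal sketch, assuming GNU `date`; the function name `build_query`, its arguments, and the repository name `octo-org/example` are hypothetical and are not part of the workflow.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the workflow: builds the issue search
# query described in step 1 for a given look-back window.
# Usage: build_query <owner/repo> <window-in-minutes> [now-as-epoch-seconds]
build_query() {
  local repo="$1"
  local window_minutes="$2"
  # Default to the current time; an explicit epoch makes the output reproducible.
  local now="${3:-$(date -u +%s)}"
  local cutoff_epoch=$(( now - window_minutes * 60 ))
  local cutoff
  # Assumes GNU date: "-d @N" interprets N as seconds since the Unix epoch.
  cutoff="$(date -u -d "@${cutoff_epoch}" +%Y-%m-%dT%H:%M:%SZ)"
  printf 'repo:%s is:issue updated:>=%s\n' "$repo" "$cutoff"
}

# Example: issues touched in the last 10 minutes.
build_query "octo-org/example" 10
```

Passing the window as an argument keeps the cutoff in step with the cron schedule. For a 6-hour schedule the window would be 360 minutes.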
From e0d80d64888c0c89b625fa67a67a41713739e233 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Wed, 4 Feb 2026 23:22:06 +0000 Subject: [PATCH 3/4] Clarify timestamp format in workflow instructions Co-authored-by: Mossaka <5447827+Mossaka@users.noreply.github.com> --- workflows/issue-duplication-detector.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/workflows/issue-duplication-detector.md b/workflows/issue-duplication-detector.md index 39bb4de..5e2ff5c 100644 --- a/workflows/issue-duplication-detector.md +++ b/workflows/issue-duplication-detector.md @@ -32,7 +32,8 @@ Analyze recently created or updated issues to determine if they are duplicates o 1. **Find recent issues to check**: - Use GitHub tools to search for issues in this repository that were created or updated in the last 10 minutes - - Query: `repo:${{ github.repository }} is:issue updated:>=$(date -u -d '10 minutes ago' +%Y-%m-%dT%H:%M:%SZ)` + - Construct a query like: `repo:${{ github.repository }} is:issue updated:>=<timestamp>` + - Where the timestamp should be in ISO 8601 format (e.g., 2024-02-04T23:08:00Z) - This captures any issues that might have been created or edited since the last run - If no recent issues are found, exit successfully without further action From f2c04d0a54d091304e75975ccfca58ff5c332ae7 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Wed, 4 Feb 2026 23:27:33 +0000 Subject: [PATCH 4/4] Change workflow schedule from every 5 minutes to every 6 hours Co-authored-by: Mossaka <5447827+Mossaka@users.noreply.github.com> --- docs/issue-duplication-detector.md | 12 ++++++------ workflows/issue-duplication-detector.md | 14 +++++++------- 2 files changed, 13 insertions(+), 13 deletions(-) diff --git a/docs/issue-duplication-detector.md b/docs/issue-duplication-detector.md index 0dbabc5..3c021b8 100644 --- a/docs/issue-duplication-detector.md +++ 
b/docs/issue-duplication-detector.md @@ -2,7 +2,7 @@ > For an overview of all available workflows, see the [main README](../README.md). -The [issue duplication detector workflow](../workflows/issue-duplication-detector.md?plain=1) runs every 5 minutes to detect duplicate issues in the repository and suggest next steps. +The [issue duplication detector workflow](../workflows/issue-duplication-detector.md?plain=1) runs every 6 hours to detect duplicate issues in the repository and suggest next steps. ## Installation @@ -18,7 +18,7 @@ This walks you through adding the workflow to your repository. You must also [choose a coding agent](https://github.github.com/gh-aw/reference/engines/) and add an API key secret for the agent to your repository. -You can manually trigger this workflow using `gh aw run issue-duplication-detector` or wait for it to run automatically on its 5-minute schedule. +You can manually trigger this workflow using `gh aw run issue-duplication-detector` or wait for it to run automatically on its 6-hour schedule. **Mandatory Checklist** @@ -30,9 +30,9 @@ This workflow requires no configuration and works out of the box. The workflow u ### How It Works -The workflow operates on a 5-minute batch schedule: +The workflow operates on a 6-hour batch schedule: -1. **Searches for recent issues**: Queries for issues created or updated in the last 10 minutes +1. **Searches for recent issues**: Queries for issues created or updated in the last 6 hours 2. **Analyzes each issue**: Extracts key information from the issue title and body 3. **Searches for duplicates**: Uses GitHub search with keywords to find similar existing issues 4. 
**Compares semantically**: Analyzes whether issues describe the same underlying problem or request @@ -43,7 +43,7 @@ The workflow operates on a 5-minute batch schedule: ### Batch Processing & Cost Control -- Runs every 5 minutes to batch-process multiple issues in a single workflow run +- Runs every 6 hours to batch-process multiple issues in a single workflow run - Only comments when high-confidence duplicates are found - Maximum 10 comments per run to prevent excessive API usage - 15-minute timeout ensures predictable runtime costs @@ -52,7 +52,7 @@ After editing run `gh aw compile` to update the workflow and commit all changes ## What it reads from GitHub -- Recently created or updated issues (last 10 minutes) +- Recently created or updated issues (last 6 hours) - Full issue details including title, body, and metadata - Repository issue history for duplicate detection - Both open and closed issues for comprehensive analysis diff --git a/workflows/issue-duplication-detector.md b/workflows/issue-duplication-detector.md index 5e2ff5c..0183359 100644 --- a/workflows/issue-duplication-detector.md +++ b/workflows/issue-duplication-detector.md @@ -1,8 +1,8 @@ --- -description: Detect duplicate issues and suggest next steps (batched every 5 minutes) +description: Detect duplicate issues and suggest next steps (batched every 6 hours) on: schedule: - - cron: "*/5 * * * *" # Every 5 minutes + - cron: "0 */6 * * *" # Every 6 hours workflow_dispatch: permissions: read-all @@ -26,14 +26,14 @@ You are an AI agent that detects duplicate issues in the repository `${{ github. ## Your Task -Analyze recently created or updated issues to determine if they are duplicates of existing issues. This workflow runs every 5 minutes to batch-process issues, providing cost control and natural request batching. +Analyze recently created or updated issues to determine if they are duplicates of existing issues. 
This workflow runs every 6 hours to batch-process issues, providing cost control and natural request batching. ## Instructions 1. **Find recent issues to check**: - - Use GitHub tools to search for issues in this repository that were created or updated in the last 10 minutes - - Construct a query like: `repo:${{ github.repository }} is:issue updated:>=<timestamp>` - - Where the timestamp should be in ISO 8601 format (e.g., 2024-02-04T23:08:00Z) + - Use GitHub tools to search for issues in this repository that were created or updated in the last 6 hours + - Construct a query like: `repo:${{ github.repository }} is:issue updated:>=<timestamp>` + - Where the timestamp should be in ISO 8601 format (e.g., 2024-02-04T17:00:00Z) - This captures any issues that might have been created or edited since the last run - If no recent issues are found, exit successfully without further action @@ -79,7 +79,7 @@ Analyze recently created or updated issues to determine if they are duplicates o - **Be accurate**: Only flag clear duplicates to avoid false positives - **Be helpful**: Provide clear reasoning and actionable suggestions - **Use safe-outputs**: Always use `output.add-comment` for commenting, never try to use GitHub write APIs directly -- **Cost control**: The 5-minute batching window provides a natural upper bound on costs +- **Cost control**: The 6-hour batching window provides a natural upper bound on costs ## Example Comment Format