@ffranr ffranr commented Dec 12, 2025

Fix for #1909


  • Emit explicit address import success/failure events from the custodian, have NewAddr RPC subscribe and wait for the outcome with a timeout, and ignore unsupported events in the RPC stream so import errors don’t crash the daemon.
  • Make mailbox connection retries configurable (MaxConnectAttempts) and treat backoff NumTries as optional with a default, reducing the chance of runaway retries while still surfacing failures to callers.
  • Add an itest that creates a V2 address against an unreachable authmailbox (with the upfront connect check skipped) to verify tapd stays up after the subscription failure.

Refactor to make it explicit that every error from the main event loop
is currently treated as critical and results in a TAPD shutdown. This
change does not alter the existing behavior but makes the criticality of
error handling more obvious, paving the way to relax this in the future.

Introduce `AddrImportErrEvent` to notify subscribers of address import
errors without shutting down the main event loop. Also, ensure
`AssetReceiveEvent` and `AddrImportErrEvent` implement the `Event`
interface.

Introduce `AddrImportCompleteEvent` to notify subscribers when an
address is successfully imported. Ensure the event implements the
`Event` interface and integrate it with the main event loop.

Modify SubscribeReceiveEvents such that it silently ignores the newly
added `tapgarden.Custodian` event types.
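
For readers skimming the commits above, here is a minimal Go sketch of the "ignore unsupported events" idea. Only the event type names come from the commit messages; the method set of `Event`, the struct fields, and the `filterReceiveEvents` and `sampleEvents` helpers are assumptions made for illustration and are not the actual tapd definitions.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Event is a stand-in for the shared event interface. Its single method is
// an assumption for this sketch.
type Event interface {
	Timestamp() time.Time
}

// AssetReceiveEvent is the only event type the receive stream understands.
type AssetReceiveEvent struct{ At time.Time }

func (e *AssetReceiveEvent) Timestamp() time.Time { return e.At }

// AddrImportCompleteEvent signals a successful address import.
type AddrImportCompleteEvent struct {
	At   time.Time
	Addr string // hypothetical field
}

func (e *AddrImportCompleteEvent) Timestamp() time.Time { return e.At }

// AddrImportErrEvent signals a failed address import.
type AddrImportErrEvent struct {
	At   time.Time
	Addr string // hypothetical field
	Err  error  // hypothetical field
}

func (e *AddrImportErrEvent) Timestamp() time.Time { return e.At }

// filterReceiveEvents keeps receive events and silently skips every other
// event type instead of treating it as an error.
func filterReceiveEvents(events []Event) []*AssetReceiveEvent {
	var out []*AssetReceiveEvent
	for _, ev := range events {
		switch e := ev.(type) {
		case *AssetReceiveEvent:
			out = append(out, e)
		default:
			// Unsupported for this stream: ignore, do not fail.
			continue
		}
	}
	return out
}

// sampleEvents builds one event of each kind for the demo.
func sampleEvents() []Event {
	now := time.Now()
	return []Event{
		&AssetReceiveEvent{At: now},
		&AddrImportErrEvent{
			At: now, Addr: "addr-1",
			Err: errors.New("mailbox unreachable"),
		},
		&AddrImportCompleteEvent{At: now, Addr: "addr-2"},
	}
}

func main() {
	kept := filterReceiveEvents(sampleEvents())
	fmt.Printf("forwarded %d of %d events\n", len(kept), len(sampleEvents()))
}
```

The `default` branch is the important part: a subscriber that shares a stream with new event types keeps working when it sees something it does not recognise.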
@ffranr ffranr added this to the v0.8 milestone Dec 12, 2025
@ffranr ffranr self-assigned this Dec 12, 2025
@ffranr ffranr added the bug fix label Dec 12, 2025
@ffranr ffranr added the addrs label Dec 12, 2025
@gemini-code-assist

Summary of Changes

Hello @ffranr, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the resilience of the tapd daemon by improving its error handling and retry mechanisms when interacting with the authmailbox service. The changes ensure that tapd can gracefully handle authmailbox subscription failures and other related issues without crashing, providing a more stable and robust user experience. It introduces a more controlled event-driven flow for address imports and makes connection retry logic more configurable.

Highlights

  • Daemon Stability: The tapd daemon will now remain running even if the authmailbox subscription fails, preventing crashes related to external service connectivity issues.
  • Address Import Events: Explicit success and failure events for address imports are now emitted by the custodian, allowing the NewAddr RPC to subscribe and wait for the outcome with a timeout.
  • Configurable Mailbox Retries: Mailbox connection retries are now configurable via MaxConnectAttempts, and the NumTries for backoff is treated as optional with a default, reducing the chance of excessive retries.
  • Robust RPC Stream Handling: Unsupported events in the RPC stream are now ignored instead of causing the daemon to crash, improving overall RPC robustness.
  • New Integration Test: An integration test has been added to specifically verify that tapd remains operational when a V2 address is created against an unreachable authmailbox.
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces several improvements to make tapd more resilient, particularly when dealing with authmailbox subscription failures. The main changes include making the NewAddr RPC synchronous by waiting for an address import confirmation, preventing crashes on unknown events, and making connection retries configurable. The changes are well-structured and address the reported issue effectively. The new integration test is a great addition to verify the fix. My feedback is minor and focuses on improving code clarity and maintainability.

@coveralls

coveralls commented Dec 12, 2025

Pull Request Test Coverage Report for Build 20236451398

Details

  • 140 of 189 (74.07%) changed or added relevant lines in 6 files are covered.
  • 89 unchanged lines in 17 files lost coverage.
  • Overall coverage increased (+0.02%) to 56.78%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| proof/courier.go | 2 | 4 | 50.0% |
| tapcfg/config.go | 9 | 13 | 69.23% |
| rpcserver.go | 26 | 43 | 60.47% |
| tapgarden/custodian.go | 92 | 118 | 77.97% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| commitment/tap.go | 2 | 85.19% |
| fn/iter.go | 2 | 62.07% |
| tapdb/sqlc/mssmt.sql.go | 2 | 48.34% |
| tapdb/universe_federation.go | 2 | 88.96% |
| universe_rpc_diff.go | 2 | 76.0% |
| tapdb/interfaces.go | 3 | 80.0% |
| tapgarden/planter.go | 3 | 80.26% |
| universe/syncer.go | 4 | 85.93% |
| tapchannel/aux_leaf_signer.go | 5 | 43.18% |
| mssmt/compacted_tree.go | 6 | 78.11% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 20236005009 | 0.02% |
| Covered Lines | 65490 |
| Relevant Lines | 115340 |

💛 - Coveralls

- Change BackoffCfg.NumTries to uint32 with clearer docs; zero now falls
  back to DefaultProofTransferNumTries.
- Apply the default when wiring courier configs and reflect the
  semantics in sample-tapd.conf.
- Update tests to use uint32 counters and add a custodian helper to wait
  for address import events.

This change will be useful later for authmailbox connection attempts.
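
The zero-means-default semantics described in this commit can be sketched as follows. This is an illustration only: the numeric value given to `DefaultProofTransferNumTries` is a placeholder, and the `effectiveNumTries` method is a hypothetical helper, not necessarily how the PR wires the default.

```go
package main

import "fmt"

// DefaultProofTransferNumTries uses a placeholder value here; the real
// default lives in the tapd source and may differ.
const DefaultProofTransferNumTries uint32 = 10

// BackoffCfg is reduced to the one field discussed in the commit message.
type BackoffCfg struct {
	// NumTries is the maximum number of attempts. Zero means "unset" and
	// falls back to DefaultProofTransferNumTries.
	NumTries uint32
}

// effectiveNumTries is a hypothetical helper that resolves the fallback.
func (c BackoffCfg) effectiveNumTries() uint32 {
	if c.NumTries == 0 {
		return DefaultProofTransferNumTries
	}
	return c.NumTries
}

func main() {
	unset := BackoffCfg{}
	explicit := BackoffCfg{NumTries: 3}

	fmt.Println("unset:", unset.effectiveNumTries())
	fmt.Println("explicit:", explicit.effectiveNumTries())
}
```

Using an unsigned type rules out negative values at the type level, so zero is the only value that needs special handling.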

Introduce `MaxConnectAttempts` to allow configurable retry limits for
mailbox connection attempts. Replace hardcoded retry limits with this
new parameter and update related configurations and tests accordingly.
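
A bounded retry loop driven by such a parameter might look roughly like the sketch below. Only the name `MaxConnectAttempts` comes from the commit message; `ClientConfig`, `connectWithRetry`, and `failingDialer` are hypothetical, backoff delays between attempts are left out for brevity, and rejecting a zero limit is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// ClientConfig is a hypothetical config struct holding the retry limit.
type ClientConfig struct {
	MaxConnectAttempts uint32
}

// connectWithRetry calls dial until it succeeds or the configured limit is
// reached. It returns the number of attempts made and the final error.
func connectWithRetry(cfg ClientConfig, dial func() error) (uint32, error) {
	if cfg.MaxConnectAttempts == 0 {
		// Assumption of this sketch: a zero limit is a config error.
		return 0, errors.New("MaxConnectAttempts must be at least 1")
	}

	var lastErr error
	for attempt := uint32(1); attempt <= cfg.MaxConnectAttempts; attempt++ {
		lastErr = dial()
		if lastErr == nil {
			return attempt, nil
		}
	}

	// Surface the failure to the caller instead of retrying forever.
	return cfg.MaxConnectAttempts, fmt.Errorf(
		"giving up after %d attempts: %w",
		cfg.MaxConnectAttempts, lastErr,
	)
}

// failingDialer returns a dial function that fails the first n calls, plus
// a counter of how often it was invoked.
func failingDialer(n int) (func() error, *int) {
	calls := 0
	return func() error {
		calls++
		if calls <= n {
			return errors.New("connection refused")
		}
		return nil
	}, &calls
}

func main() {
	dial, calls := failingDialer(10)
	_, err := connectWithRetry(ClientConfig{MaxConnectAttempts: 3}, dial)
	fmt.Println("calls:", *calls, "err:", err)
}
```

The caller gets a wrapped error once the limit is hit, which matches the stated goal of reducing runaway retries while still surfacing failures.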

- Register a custodian subscriber before creating the address to avoid
  racing the import event.
- Block until the custodian reports success/error (with timeout
  fallback) and surface failures.
- Clean up the subscription after handling the import outcome.
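
The three steps above (subscribe first, wait with a timeout, then clean up) can be sketched in Go as follows. Every name in this block (`custodian`, `importResult`, `newAddr`, `waitForImport`, the error variables, and `demoTimeout`) is hypothetical; the sketch only shows the ordering and the timeout fallback, not the actual RPC code.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

var (
	errMailboxUnreachable = errors.New("mailbox unreachable")
	errImportTimeout      = errors.New("timed out waiting for address import")
)

const demoTimeout = 200 * time.Millisecond

type importResult struct {
	addr string
	err  error
}

// custodian is a toy stand-in that publishes import outcomes to subscribers.
type custodian struct {
	mu        sync.Mutex
	subs      map[int]chan importResult
	nextID    int
	importErr error         // outcome every import will report
	delay     time.Duration // simulated import latency
}

func newCustodian(importErr error, delay time.Duration) *custodian {
	return &custodian{
		subs: make(map[int]chan importResult), importErr: importErr,
		delay: delay,
	}
}

func (c *custodian) subscribe() (int, <-chan importResult) {
	c.mu.Lock()
	defer c.mu.Unlock()
	id := c.nextID
	c.nextID++
	ch := make(chan importResult, 1)
	c.subs[id] = ch
	return id, ch
}

func (c *custodian) unsubscribe(id int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.subs, id)
}

// importAddr simulates the asynchronous import and notifies subscribers.
func (c *custodian) importAddr(addr string) {
	time.Sleep(c.delay)
	c.mu.Lock()
	defer c.mu.Unlock()
	for _, ch := range c.subs {
		select {
		case ch <- importResult{addr: addr, err: c.importErr}:
		default: // Subscriber buffer full: drop rather than block.
		}
	}
}

// waitForImport blocks until the outcome for addr arrives or timeout passes.
func waitForImport(events <-chan importResult, addr string,
	timeout time.Duration) error {

	timer := time.NewTimer(timeout)
	defer timer.Stop()
	for {
		select {
		case res := <-events:
			if res.addr == addr {
				return res.err
			}
		case <-timer.C:
			return errImportTimeout
		}
	}
}

// newAddr subscribes before triggering the import so the event cannot be
// missed, then waits for the outcome and always removes the subscription.
func newAddr(c *custodian, addr string, timeout time.Duration) error {
	id, events := c.subscribe()
	defer c.unsubscribe(id)

	go c.importAddr(addr)
	return waitForImport(events, addr, timeout)
}

func main() {
	fmt.Println(newAddr(newCustodian(nil, 0), "addr-1", demoTimeout))
	fmt.Println(newAddr(newCustodian(errMailboxUnreachable, 0), "addr-2",
		demoTimeout))
}
```

Subscribing after the import has been triggered would leave a window in which the event fires with no listener, which is the race the first bullet describes.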

Add an integration test that exercises V2 address creation with an
unreachable mailbox courier and a skipped upfront connection check. The
test verifies that the resulting custodian subscription failure is
handled without causing tapd to crash or shut down.
@ffranr ffranr force-pushed the wip/tapgarden-custodian/fix-errchan-handling branch from ad457a6 to 2ff1a5e Compare December 15, 2025 14:42
Contributor

@darioAnongba darioAnongba left a comment

LGTM. Only nits.

"nonexistent.invalid:65500",
)

t.Logf("Trying to create an address with bad mailbox %s", badCourier)
Contributor

Please no logging in the itests 🙏. The test's purpose and its assertions are the only things we need.

Contributor Author

Why no logging? Sometimes I find log messages helpful when debugging a failing itest.


_, err = tapd.GetInfo(infoCtx, &taprpc.GetInfoRequest{})
require.NoError(t.t, err)
t.Logf("Tapd is still up and running")
Contributor

Same remark: this is what the test does, so logging is useless if the test passes.

4 participants