Skip to content

Conversation

@thmzlt
Copy link
Contributor

@thmzlt thmzlt commented Mar 24, 2022

This adds a new Prometheus gauge to keep track of the error code for each price account and publisher. We use the empty string error code to communicate that the publisher/symbol pair is in a healthy status. There is a separate time series for each error code so we can have error-specific alerts. We can combine multiple series using queries.

@thmzlt thmzlt requested a review from jayantk March 24, 2022 13:40
observer.py Outdated
if not price_errors and not price_account_errors:
prom_publisher_status.labels(
symbol=symbol,
publisher=publisher_key,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the way the errors work is that they have a publisher_key field that can be None. If the key is None, it's an error for the symbol as a whole, not a particular publisher. I think setting publisher=publisher_key here is not the right thing to do.

I also suggest moving all of the error logging code outside the for loop, so it's adjacent to the notify call below. That's where we are currently logging errors to slack.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to make another Gauge for errors related to the symbol and not a particular publisher.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. One caveat is that we might still want to report an "ok" status for each publisher so we be sure that the lack of errors means everything is running smoothly vs. the observer service is down and isn't reporting errors when it should.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see. I think this line doesn't correctly report the OK status for each publisher though, as an error on publisher A will result in an error report for publisher B. You will have to check for each publisher whether there are any errors with their publisher_key in the list.

observer.py Outdated
# Write price error status
if price_errors:
for error in price_errors:
if error.publisher_key == publisher_key:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is doing extra work -- you could simply iterate over the errors once and log with error.publisher_key.

observer.py Outdated
for error in price_account_errors:
prom_publisher_status.labels(
symbol=symbol,
publisher=publisher_key,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment about publisher_key here

observer.py Outdated
"crypto_price", "Price", labelnames=["symbol", "publisher", "status"]
)
prom_publisher_status = Gauge(
"publisher_status", "Publisher Status", labelnames=["symbol", "publisher", "error_code"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe call this "publisher_errors"? We already have a "status" per publisher, so this could be confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea.

@thmzlt thmzlt force-pushed the publisher-status branch from 07ed2ce to f1ed2de Compare March 24, 2022 22:45
@thmzlt thmzlt closed this Aug 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants