-
Notifications
You must be signed in to change notification settings - Fork 7
Expose publisher status gauge #29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
observer.py
Outdated
| if not price_errors and not price_account_errors: | ||
| prom_publisher_status.labels( | ||
| symbol=symbol, | ||
| publisher=publisher_key, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the way the errors work is that they have a publisher_key field that can be None. If the key is None, it's an error for the symbol as a whole, not a particular publisher. I think setting publisher=publisher_key here is not the right thing to do.
I also suggest moving all of the error logging code outside the for loop, so it's adjacent to the notify call below. That's where we are currently logging errors to slack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to make another Gauge for errors related to the symbol and not a particular publisher.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. One caveat is that we might still want to report an "ok" status for each publisher so we be sure that the lack of errors means everything is running smoothly vs. the observer service is down and isn't reporting errors when it should.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see. I think this line doesn't correctly report the OK status for each publisher though, as an error on publisher A will result in an error report for publisher B. You will have to check for each publisher whether there are any errors with their publisher_key in the list.
observer.py
Outdated
| # Write price error status | ||
| if price_errors: | ||
| for error in price_errors: | ||
| if error.publisher_key == publisher_key: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is doing extra work -- you could simply iterate over the errors once and log with error.publisher_key.
observer.py
Outdated
| for error in price_account_errors: | ||
| prom_publisher_status.labels( | ||
| symbol=symbol, | ||
| publisher=publisher_key, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment about publisher_key here
observer.py
Outdated
| "crypto_price", "Price", labelnames=["symbol", "publisher", "status"] | ||
| ) | ||
| prom_publisher_status = Gauge( | ||
| "publisher_status", "Publisher Status", labelnames=["symbol", "publisher", "error_code"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe call this "publisher_errors"? We already have a "status" per publisher, so this could be confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea.
07ed2ce to
f1ed2de
Compare
This adds a new Prometheus gauge to keep track of the error code for each price account and publisher. We use the empty string error code to communicate that the publisher/symbol pair is in a healthy status. There is a separate time series for each error code so we can have error-specific alerts. We can combine multiple series using queries.