-
Notifications
You must be signed in to change notification settings - Fork 11
feat(worker): add dynamic Sentry sampling for high-volume expected errors #665
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…rors Implement a before_send hook to sample (not suppress) known high-volume errors from specific repositories, reducing Sentry noise by 99.9% while maintaining visibility. Changes: - Add SAMPLED_ERROR_PATTERNS config dict for repo/error type sample rates - Add deterministic sampling using event ID hash (per code review feedback) - Add before_send hook to filter LockError, LockRetry, MaxRetriesExceededError from square-web at 0.1% sample rate - Add comprehensive tests for the sampling logic Closes CCMRG-2010
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #665 +/- ##
==========================================
- Coverage 92.98% 92.45% -0.53%
==========================================
Files 1294 1295 +1
Lines 47327 47640 +313
Branches 1592 1592
==========================================
+ Hits 44005 44047 +42
- Misses 3013 3284 +271
Partials 309 309
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
Python's hash() uses randomization (PYTHONHASHSEED) which varies per process. Since Celery workers fork into multiple processes, the same event_id would produce different hashes in different workers. Switch to hashlib.md5 which provides stable hashing across all processes, ensuring truly deterministic sampling behavior.
If event_id is missing or empty, pass the event through instead of sampling it. This prevents the scenario where all events without event_id would hash to the same value and be dropped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
| # Extract error type from the exception | ||
| error_type: str | None = None | ||
| try: | ||
| error_type = event["exception"]["values"][0]["type"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exception sampling checks wrong exception in chains
Medium Severity
The code extracts error_type from event["exception"]["values"][0]["type"], but Sentry's exception values array is ordered from oldest to newest (innermost to outermost). For chained exceptions, values[0] is the root cause exception, while values[-1] is the primary outermost exception that users see. If a LockError wraps another exception, the sampling would incorrectly check the inner exception type instead of LockError, causing those events to bypass sampling entirely.
Implement a
before_sendhook to sample (not suppress) known high-volume errors from specific repositories, reducing Sentry noise by 99.9% while maintaining visibility.Summary
SAMPLED_ERROR_PATTERNSconfig dict for repo/error type sample ratesbefore_sendhook to filterLockError,LockRetry,MaxRetriesExceededErrorfromsquare-webat 0.1% sample rateProblem
We have high-volume Sentry errors that are expected behavior for certain high-traffic repositories:
LockErrorLockRetryMaxRetriesExceededErrorThese errors pollute our Sentry error counts, make it harder to identify real issues, and consume Sentry quota unnecessarily.
Solution
Sample these errors at 0.1% (keeping ~4K/month instead of 4M) using a deterministic hash-based approach that ensures consistent sampling decisions.
Test plan
before_sendhook_should_sample_eventhelperCloses CCMRG-2010
Note
Implements deterministic Sentry error sampling to reduce noise while preserving visibility.
SAMPLED_ERROR_PATTERNSand_should_sample_event(MD5-based) to sample specific errors per repo (e.g.,LockError,LockRetry,MaxRetriesExceededErrorforsquare-webat 0.1%) inhelpers/sentry.pybefore_sendhook filters events based onexceptiontype andrepo_nametag (supports tags as dict or list); passes through when data is missingbefore_sendintoinitialize_sentryand keepbefore_send_transaction(now filteringUploadBreadcrumbby short and fully-qualified names)helpers/tests/unit/test_sentry.pyWritten by Cursor Bugbot for commit 5fcb7b1. This will update automatically on new commits. Configure here.