@drazisil-codecov
Contributor

@drazisil-codecov drazisil-codecov commented Jan 20, 2026

Implement a before_send hook to sample (not suppress) known high-volume errors from specific repositories, reducing the volume of these events in Sentry by 99.9% while maintaining visibility.

Summary

  • Add SAMPLED_ERROR_PATTERNS config dict for repo/error type sample rates
  • Add deterministic sampling using event ID hash (per code review feedback)
  • Add before_send hook to filter LockError, LockRetry, MaxRetriesExceededError from square-web at 0.1% sample rate
  • Add comprehensive tests for the sampling logic

Problem

We have high-volume Sentry errors that are expected behavior for certain high-traffic repositories:

| Error Type | Primary Source | Volume |
| --- | --- | --- |
| LockError | square-web | ~4M/month |
| LockRetry | square-web | High |
| MaxRetriesExceededError | square-web | High |

These errors pollute our Sentry error counts, make it harder to identify real issues, and consume Sentry quota unnecessarily.

Solution

Sample these errors at 0.1% (keeping ~4K/month instead of 4M) using a deterministic hash-based approach that ensures consistent sampling decisions.
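
As a rough illustration of the configuration and the volume arithmetic, here is a minimal sketch. The PR names SAMPLED_ERROR_PATTERNS but does not show its layout, so the nested `{repo: {error_type: rate}}` shape and the helper names `get_sample_rate` and `expected_monthly_volume` are assumptions made for this example only.

```python
from __future__ import annotations

# Assumed layout: repo name -> error type -> sample rate (fraction of events kept).
# The real dict in helpers/sentry.py may be shaped differently.
SAMPLED_ERROR_PATTERNS: dict[str, dict[str, float]] = {
    "square-web": {
        "LockError": 0.001,
        "LockRetry": 0.001,
        "MaxRetriesExceededError": 0.001,
    },
}


def get_sample_rate(repo_name: str | None, error_type: str | None) -> float | None:
    """Return the configured rate, or None when the event is not subject to sampling."""
    if not repo_name or not error_type:
        return None
    return SAMPLED_ERROR_PATTERNS.get(repo_name, {}).get(error_type)


def expected_monthly_volume(events_per_month: int, rate: float) -> int:
    """Estimate how many events survive sampling at the given rate."""
    return round(events_per_month * rate)


# 0.1% of ~4M LockError events per month leaves roughly 4K.
print(expected_monthly_volume(4_000_000, 0.001))
```

Returning None for unknown repos or error types keeps the default behavior (send everything) for any event that is not explicitly listed.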

Test plan

  • Unit tests for before_send hook
  • Unit tests for _should_sample_event helper
  • Tests for deterministic sampling behavior
  • Tests for edge cases (missing tags, non-matching repos/errors)

Closes CCMRG-2010


Note

Implements deterministic Sentry error sampling to reduce noise while preserving visibility.

  • Add SAMPLED_ERROR_PATTERNS and _should_sample_event (MD5-based) to sample specific errors per repo (e.g., LockError, LockRetry, MaxRetriesExceededError for square-web at 0.1%) in helpers/sentry.py
  • New before_send hook filters events based on exception type and repo_name tag (supports tags as dict or list); passes through when data is missing
  • Wire before_send into initialize_sentry and keep before_send_transaction (now filtering UploadBreadcrumb by short and fully-qualified names)
  • Add unit tests covering init wiring, transaction filtering, sampling behavior, determinism, and edge cases in helpers/tests/unit/test_sentry.py
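
The note above says the hook accepts tags as either a dict or a list. One possible way to normalize the lookup is sketched below; `get_tag` is a hypothetical helper name, and the list form is assumed to hold `[key, value]` pairs.

```python
from __future__ import annotations

from typing import Any


def get_tag(event: dict[str, Any], key: str) -> str | None:
    """Look up a tag whether event["tags"] is a dict or a list of [key, value] pairs."""
    tags = event.get("tags")
    if isinstance(tags, dict):
        value = tags.get(key)
        return str(value) if value is not None else None
    if isinstance(tags, (list, tuple)):
        for item in tags:
            # Skip malformed entries instead of raising inside the hook.
            if isinstance(item, (list, tuple)) and len(item) == 2 and item[0] == key:
                return str(item[1])
    # Missing or unrecognized tags: the caller can treat None as "pass through".
    return None
```

For example, `get_tag({"tags": {"repo_name": "square-web"}}, "repo_name")` and `get_tag({"tags": [["repo_name", "square-web"]]}, "repo_name")` both resolve to the same repo name, while an event without tags yields None.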

Written by Cursor Bugbot for commit 5fcb7b1. This will update automatically on new commits.

@linear
linear bot commented Jan 20, 2026

@sentry
sentry bot commented Jan 20, 2026

Codecov Report

❌ Patch coverage is 97.67442% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 92.45%. Comparing base (6ce1874) to head (5fcb7b1).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| apps/worker/helpers/sentry.py | 97.67% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #665      +/-   ##
==========================================
- Coverage   92.98%   92.45%   -0.53%     
==========================================
  Files        1294     1295       +1     
  Lines       47327    47640     +313     
  Branches     1592     1592              
==========================================
+ Hits        44005    44047      +42     
- Misses       3013     3284     +271     
  Partials      309      309              
| Flag | Coverage Δ | |
| --- | --- | --- |
| workerintegration | 58.01% <11.62%> (-1.13%) | ⬇️ |
| workerunit | 89.74% <97.67%> (-1.53%) | ⬇️ |

Flags with carried forward coverage won't be shown.

@codecov-notifications
codecov-notifications bot commented Jan 20, 2026

Codecov Report

❌ Patch coverage is 97.67442% with 1 line in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| apps/worker/helpers/sentry.py | 97.67% | 1 Missing ⚠️ |


cursor[bot]

This comment was marked as outdated.

Python's hash() uses randomization (PYTHONHASHSEED) which varies per
process. Since Celery workers fork into multiple processes, the same
event_id would produce different hashes in different workers.

Switch to hashlib.md5 which provides stable hashing across all processes,
ensuring truly deterministic sampling behavior.
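
A minimal sketch of the stable-hash approach described in this commit message follows. The function name `should_keep`, the choice of the first 8 hex digits, and the bucket normalization are assumptions for illustration; the PR's `_should_sample_event` may compute the bucket differently.

```python
import hashlib


def should_keep(event_id: str, sample_rate: float) -> bool:
    """Decide deterministically whether to keep an event.

    hashlib.md5 yields the same digest in every process, unlike the builtin
    hash(), whose output for str depends on PYTHONHASHSEED.
    """
    # usedforsecurity=False (Python 3.9+) marks this as a non-cryptographic use.
    digest = hashlib.md5(event_id.encode("utf-8"), usedforsecurity=False).hexdigest()
    # Map the first 32 bits of the digest onto [0, 1).
    bucket = int(digest[:8], 16) / 0x1_0000_0000
    return bucket < sample_rate


# The same event ID yields the same decision on every call and in every worker.
event_id = "9f1c2d3e4b5a69788796a5b4c3d2e1f0"
print(should_keep(event_id, 0.001) == should_keep(event_id, 0.001))
```

Because the decision depends only on the event ID and the rate, separate Celery worker processes that see the same event ID agree on whether to send it.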
cursor[bot]

This comment was marked as outdated.

If event_id is missing or empty, pass the event through instead of
sampling it. This prevents the scenario where all events without
event_id would hash to the same value and be dropped.
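
The guard described here could look roughly like the sketch below. `keep_event` and `bucket_for` are hypothetical names, and the bucket computation is an assumed stand-in for whatever the PR's helper does.

```python
from __future__ import annotations

import hashlib
from typing import Any


def bucket_for(event_id: str) -> float:
    """Map an event ID onto [0, 1) using a process-independent hash."""
    digest = hashlib.md5(event_id.encode("utf-8")).hexdigest()
    return int(digest[:8], 16) / 0x1_0000_0000


def keep_event(event: dict[str, Any], sample_rate: float) -> bool:
    event_id = event.get("event_id")
    if not event_id:
        # Without an ID every such event would land in the same bucket, so
        # they would all be kept or all be dropped together. Pass them through.
        return True
    return bucket_for(str(event_id)) < sample_rate
```

Passing ID-less events through errs on the side of visibility: a malformed event is never silently discarded by the sampler.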
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

# Extract error type from the exception
error_type: str | None = None
try:
    error_type = event["exception"]["values"][0]["type"]
Exception sampling checks wrong exception in chains

Medium Severity

The code extracts error_type from event["exception"]["values"][0]["type"], but Sentry's exception values array is ordered from oldest to newest (innermost to outermost). For chained exceptions, values[0] is the root cause exception, while values[-1] is the primary outermost exception that users see. If a LockError wraps another exception, the sampling would incorrectly check the inner exception type instead of LockError, causing those events to bypass sampling entirely.
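
To make the concern concrete, the sketch below reads the last entry of the values array instead of the first. `outermost_exception_type` is a hypothetical helper, and whether the PR adopted this change is not shown in the conversation.

```python
from __future__ import annotations

from typing import Any


def outermost_exception_type(event: dict[str, Any]) -> str | None:
    """Return the type of the last (outermost) exception in the event, if any."""
    try:
        return event["exception"]["values"][-1]["type"]
    except (KeyError, IndexError, TypeError):
        # Missing or malformed exception data: let the caller pass the event through.
        return None


# A LockError raised from a ConnectionError: the root cause comes first.
chained_event = {
    "exception": {
        "values": [
            {"type": "ConnectionError"},
            {"type": "LockError"},
        ]
    }
}
print(outermost_exception_type(chained_event))
```

With `values[0]` the chained event above would be classified as ConnectionError and skip sampling; with `values[-1]` it is classified as LockError.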
