Fix file upload #182

azertyfun · 2026-01-23T15:27:32Z

Hi!

We have been using core-dump-handler for a little while but frequently encountered the following behavior on OpenShift:

[2025-08-05T07:07:10Z INFO  core_dump_agent] Setting s3 endpoint location to: <REDACTED>
[2025-08-05T07:07:10Z INFO  core_dump_agent] Uploading: /var/mnt/core-dump-handler/cores/<REDACTED>.zip
[2025-08-05T07:07:10Z INFO  core_dump_agent] zip size is 129879392
[2025-08-05T07:07:19Z ERROR core_dump_agent] Upload Failed hyper: channel closed

Restarting the pod would attempt the upload again, but it would not succeed either. We've had to resort to fetching the zip file using kubectl, which is quite painful operationally.

For me the issue is twofold:

1. The upload fails.
2. Failed uploads are not retried until the pod is re-created, for whatever reason. This forces us to proactively monitor the logs for failed upload attempts.

For the first problem, simply updating rust-s3 and its dependencies did the trick. After upgrading, uploads succeed again, and the log shows the failed request being retried:

Retrying reqwest: error sending request for url (http://<REDACTED>/core-dumps-storage-bucket-<SNIP>.zip?partNumber=3&uploadId=<REDACTED>)

The lack of retry isn't nearly as painful, but to offer a stronger guarantee that we won't have missing uploads (e.g. because of an issue with the S3 bucket itself), I have added a retry mechanism with exponential backoff.
On that front the application's control flow is a bit tangled, and I am not 100% sure that use_inotify == "true" is the right condition for enabling the retry behavior. However, in my setup (k8s using inotify) it works well.
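
For readers unfamiliar with the pattern, here is a minimal, synchronous sketch of retry with exponential backoff. It is not the code from this PR: the names backoff_delay and retry_with_backoff, the attempt limit, and the delay cap are all hypothetical, and the real agent may well structure this differently (for example around async code).

```rust
use std::thread::sleep;
use std::time::Duration;

/// Delay to wait after failed attempt number `attempt` (0-based):
/// `base * 2^attempt`, capped at `max`. (Hypothetical helper.)
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // Saturating arithmetic avoids overflow for large attempt numbers.
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(max)
}

/// Calls `op` until it succeeds or `max_attempts` attempts have been made,
/// sleeping with exponential backoff between attempts. At least one attempt
/// is always made. On failure, the error from the last attempt is returned.
fn retry_with_backoff<T, E, F>(
    max_attempts: u32,
    base: Duration,
    max: Duration,
    mut op: F,
) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt + 1 >= max_attempts => return Err(err),
            Err(_) => {
                sleep(backoff_delay(attempt, base, max));
                attempt += 1;
            }
        }
    }
}
```

In this sketch the closure passed as `op` would wrap the upload of a single zip file and return its error on failure, so the file stays on disk and is tried again after the delay instead of being dropped after the first failed attempt.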

This is actually a rebase of a broader change we have implemented internally, which also includes Prometheus metrics for core dumps so that we can write alerts (which is why I didn't raise an issue beforehand: the work had to be done anyway). If this PR gets merged I will create a follow-up for that.

Noticed the following error would happen a lot on OpenShift:

> [2025-08-05T07:07:10Z INFO core_dump_agent] Setting s3 endpoint location to: <REDACTED>
> [2025-08-05T07:07:10Z INFO core_dump_agent] Uploading: /var/mnt/core-dump-handler/cores/<REDACTED>.zip
> [2025-08-05T07:07:10Z INFO core_dump_agent] zip size is 129879392
> [2025-08-05T07:07:19Z ERROR core_dump_agent] Upload Failed hyper: channel closed

Restarting the pod or retrying the upload would not help. After upgrading, the uploads finally worked again:

> Retrying reqwest: error sending request for url (http://<REDACTED>/core-dumps-storage-bucket-<SNIP>.zip?partNumber=3&uploadId=<REDACTED>)

Signed-off-by: Nathan Monfils <nathan.monfils@destiny.eu>

azertyfun force-pushed the fix-file-upload branch 2 times, most recently from b6c0d77 to 350152f on January 23, 2026 15:41

Nathan Monfils added 2 commits January 23, 2026 16:42

Added retry mechanism for file uploads

ed2e97e

Prevents zip files from being lost if the upload failed for whichever reason. Signed-off-by: Nathan Monfils <nathan.monfils@destiny.eu>

Fix deprecated explicit file lock mode

40a7c34

Signed-off-by: Nathan Monfils <nathan.monfils@destiny.eu>

azertyfun force-pushed the fix-file-upload branch from 350152f to 40a7c34 on January 23, 2026 15:42
