
Conversation

@jveitchmichaelis jveitchmichaelis (Collaborator) commented Jan 2, 2026

Description

  • Use torchmetrics to collect results and call evaluate_boxes during validation
  • All metrics now respect "validate_on_epoch", but evaluation should be fast enough to run every epoch (n=1) even on large datasets. Metrics are always reset regardless of this setting.
  • Simpler empty frame tracking during validation_step
  • Much cleaner logging in main. Non-loggable metrics are dropped in the torchmetric class.
  • Use config_args to set up test fixtures instead of overwriting the config post-init. This is better practice: overwriting after the fact causes undefined behaviour if certain parts of the deepforest class aren't also updated in sync.
  • Relatedly, if the user changes anything like the validation data after initialization, we also need to re-init the metrics, because precision/recall needs to know about label_dict and the annotation file when it calls evaluate under the hood (see the sketch after this list).
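
For context, here is a minimal sketch of the metric-collection idea. Everything in it is illustrative rather than the merged implementation: I'm assuming predictions and targets arrive as pandas DataFrames, and assuming an `evaluate_boxes(predictions, targets, label_dict, iou_threshold)` signature, which may differ from the real function in deepforest's evaluate module.

```python
# Illustrative sketch only -- not the merged implementation.
import pandas as pd
from torchmetrics import Metric

from deepforest.evaluate import evaluate_boxes  # assumed import path/signature


class BoxRecallPrecision(Metric):
    """Accumulate per-batch detections and defer box evaluation to compute()."""

    def __init__(self, label_dict: dict, iou_threshold: float = 0.4):
        super().__init__()
        # Precision/recall needs label_dict and the annotations, which is
        # why the metric must be re-initialized if either changes post-init.
        self.label_dict = label_dict
        self.iou_threshold = iou_threshold
        # List states are appended per batch and cleared by reset().
        self.add_state("predictions", default=[], dist_reduce_fx=None)
        self.add_state("targets", default=[], dist_reduce_fx=None)

    def update(self, preds: pd.DataFrame, target: pd.DataFrame) -> None:
        # Empty frames are appended as-is, which keeps empty-image tracking
        # simple: an image with no boxes just contributes zero rows.
        self.predictions.append(preds)
        self.targets.append(target)

    def compute(self) -> dict:
        if not self.targets:
            return {}
        preds = pd.concat(self.predictions, ignore_index=True)
        targets = pd.concat(self.targets, ignore_index=True)
        results = evaluate_boxes(
            preds, targets, label_dict=self.label_dict, iou_threshold=self.iou_threshold
        )
        # Keep only loggable scalars (drop e.g. per-class result frames),
        # so the caller in main can log the returned dict directly.
        return {k: v for k, v in results.items() if isinstance(v, (int, float))}
```

Because the state is list-based, reset() clears it after every epoch, which matches the "metrics are always reset" behaviour above, and the scalar filter in compute() is what keeps the logging in main clean.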

Related Issue(s)

#901 (step towards this)
#1254
#1245
Supports #1253

AI-Assisted Development

  • I used AI tools (e.g., GitHub Copilot, ChatGPT, etc.) in developing this PR
  • I understand all the code I'm submitting
  • I have reviewed and validated all AI-generated code

AI tools used (if applicable):
Claude Code to do some tedious rewriting of the config_args.

@jveitchmichaelis jveitchmichaelis force-pushed the evaluation-torchmetric branch 3 times, most recently from 02bc30a to ce69a72 on January 2, 2026 02:50
codecov bot commented Jan 2, 2026

Codecov Report

❌ Patch coverage is 97.64706% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.52%. Comparing base (0ab23a3) to head (8a74338).
⚠️ Report is 9 commits behind head on main.

Files with missing lines | Patch % | Lines
src/deepforest/main.py | 93.10% | 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1256      +/-   ##
==========================================
- Coverage   87.73%   87.52%   -0.22%     
==========================================
  Files          20       21       +1     
  Lines        2716     2782      +66     
==========================================
+ Hits         2383     2435      +52     
- Misses        333      347      +14     
Flag | Coverage | Δ
unittests | 87.52% <97.64%> | (-0.22%) ⬇️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@jveitchmichaelis jveitchmichaelis marked this pull request as ready for review January 2, 2026 03:11
@jveitchmichaelis jveitchmichaelis force-pushed the evaluation-torchmetric branch 2 times, most recently from 3cefab1 to 4f6d7db on January 2, 2026 04:17
@jveitchmichaelis jveitchmichaelis force-pushed the evaluation-torchmetric branch 5 times, most recently from bf2ffef to 911173c on January 5, 2026 18:16
@henrykironde (Contributor) commented:

@jveitchmichaelis can you please update this PR?

@jveitchmichaelis jveitchmichaelis force-pushed the evaluation-torchmetric branch 2 times, most recently from 1942e78 to e048c3b on January 13, 2026 18:39
@jveitchmichaelis jveitchmichaelis (Collaborator, Author) commented Jan 13, 2026

I might revert some of the test changes here to keep the PR a bit more focused. We can initialize the metrics in create_trainer, as sketched below.
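
A minimal sketch of that idea, reusing the hypothetical BoxRecallPrecision from the description above; the method body and config keys here are assumptions, not the final code:

```python
# Illustrative only: rebuild metrics at trainer-creation time rather than in
# __init__, so post-init changes (validation csv, label_dict) are picked up.
# BoxRecallPrecision is the sketch from the description; config keys assumed.
import pytorch_lightning as pl


class deepforest(pl.LightningModule):
    def create_trainer(self, **kwargs):
        # The user may have swapped label_dict or the validation annotation
        # file after construction, so read the current config here.
        self.box_metric = BoxRecallPrecision(
            label_dict=self.label_dict,
            iou_threshold=self.config.validation.iou_threshold,
        )
        self.trainer = pl.Trainer(**kwargs)
```

Putting the re-init in create_trainer keeps it out of the test fixtures entirely, which is what would let the config_args test changes be reverted.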
