
Conversation

@jveitchmichaelis jveitchmichaelis (Collaborator) commented Jan 2, 2026

Description

  • Use torchmetrics to collect results and call evaluate_boxes during validation
  • All metrics now respect "validate_on_epoch", but evaluation should be fast enough to run every epoch (n=1) even on large datasets. Metrics are always reset regardless of this setting.
  • Simpler empty frame tracking during validation_step
  • Much cleaner logging in main. Non-loggable metrics are dropped in the torchmetric class.
  • Use config_args to set up test fixtures instead of overwriting the config post-init. This is better practice: overwriting after the fact causes undefined behaviour if certain parts of the deepforest class aren't also updated in sync.
  • Relatedly, if the user changes anything like the validation data after initialization, we also need to re-init the metrics, because precision/recall needs to know about label_dict and the annotation file when it calls evaluate under the hood (see the sketch after this list).
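
For context, here is a minimal sketch of the metric-collection idea. Everything in it is illustrative rather than the merged implementation: I'm assuming predictions and targets arrive as pandas DataFrames, and assuming an `evaluate_boxes(predictions, targets, label_dict, iou_threshold)` signature, which may differ from the real function in deepforest's evaluate module.

```python
# Illustrative sketch only -- not the merged implementation.
import pandas as pd
from torchmetrics import Metric

from deepforest.evaluate import evaluate_boxes  # assumed import path/signature


class BoxRecallPrecision(Metric):
    """Accumulate per-batch detections and defer box evaluation to compute()."""

    def __init__(self, label_dict: dict, iou_threshold: float = 0.4):
        super().__init__()
        # Precision/recall needs label_dict and the annotations, which is
        # why the metric must be re-initialized if either changes post-init.
        self.label_dict = label_dict
        self.iou_threshold = iou_threshold
        # List states are appended per batch and cleared by reset().
        self.add_state("predictions", default=[], dist_reduce_fx=None)
        self.add_state("targets", default=[], dist_reduce_fx=None)

    def update(self, preds: pd.DataFrame, target: pd.DataFrame) -> None:
        # Empty frames are appended as-is, which keeps empty-image tracking
        # simple: an image with no boxes just contributes zero rows.
        self.predictions.append(preds)
        self.targets.append(target)

    def compute(self) -> dict:
        if not self.targets:
            return {}
        preds = pd.concat(self.predictions, ignore_index=True)
        targets = pd.concat(self.targets, ignore_index=True)
        results = evaluate_boxes(
            preds, targets, label_dict=self.label_dict, iou_threshold=self.iou_threshold
        )
        # Keep only loggable scalars (drop e.g. per-class result frames),
        # so the caller in main can log the returned dict directly.
        return {k: v for k, v in results.items() if isinstance(v, (int, float))}
```

Because the state is list-based, reset() clears it after every epoch, which matches the "metrics are always reset" behaviour above, and the scalar filter in compute() is what keeps the logging in main clean.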

Related Issue(s)

#901 (step towards this)
#1254
#1245
Supports #1253

AI-Assisted Development

  • I used AI tools (e.g., GitHub Copilot, ChatGPT, etc.) in developing this PR
  • I understand all the code I'm submitting
  • I have reviewed and validated all AI-generated code

AI tools used (if applicable):
Claude Code to do some tedious rewriting of the config_args.

@jveitchmichaelis jveitchmichaelis force-pushed the evaluation-torchmetric branch 3 times, most recently from 02bc30a to ce69a72 on January 2, 2026 02:50
codecov bot commented Jan 2, 2026

Codecov Report

❌ Patch coverage is 97.64706% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.52%. Comparing base (0ab23a3) to head (8a74338).
⚠️ Report is 9 commits behind head on main.

Files with missing lines | Patch % | Lines
src/deepforest/main.py | 93.10% | 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1256      +/-   ##
==========================================
- Coverage   87.73%   87.52%   -0.22%     
==========================================
  Files          20       21       +1     
  Lines        2716     2782      +66     
==========================================
+ Hits         2383     2435      +52     
- Misses        333      347      +14     
Flag | Coverage | Δ
unittests | 87.52% <97.64%> | (-0.22%) ⬇️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@jveitchmichaelis jveitchmichaelis marked this pull request as ready for review January 2, 2026 03:11
@jveitchmichaelis jveitchmichaelis force-pushed the evaluation-torchmetric branch 2 times, most recently from 3cefab1 to 4f6d7db on January 2, 2026 04:17
@jveitchmichaelis jveitchmichaelis force-pushed the evaluation-torchmetric branch 5 times, most recently from bf2ffef to 911173c on January 5, 2026 18:16
@henrykironde (Contributor) commented:

@jveitchmichaelis can you please update this PR?

@jveitchmichaelis jveitchmichaelis force-pushed the evaluation-torchmetric branch 2 times, most recently from 1942e78 to e048c3b on January 13, 2026 18:39
@jveitchmichaelis jveitchmichaelis (Collaborator, Author) commented Jan 13, 2026

I might revert some of the test changes here to keep the PR a bit more focused. We can initialize the metrics in create_trainer, as sketched below.
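
A minimal sketch of that idea, reusing the hypothetical BoxRecallPrecision from the description above; the method body and config keys here are assumptions, not the final code:

```python
# Illustrative only: rebuild metrics at trainer-creation time rather than in
# __init__, so post-init changes (validation csv, label_dict) are picked up.
# BoxRecallPrecision is the sketch from the description; config keys assumed.
import pytorch_lightning as pl


class deepforest(pl.LightningModule):
    def create_trainer(self, **kwargs):
        # The user may have swapped label_dict or the validation annotation
        # file after construction, so read the current config here.
        self.box_metric = BoxRecallPrecision(
            label_dict=self.label_dict,
            iou_threshold=self.config.validation.iou_threshold,
        )
        self.trainer = pl.Trainer(**kwargs)
```

Putting the re-init in create_trainer keeps it out of the test fixtures entirely, which is what would let the config_args test changes be reverted.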
