Conversation

@falletta (Contributor) commented Jan 30, 2026

Summary

Changes:

  • Optimized the memory scaler and split operations, providing a substantial speedup for ts.static (up to 48% for batches larger than 5,000 structures).
  • Added scaling scripts for ts.static, ts.relax, and ts.integrate (NVE, NVT) to analyze scaling performance.
  • Added tests for the memory scaler values of non-periodic systems.

The figure below shows the speedup achieved for static evaluations, 10-step atomic relaxation, 10-step NVE MD, and 10-step NVT MD. Prior results are shown in blue, while new results are shown in red. The speedup is calculated as
speedup (%) = (baseline_time / current_time − 1) × 100. We observe that:

  • ts.static achieves a 43.9% speedup for 100,000 structures
  • ts.relax achieves a 2.8% speedup for 1,500 structures
  • ts.integrate (NVE) achieves a 0.9% speedup for 10,000 structures
  • ts.integrate (NVT) achieves a 1.4% speedup for 10,000 structures
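The speedup definition above is easy to apply directly. A minimal sketch (the timings below are illustrative, not the PR's actual measurements):

```python
def speedup_pct(baseline_time: float, current_time: float) -> float:
    """Speedup as defined above: (baseline_time / current_time - 1) * 100."""
    return (baseline_time / current_time - 1.0) * 100.0

# Illustrative example: a run that drops from 125 s to 87 s
print(round(speedup_pct(125.0, 87.0), 1))  # 43.7 (% speedup)
```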
[Figure: speedup vs. batch size for ts.static, ts.relax, and ts.integrate (NVE, NVT); prior results in blue, new results in red]

Comments:

From the scaling plots, we can see that the timings of ts.static and ts.integrate are all consistent with each other. Indeed:

  • ts.static → 85s for 100,000 evaluations
  • ts.integrate (NVE) → 87s for 10,000 structures (10 MD steps each) → 87s for 100,000 evaluations
  • ts.integrate (NVT) → 89s for 10,000 structures (10 MD steps each) → 89s for 100,000 evaluations

However, when looking at the relaxation:

  • ts.relax → 63s for 1,000 structures (10 relax steps each) → 63s for 10,000 evaluations → ~630s for 100,000 evaluations

So ts.relax is about 7x slower than ts.static or ts.integrate. The unbatched FrechetCellFilter clearly contributes to that. I'm wondering if there are additional bottlenecks in the code that we might optimize to reduce that massive 7x cost.
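The per-evaluation comparison above can be reproduced in a few lines, using the timings from the bullets (the ×10 extrapolation for ts.relax assumes linear scaling, as the text does):

```python
# Seconds per 100,000 evaluations, taken from the scaling numbers above
static_t = 85.0                      # 100,000 static evaluations
relax_t_10k_evals = 63.0             # 1,000 structures x 10 relax steps
relax_t = relax_t_10k_evals * 10.0   # linear extrapolation to 100,000 evals

ratio = relax_t / static_t
print(round(ratio, 1))  # 7.4 -> ts.relax is ~7x slower per evaluation
```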

@falletta falletta marked this pull request as draft January 30, 2026 01:22
```python
bbox[i] += 2.0
volume = bbox.prod() / 1000  # convert A^3 to nm^3
number_density = state.n_atoms / volume.item()
# Use cell volume (O(1)); SimState always has a cell. Avoids O(N) position scan.
```
Collaborator: non-periodic systems don't have a sensible cell, see #412
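Since non-periodic systems lack a meaningful cell, the fallback is a padded bounding box over atomic positions, which costs an O(N) scan. A minimal sketch of that approach, assuming torch (the function name and signature are hypothetical, not the PR's actual code):

```python
import torch

def bbox_number_density(positions: torch.Tensor, pad: float = 1.0) -> float:
    """Number density (atoms / nm^3) from a padded bounding box of positions.

    Works for non-periodic systems, where the cell volume is not meaningful,
    at the cost of an O(N) scan over positions. `pad` is added on each side
    of every axis, mirroring the `bbox[i] += 2.0` (2 x 1.0 A) in the snippet.
    """
    extent = positions.max(dim=0).values - positions.min(dim=0).values
    extent = extent + 2.0 * pad                 # pad both sides of each axis
    volume_nm3 = extent.prod().item() / 1000.0  # A^3 -> nm^3
    return positions.shape[0] / volume_nm3

# Two atoms at opposite corners of an 8 A cube -> 10 A padded box = 1 nm^3
pts = torch.tensor([[0.0, 0.0, 0.0], [8.0, 8.0, 8.0]])
print(bbox_number_density(pts))  # 2.0 atoms/nm^3
```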

Contributor Author (@falletta): I have now minimized the differences compared to the initial code.

Contributor Author: In addition, I added explicit tests for the memory scaler values and verified that they still pass with the changes in this PR.

Comment on lines +589 to +598

```python
self.memory_scalers = calculate_batched_memory_scalers(
    states, self.memory_scales_with
)
self.state_slices = states.split()
```
Collaborator: batching makes sense here
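For context, a memory-scaler-driven batcher typically packs structures into bins so that each bin's total scaler stays under a memory budget. A first-fit-decreasing sketch, purely illustrative and not the PR's implementation:

```python
def pack_bins(scalers: list[float], max_scaler: float) -> list[list[int]]:
    """First-fit-decreasing packing of per-structure memory scalers into
    bins whose totals stay below max_scaler. Returns structure indices."""
    bins: list[list[int]] = []
    totals: list[float] = []
    # Place largest scalers first; each item goes in the first bin it fits
    for idx, s in sorted(enumerate(scalers), key=lambda t: -t[1]):
        for b, total in enumerate(totals):
            if total + s <= max_scaler:   # fits in an existing bin
                bins[b].append(idx)
                totals[b] = total + s
                break
        else:                             # no bin fits; open a new one
            bins.append([idx])
            totals.append(s)
    return bins

print(pack_bins([5.0, 4.0, 3.0, 2.0, 1.0], max_scaler=6.0))
# [[0, 4], [1, 3], [2]]
```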

Comment on lines +628 to +635

```python
if isinstance(states, SimState):
    self.batched_states = [[states[index_bin]] for index_bin in self.index_bins]
```
Collaborator: state.split() is identical to this and faster

Contributor Author: Reusing self.state_slices instead of calling states.split() again makes the code 5% faster, so I'd keep it.

@falletta falletta marked this pull request as ready for review January 30, 2026 15:42
Member: this file isn't needed

```python
    )
    self.state_slices = states.split()
else:
    self.state_slices = states
```
Member: why not concat and then call the batched logic?
