@ChiaraLionello
Collaborator

I have created a function that works for both 2D and 3D label arrays. It removes the clusters with low population.
Solved #136

@SimoneMartino98 SimoneMartino98 requested review from SimoneMartino98 and matteobecchi and removed request for matteobecchi January 22, 2026 16:29
NumPy array containing the label values.
The array should have dimensions corresponding
to either (n_atoms, n_frames) for 2D inputs,
or (n_atoms, n_frames, n_dims) for 3D inputs.
Collaborator

These are the labels, not the data... aren't they always a 2D array? Each atom at each frame has one integer label, even if the data were multivariate.

Collaborator

For the same reason, I think all the logic in the function for the 3D case is never necessary.

Collaborator Author

I think it depends on how we want to use this function. I had a few different applications in mind:

  • 2D labels: cleaning the labels to color the trajectory
  • 2D labels: they can also come from other clustering methods, not necessarily from onion, and can be used as above
  • 3D labels: they come from get_onion_analysis and can be used to redo the plot of the number of clusters and the unclassified fraction after cleaning. By keeping this function outside get_onion_analysis, you are free to change the threshold as many times as you want without rerunning the Onion calculation. In this case, I should also think of a way to redo that plot. On the other hand, it's true that it would be enough to keep only the 2D option and run it in a for loop when we need it.

I have no problem deleting it if needed; since it wasn't difficult to do, I added both options.
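
The "2D only, plus a for loop" option mentioned above could look roughly like the sketch below. Everything in it is an assumption made for illustration: the function names, the `threshold` parameter (a minimum population fraction), and the use of -1 as the "unclassified" label are hypothetical and are not taken from the code in this PR.

```python
import numpy as np

UNCLASSIFIED = -1  # assumed sentinel for "no cluster"; not confirmed by the PR


def clean_low_population_2d(labels, threshold):
    """Relabel clusters whose population fraction is below `threshold`.

    Hypothetical helper: `labels` is an integer array of shape
    (n_atoms, n_frames); sparse clusters are mapped to UNCLASSIFIED.
    """
    labels = np.asarray(labels)
    if labels.ndim != 2:
        raise ValueError("labels must be 2D (n_atoms, n_frames)")
    # Population fraction of every label value present in the array.
    values, counts = np.unique(labels, return_counts=True)
    sparse = values[counts / labels.size < threshold]
    sparse = sparse[sparse != UNCLASSIFIED]
    cleaned = labels.copy()
    cleaned[np.isin(labels, sparse)] = UNCLASSIFIED
    return cleaned


def clean_low_population_stack(labels_3d, threshold):
    """Apply the 2D cleaner independently to each slice along the last axis."""
    labels_3d = np.asarray(labels_3d)
    slices = [
        clean_low_population_2d(labels_3d[:, :, k], threshold)
        for k in range(labels_3d.shape[2])
    ]
    return np.stack(slices, axis=2)
```

With this layout the 3D case is only a thin wrapper, so the threshold can be changed and the cleaning repeated on stored labels without touching the 2D logic.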

array is 2D (n_atoms, n_frames), the output will be a 2D array of
the same shape. Otherwise, if the input is 3D
(n_atoms, n_frames, n_dims), the output will also be a 3D array
of the same shape.
Collaborator

Same as before, labels are always 2D.

Collaborator

@SimoneMartino98 left a comment

Just two easy comments. I’ll leave the discussion on the label dimensions to @matteobecchi.

Anyway, nice work!

missing = np.setdiff1d(excluded_arr, np.unique(labels))

if missing.size > 0:
    logger.log(f"Excluded value(s) not found in labels: {missing}")
Collaborator

This is not really a log message; it should be a warning. Admittedly, we don't have a warning log function at the moment. My suggestion is to replace the line with:

logger.warning(f"Excluded value(s) not found in labels: {missing}")

and then add something like the following function in the _internal.logs.py file (after the log function):

  def warning(self, msg: str) -> None:
      """Records an informational warning message to the log.

      Parameters:
          msg:
              The message to record.
      """
      timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
      history_entry = f"[{timestamp}] {msg}"
      console.warning(msg)
      self._log.append(history_entry)

It should work and print a yellow warning message. If not, let's see what happens.
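
For reference, here is a self-contained sketch of the same idea that does not depend on the project's `console` object, whose type and methods are not shown in this thread. The class name `SketchLogger`, the ANSI color codes, and the "WARNING:" prefix are all assumptions for illustration, not the project's actual logger.

```python
import sys
from datetime import datetime, timezone

YELLOW = "\033[33m"  # ANSI escape for yellow text
RESET = "\033[0m"


class SketchLogger:
    """Hypothetical stand-in for the project's logger class."""

    def __init__(self, stream=sys.stderr):
        self._log = []  # timestamped history of every message
        self._stream = stream

    def _record(self, msg):
        timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
        self._log.append(f"[{timestamp}] {msg}")

    def log(self, msg):
        """Print an informational message and store it in the history."""
        print(msg, file=self._stream)
        self._record(msg)

    def warning(self, msg):
        """Print a yellow warning and store it, marked, in the history."""
        print(f"{YELLOW}WARNING: {msg}{RESET}", file=self._stream)
        self._record(f"WARNING: {msg}")
```

Marking the history entry as a warning keeps it distinguishable from ordinary log lines when the history is read back later.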


from .case_data import CleanPopCaseData

# ---------------- Tests ----------------
Collaborator

I would remove this; we are already in the test folder. Up to you though.

"Clean_pop test files were not present. They have been created."
)
exp_clean_pop = np.load(expected_clean_pop)
assert np.allclose(exp_clean_pop, test_clean_pop, atol=1e-6)
Collaborator

I'm not sure, but I think that in this case the arrays should be exactly equal (we are not using floats here; they are all ints, and I don't see any machine dependence).

It may amount to the same thing, but I think we can be stricter here and use:

https://numpy.org/devdocs/reference/generated/numpy.array_equal.html

instead of np.allclose
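
A small illustration of the difference, using made-up arrays rather than the test data from this PR. For small integer labels the two checks agree in practice; the label value below is deliberately large to show that the default relative tolerance of np.allclose can accept integers that differ.

```python
import numpy as np

expected = np.array([[0, 1], [1_000_000, 2]])
actual = np.array([[0, 1], [1_000_001, 2]])  # one label differs by 1

# allclose accepts |a - b| <= atol + rtol * |b|; with the default
# rtol=1e-5, a difference of 1 is within tolerance for a value this large.
close = np.allclose(expected, actual, atol=1e-6)

# array_equal requires the same shape and exact element-wise equality.
equal = np.array_equal(expected, actual)

print(close, equal)  # True False
```

Since labels are integers, the exact comparison states the intent of the test directly and does not depend on any tolerance settings.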

3 participants