Addition of the function cleaning_cluster_population #149

ChiaraLionello · 2026-01-22T16:28:51Z

I have created a function that works for both 2D and 3D label arrays. It cleans up clusters with low population.
Solves #136
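The following is a minimal sketch of low-population cleaning for a 2D `(n_atoms, n_frames)` label array. It assumes that a cluster's population is its share of all atom-frame entries, and that every entry of an under-populated cluster is relabelled with a single value `assigned_env` (e.g. `-1` for unclassified). The name `clean_low_population` and its parameters are hypothetical and not necessarily the signature added in this PR.

```python
import numpy as np


def clean_low_population(
    labels: np.ndarray,
    threshold: float,
    assigned_env: int,
) -> np.ndarray:
    """Relabel entries of low-population clusters (hypothetical sketch).

    labels: 2D integer array of shape (n_atoms, n_frames).
    threshold: minimum population, as a fraction of all entries.
    assigned_env: label given to entries of under-populated clusters.
    """
    if labels.ndim != 2:
        raise ValueError("labels must be a 2D (n_atoms, n_frames) array")
    values, counts = np.unique(labels, return_counts=True)
    # Share of all atom-frame entries carried by each label.
    fractions = counts / labels.size
    small = values[fractions < threshold]
    # Work on a copy so the caller's labels are left untouched.
    cleaned = labels.copy()
    cleaned[np.isin(labels, small)] = assigned_env
    return cleaned


labels = np.array([[0, 0, 1, 2], [0, 0, 1, 1]])
# Label 2 holds 1/8 of the entries, below the 0.2 threshold.
print(clean_low_population(labels, threshold=0.2, assigned_env=-1).tolist())
# [[0, 0, 1, -1], [0, 0, 1, 1]]
```

Returning a new array rather than modifying `labels` in place makes it cheap to try several thresholds on the same input.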

matteobecchi · 2026-01-23T22:12:21Z

src/dynsight/_internal/data_processing/clusters.py

+            NumPy array containing the label values.
+            The array should have dimensions corresponding
+            to either (n_atoms, n_frames) for 2D inputs,
+            or (n_atoms, n_frames, n_dims) for 3D inputs.


These are the labels, not the data... aren't they always a 2D array? Each atom at each frame has one integer label, even if the data were multivariate.

For the same reason, I think all the logic for the 3D case in the function is unnecessary.
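A toy illustration of this point (a nearest-centroid assignment, not Onion clustering): even when each atom carries a multivariate descriptor, so the data have shape `(n_atoms, n_frames, n_dims)`, a clustering step produces one integer per atom per frame, and the label array drops the last axis. The centroids and random data are arbitrary and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_atoms, n_frames, n_dims = 5, 8, 3

# Multivariate data: one n_dims-long descriptor per atom per frame.
data = rng.normal(size=(n_atoms, n_frames, n_dims))

# Two arbitrary centroids in descriptor space (illustrative only).
centroids = np.array([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])

# Distance of every (atom, frame) descriptor to every centroid,
# shape (n_atoms, n_frames, n_clusters).
dist = np.linalg.norm(data[:, :, None, :] - centroids[None, None, :, :], axis=-1)

# One integer label per atom per frame.
labels = np.argmin(dist, axis=-1)
print(data.shape, labels.shape)  # (5, 8, 3) (5, 8)
```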

ChiaraLionello · 2026-01-26

I think it depends on how we want to use this function. I've thought of different applications:

2D labels: cleaning the labels to color the trajectory

2D labels: these can also come from other clustering methods, not necessarily Onion, and can be used as above

3D labels: these come from get_onion_analysis and can be used to redo the plot of the number of clusters and the unclassified fraction after cleaning. Keeping this function outside get_onion_analysis lets you change the threshold as many times as you want without rerunning the Onion calculation. In this case, I should also think of a way to redo that plot. On the other hand, it's true that it would be enough to keep only the 2D option and run it in a for loop when needed.

I have no problem deleting it if needed; since it wasn't difficult, I added both options.
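If only the 2D option is kept, the loop mentioned above could look like the sketch below. It assumes that the third axis of the stacked labels indexes independent analysis runs (for example, the parameter values scanned by `get_onion_analysis`); that is an assumption about that function's output, not something confirmed here. For each slice it recomputes the two plotted quantities: the number of clusters and the unclassified fraction. `clean_2d` and `clean_stack` are hypothetical names.

```python
import numpy as np


def clean_2d(labels: np.ndarray, threshold: float, assigned_env: int) -> np.ndarray:
    """Hypothetical 2D cleaner: relabel clusters below `threshold` population."""
    values, counts = np.unique(labels, return_counts=True)
    small = values[counts / labels.size < threshold]
    cleaned = labels.copy()
    cleaned[np.isin(labels, small)] = assigned_env
    return cleaned


def clean_stack(labels_3d: np.ndarray, threshold: float, assigned_env: int = -1):
    """Apply the 2D cleaner to each labels_3d[:, :, i] slice.

    Returns the cleaned stack, the number of clusters per slice
    (not counting `assigned_env`), and the unclassified fraction per slice.
    """
    n_runs = labels_3d.shape[2]
    cleaned = np.empty_like(labels_3d)
    n_clusters = np.zeros(n_runs, dtype=int)
    unclassified = np.zeros(n_runs)
    for i in range(n_runs):
        cleaned[:, :, i] = clean_2d(labels_3d[:, :, i], threshold, assigned_env)
        present = np.unique(cleaned[:, :, i])
        n_clusters[i] = np.count_nonzero(present != assigned_env)
        unclassified[i] = np.mean(cleaned[:, :, i] == assigned_env)
    return cleaned, n_clusters, unclassified
```

Since the cleaning is cheap compared with the clustering itself, `clean_stack` can be called repeatedly with different thresholds on the same stored labels.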

matteobecchi · 2026-01-23T22:12:45Z

src/dynsight/_internal/data_processing/clusters.py

+        array is 2D (n_atoms, n_frames), the output will be a 2D array of
+        the same shape. Otherwise, if the input is 3D
+        (n_atoms, n_frames, n_dims), the output will also be a 3D array
+        of the same shape.


Same as before, labels are always 2D.

SimoneMartino98

Just two easy comments. I’ll leave the discussion on the label dimensions to @matteobecchi.

Anyway, nice work!

SimoneMartino98 · 2026-01-26T15:47:15Z

src/dynsight/_internal/data_processing/clusters.py

+    missing = np.setdiff1d(excluded_arr, np.unique(labels))
+
+    if missing.size > 0:
+        logger.log(f"Excluded value(s) not found in labels: {missing}")


This is not really a log message; it should be a warning. That said, at the moment we don't have a warning function in the logger. My suggestion is to change the line to:

logger.warning(f"Excluded value(s) not found in labels: {missing}")

and then add something like the following method to the _internal.logs.py file (after the log function):

def warning(self, msg: str) -> None:
    """Records a warning message to the log.

    Parameters:
        msg: The message to record.
    """
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    history_entry = f"[{timestamp}] {msg}"
    console.warning(msg)
    self._log.append(history_entry)

It should work and print a yellow warning message. If not, let's see what happens.
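Below is a self-contained sketch of the suggested `warning` method together with the missing-value check from the diff above. Because the project's `console` object is not shown in this thread, the sketch substitutes Python's standard `logging` module for it, so whether the message appears in yellow depends on the handler configuration, not on this code. The class name `Logger` and the helper `warn_missing_excluded` are hypothetical.

```python
import logging
from datetime import datetime, timezone

import numpy as np


class Logger:
    """Hypothetical stand-in for the logger in _internal.logs.py."""

    def __init__(self) -> None:
        self._log: list[str] = []
        # Stdlib logger used in place of the project's `console` object.
        self._console = logging.getLogger("dynsight_sketch")

    def _record(self, msg: str) -> None:
        timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
        self._log.append(f"[{timestamp}] {msg}")

    def log(self, msg: str) -> None:
        """Record an informational message."""
        self._console.info(msg)
        self._record(msg)

    def warning(self, msg: str) -> None:
        """Emit a message at WARNING level and record it in the history."""
        self._console.warning(msg)
        self._record(msg)


def warn_missing_excluded(labels: np.ndarray, excluded, logger: Logger) -> np.ndarray:
    """Warn about excluded label values that never occur in `labels`."""
    excluded_arr = np.atleast_1d(np.asarray(excluded))
    missing = np.setdiff1d(excluded_arr, np.unique(labels))
    if missing.size > 0:
        logger.warning(f"Excluded value(s) not found in labels: {missing}")
    return missing
```

With no handler configured, the standard library falls back to printing WARNING-level messages to stderr, so the warning is visible even in a bare script.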

SimoneMartino98 · 2026-01-26T15:52:18Z

tests/data_processing/cluster/test_cluster.py

+
+from .case_data import CleanPopCaseData
+
+# ---------------- Tests ----------------


I would remove this section comment; we are already in the tests folder. Up to you, though.

SimoneMartino98 · 2026-01-26T15:59:18Z

tests/data_processing/cluster/test_cluster.py

+            "Clean_pop test files were not present. They have been created."
+        )
+    exp_clean_pop = np.load(expected_clean_pop)
+    assert np.allclose(exp_clean_pop, test_clean_pop, atol=1e-6)


I'm not sure, but I think that in this case the arrays should be exactly equal (we are not using floats here; they are all ints, and I don't see any machine dependence).

It may amount to the same thing, but I think we can be stricter here and use:

https://numpy.org/devdocs/reference/generated/numpy.array_equal.html

instead of np.allclose.
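A small illustration of why `np.array_equal` is the stricter choice for integer labels: besides requiring exact equality, it also requires matching shapes, whereas `np.allclose` broadcasts its inputs and can accept arrays whose shapes differ. The arrays are toy data.

```python
import numpy as np

expected = np.array([[0, 0, 1], [1, -1, 1]])
result = expected.copy()

# For integer labels, exact comparison is the natural check.
print(np.array_equal(expected, result))  # True

# In a test, this variant reports the mismatching elements on failure.
np.testing.assert_array_equal(expected, result)

# np.allclose broadcasts its inputs, so a shape mismatch can go unnoticed.
row = np.array([[1, 1, 1]])
single = np.array([1])
print(np.allclose(row, single))     # True: (1,) is broadcast against (1, 3)
print(np.array_equal(row, single))  # False: the shapes differ
```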

ChiaraLionello added 10 commits January 13, 2026 16:27

Addition of a function that cleans cluster population for 2D labels.

d2f4369

cleaning_cluster_population now cleans also 3D labels.

6cf5161

Documentation added to cleaning_cluster_population.

ff137be

removed default for assigned_env.

2314e3f

better explained function doc.

5e67c2c

Now get_onion_analysis returns labels.

86646a3

Added excluded_env.

dcc2fea

get_onion_analysis return list_of_labels.

f78d066

Added a warning for excluded_env.

8d78027

Added test for cleaning_cluster_population.

f31af59

SimoneMartino98 requested review from SimoneMartino98 and matteobecchi and removed request for matteobecchi January 22, 2026 16:29

mypy fixes.

124167c

matteobecchi reviewed Jan 23, 2026

SimoneMartino98 requested changes Jan 26, 2026

3 participants

