REFACTOR: Otimização de função de verificação de fatores influenciadores e melhoria na lógica de tratar a planilha `proporcionalidades` #292

MarioCarvalhoBr · 2025-12-05T21:29:55Z

This pull request introduces improvements to numeric value handling and error reporting, as well as updates to versioning and validation logic. The most significant changes are the refactoring of decimal place checks, enhancements to error messaging for data structure validation, and updates to the proportionality model’s processing flow.

Numeric value validation and formatting:

Refactored decimal place validation by replacing the old has_excessive_decimals function with check_two_decimals_places and check_n_decimals_places in number_formatting.py. This improves precision and flexibility in decimal checks across the codebase.
Updated all usages of decimal validation in legend_processing.py and value_data_validation.py to use the new functions, ensuring consistent handling of numeric precision in validations. [1] [2] [3] [4] [5]

Proportionality model structure and error reporting:

Enhanced the sp_proportionality.py model to validate column naming patterns for both level 1 and level 2 columns, adding specific error messages when columns do not match expected patterns (e.g., CÓDIGO-ANO or CÓDIGO-ANO-CENÁRIO). If structural errors are found, the DataFrame is cleared to prevent further processing.
Improved logic in expected_structure_columns to use the correct column lists and consistently check for missing or extra columns, leading to more accurate error reporting in HTML reports. [1] [2]

Validation report and versioning updates:

Updated HTML error reports to provide more specific error messages about unexpected column patterns and missing required columns, and adjusted the error counts to reflect these changes. [1] [2] [3] [4] [5]
Bumped the version and serial numbers in README.md and metadata_info.py to reflect the new release. [1] [2]

Configuration and utility improvements:

Added a new configuration constant PRECISION_DECIMAL_PLACE_TRUNCATE to control decimal truncation precision globally.
Added utility function to_decimal_truncated for precise decimal truncation without rounding.

Other minor improvements:

Added import of categorize_strings_by_id_pattern_from_list for more robust column categorization in the proportionality model.
Improved conditional logic in the proportionality model’s run method to ensure processing only occurs when the data is valid and headers are correctly structured.

…nality.py to categorize and report structural errors in DataFrame columns

… validation and status_dev increment

…alidator.py for decimal places and sum checks

MarioCarvalhoBr · 2025-12-05T21:33:17Z

@pedro-andrade-inpe Nesse PR, implemteamos uma otimização de código interessante:

A função mais custosa do código era uma função chamada _check_sum_equals_one, que basicamente fazia uma série de somas e usava o iterrows() do Pandas. Isso na planilha de proporcionalidades, que era uma das maiores planilhas entre os arquivos. Acredito que evoluímos muito bem.

Implementei funções vetorizadas e, sobre a issue #290, tivemos melhorias significativas, passando de 721 segundos para 18 segundos em uma execução completa do canoa-data-validate com um dado que o @assismauro enviou.

Um speedup de 40x.

Copilot

Pull request overview

This pull request refactors numeric validation handling by replacing the truncate_number function with new, more flexible decimal validation functions (to_decimal_truncated, check_n_decimals_places, check_two_decimals_places). The main changes optimize the proportionality validator's performance using pandas vectorized operations, improve error reporting for column structure validation in the proportionality model, and add a new configuration constant for decimal precision control.

Key Changes:

Replaced truncate_number with to_decimal_truncated for more precise decimal handling
Optimized _check_sum_equals_one method using pandas vectorized operations instead of row-by-row iteration
Enhanced proportionality model to validate column naming patterns and provide clearer error messages
Added PRECISION_DECIMAL_PLACE_TRUNCATE configuration constant

Reviewed changes

Copilot reviewed 12 out of 21 changed files in this pull request and generated 8 comments.

Show a summary per file

File	Description
`data_validate/helpers/common/formatting/number_formatting.py`	Added three new decimal validation functions; removed old `truncate_number` function
`tests/unit/helpers/common/formatting/test_number_formatting.py`	Removed tests for `truncate_number` but did not add tests for new functions
`data_validate/validators/spreadsheets/proportionality/proportionality_validator.py`	Refactored validation logic to use vectorized operations; improved performance and error reporting
`data_validate/models/sp_proportionality.py`	Enhanced column structure validation with pattern matching and improved error messages
`data_validate/helpers/common/validation/legend_processing.py`	Updated to use `check_two_decimals_places` instead of `has_excessive_decimals`
`data_validate/helpers/common/validation/value_data_validation.py`	Updated to use `check_two_decimals_places` instead of `has_excessive_decimals`
`data_validate/config/config.py`	Added `PRECISION_DECIMAL_PLACE_TRUNCATE` constant
`data_validate/helpers/base/metadata_info.py`	Updated serial number from 667 to 671
`README.md`	Updated version badge
Output files (HTML reports, PDFs, badges)	Reflect updated error counts and test results

Comments suppressed due to low confidence (2)

data_validate/helpers/common/formatting/number_formatting.py:20

Except block directly handles BaseException.

    except:

data_validate/validators/spreadsheets/proportionality/proportionality_validator.py:230

Except block directly handles BaseException.

                            except:

Copilot · 2025-12-05T21:34:23Z

data_validate/helpers/common/formatting/number_formatting.py

+def check_n_decimals_places(value_number, value_to_ignore, number_decimal_places):
+    if pd.isna(value_number) or value_number == value_to_ignore:
        return False
+    decimal_value = Decimal(str(value_number).replace(",", "."))
+    return decimal_value.as_tuple().exponent < -number_decimal_places


The check_n_decimals_places function lacks parameter type hints and a docstring. This function is used in critical validation logic and should have clear documentation about its purpose, parameters (especially what value_to_ignore represents), and return value.

Copilot · 2025-12-05T21:34:23Z

data_validate/validators/spreadsheets/proportionality/proportionality_validator.py

+                                    errors.append(
+                                        f"{self.sp_name_proportionality}: A soma de fatores influenciadores para o ID '{row_id}' no pai '{col}' é 0 (zero). Na planilha {self.sp_name_value}, existe(m) valor(es) para os filhos do indicador '{col}', no mesmo ID, que não é (são) zero ou DI (Dado Indisponível)."
+                                    )
+                            except:


The bare except: clause should be replaced with specific exception handling. This catches all exceptions including SystemExit and KeyboardInterrupt. Consider catching specific exceptions like (ValueError, TypeError) instead to handle only the expected conversion errors.

Suggested change

except:

except (ValueError, TypeError):

Copilot · 2025-12-05T21:34:23Z

tests/unit/helpers/common/formatting/test_number_formatting.py

-        # Test large numbers
-        result = truncate_number(123.456789, 1)
-        assert result == 123.4
+        assert msg == "O valor '3.0' é menor que o valor mínimo permitido (5)."


The removal of the test_truncate_number test method reduces test coverage for the module. While the truncate_number function has been replaced with to_decimal_truncated, the new function is not being tested in this file. According to the project guidelines, 100% coverage is mandatory for any new test file created, and changes that reduce coverage should include corresponding new tests for replacement functions.

Copilot · 2025-12-05T21:34:24Z

data_validate/models/sp_proportionality.py

+
+            __, level_2_codes_not_matched_by_pattern = categorize_strings_by_id_pattern_from_list(unique_columns_level_2, self.scenarios_list)
+
+            if level_2_codes_not_matched_by_pattern and not level_1_codes_not_matched_by_pattern:


The logic at line 63 has a redundant condition. The check and not level_1_codes_not_matched_by_pattern is unnecessary because this block is already inside the else clause of if level_1_codes_not_matched_by_pattern: (line 53), which guarantees that level_1_codes_not_matched_by_pattern is empty or falsy. This condition can be simplified to just if level_2_codes_not_matched_by_pattern:.

Suggested change

if level_2_codes_not_matched_by_pattern and not level_1_codes_not_matched_by_pattern:

if level_2_codes_not_matched_by_pattern:

Copilot · 2025-12-05T21:34:24Z

data_validate/helpers/common/formatting/number_formatting.py

+    except:
+        return Decimal("0")


The to_decimal_truncated function uses a bare except: clause which is considered a bad practice. This catches all exceptions including SystemExit and KeyboardInterrupt, which should not be suppressed. Consider catching specific exceptions like (ValueError, InvalidOperation, TypeError) instead.

Copilot · 2025-12-05T21:34:24Z

data_validate/helpers/common/formatting/number_formatting.py

+def to_decimal_truncated(value_number, value_to_ignore, precision):
+    if pd.isna(value_number) or value_number == value_to_ignore:
+        return Decimal("0")

-def has_excessive_decimals(value) -> bool:
-    if value in [float("-inf"), float("inf")] or pd.isna(value):
+    s_val = str(value_number).replace(",", ".")
+    try:
+        if "." in s_val:
+            integer_part, decimal_part = s_val.split(".")
+            truncated_val = f"{integer_part}.{decimal_part[:precision]}"
+        else:
+            truncated_val = s_val
+        return Decimal(truncated_val)
+    except:
+        return Decimal("0")


The to_decimal_truncated function lacks parameter type hints and a docstring. According to the project's coding guidelines, all functions should have proper documentation explaining their purpose, parameters, and return values. This is especially important for utility functions that are used across multiple modules.

Copilot · 2025-12-05T21:34:24Z

data_validate/validators/spreadsheets/proportionality/proportionality_validator.py

@@ -1,5 +1,5 @@
 #  Copyright (c) 2025 Mário Carvalho (https://github.com/MarioCarvalhoBr).
-from decimal import Decimal
+from decimal import Decimal, ROUND_DOWN


Import of 'ROUND_DOWN' is not used.

Suggested change

from decimal import Decimal, ROUND_DOWN

from decimal import Decimal

Copilot · 2025-12-05T21:34:25Z

data_validate/validators/spreadsheets/proportionality/proportionality_validator.py

+                                    errors.append(
+                                        f"{self.sp_name_proportionality}: A soma de fatores influenciadores para o ID '{row_id}' no pai '{col}' é 0 (zero). Na planilha {self.sp_name_value}, existe(m) valor(es) para os filhos do indicador '{col}', no mesmo ID, que não é (são) zero ou DI (Dado Indisponível)."
+                                    )
+                            except:


'except' clause does nothing but pass and there is no explanatory comment.

Suggested change

except:

except (ValueError, TypeError):

# Ignore values that cannot be converted to float (e.g., malformed or non-numeric data)

MarioCarvalhoBr added 4 commits December 4, 2025 18:39

feat: close AdaptaBrasil#290: enhance data validation in sp_proportio…

aece5d8

…nality.py to categorize and report structural errors in DataFrame columns

feat: update metadata_info and sp_proportionality for improved column…

4851e6a

… validation and status_dev increment

feat: close AdaptaBrasil#291: enhance validation in proportionality_v…

6760488

…alidator.py for decimal places and sum checks

chore: reset status_dev to 0 and remove unused truncate_number function

4eea538

Copilot AI review requested due to automatic review settings December 5, 2025 21:29

Copilot started reviewing on behalf of MarioCarvalhoBr December 5, 2025 21:30 View session

Copilot AI reviewed Dec 5, 2025

View reviewed changes

pedro-andrade-inpe merged commit d37b91b into AdaptaBrasil:main Dec 5, 2025
12 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

REFACTOR: Otimização de função de verificação de fatores influenciadores e melhoria na lógica de tratar a planilha `proporcionalidades` #292

REFACTOR: Otimização de função de verificação de fatores influenciadores e melhoria na lógica de tratar a planilha `proporcionalidades` #292

Uh oh!

MarioCarvalhoBr commented Dec 5, 2025

Uh oh!

MarioCarvalhoBr commented Dec 5, 2025 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Dec 5, 2025

Uh oh!

Copilot AI Dec 5, 2025

Uh oh!

Copilot AI Dec 5, 2025

Uh oh!

Copilot AI Dec 5, 2025

Uh oh!

Copilot AI Dec 5, 2025

Uh oh!

Copilot AI Dec 5, 2025

Uh oh!

Copilot AI Dec 5, 2025

Uh oh!

Copilot AI Dec 5, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants


		__, level_2_codes_not_matched_by_pattern = categorize_strings_by_id_pattern_from_list(unique_columns_level_2, self.scenarios_list)

		if level_2_codes_not_matched_by_pattern and not level_1_codes_not_matched_by_pattern:

	if level_2_codes_not_matched_by_pattern and not level_1_codes_not_matched_by_pattern:
	if level_2_codes_not_matched_by_pattern:

	from decimal import Decimal, ROUND_DOWN
	from decimal import Decimal

	except:
	except (ValueError, TypeError):
	# Ignore values that cannot be converted to float (e.g., malformed or non-numeric data)

REFACTOR: Otimização de função de verificação de fatores influenciadores e melhoria na lógica de tratar a planilha proporcionalidades #292

REFACTOR: Otimização de função de verificação de fatores influenciadores e melhoria na lógica de tratar a planilha proporcionalidades #292

Uh oh!

Conversation

MarioCarvalhoBr commented Dec 5, 2025

Uh oh!

MarioCarvalhoBr commented Dec 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

REFACTOR: Otimização de função de verificação de fatores influenciadores e melhoria na lógica de tratar a planilha `proporcionalidades` #292

REFACTOR: Otimização de função de verificação de fatores influenciadores e melhoria na lógica de tratar a planilha `proporcionalidades` #292

MarioCarvalhoBr commented Dec 5, 2025 •

edited

Loading