Skip to content

Conversation

@MarioCarvalhoBr
Copy link
Member

This pull request introduces improvements to numeric value handling and error reporting, as well as updates to versioning and validation logic. The most significant changes are the refactoring of decimal place checks, enhancements to error messaging for data structure validation, and updates to the proportionality model’s processing flow.

Numeric value validation and formatting:

  • Refactored decimal place validation by replacing the old has_excessive_decimals function with check_two_decimals_places and check_n_decimals_places in number_formatting.py. This improves precision and flexibility in decimal checks across the codebase.
  • Updated all usages of decimal validation in legend_processing.py and value_data_validation.py to use the new functions, ensuring consistent handling of numeric precision in validations. [1] [2] [3] [4] [5]

Proportionality model structure and error reporting:

  • Enhanced the sp_proportionality.py model to validate column naming patterns for both level 1 and level 2 columns, adding specific error messages when columns do not match expected patterns (e.g., CÓDIGO-ANO or CÓDIGO-ANO-CENÁRIO). If structural errors are found, the DataFrame is cleared to prevent further processing.
  • Improved logic in expected_structure_columns to use the correct column lists and consistently check for missing or extra columns, leading to more accurate error reporting in HTML reports. [1] [2]

Validation report and versioning updates:

  • Updated HTML error reports to provide more specific error messages about unexpected column patterns and missing required columns, and adjusted the error counts to reflect these changes. [1] [2] [3] [4] [5]
  • Bumped the version and serial numbers in README.md and metadata_info.py to reflect the new release. [1] [2]

Configuration and utility improvements:

  • Added a new configuration constant PRECISION_DECIMAL_PLACE_TRUNCATE to control decimal truncation precision globally.
  • Added utility function to_decimal_truncated for precise decimal truncation without rounding.

Other minor improvements:

  • Added import of categorize_strings_by_id_pattern_from_list for more robust column categorization in the proportionality model.
  • Improved conditional logic in the proportionality model’s run method to ensure processing only occurs when the data is valid and headers are correctly structured.

Copilot AI review requested due to automatic review settings December 5, 2025 21:29
@MarioCarvalhoBr
Copy link
Member Author

MarioCarvalhoBr commented Dec 5, 2025

@pedro-andrade-inpe Nesse PR, implemteamos uma otimização de código interessante:

A função mais custosa do código era uma função chamada _check_sum_equals_one, que basicamente fazia uma série de somas e usava o iterrows() do Pandas. Isso na planilha de proporcionalidades, que era uma das maiores planilhas entre os arquivos. Acredito que evoluímos muito bem.

Implementei funções vetorizadas e, sobre a issue #290, tivemos melhorias significativas, passando de 721 segundos para 18 segundos em uma execução completa do canoa-data-validate com um dado que o @assismauro enviou.

Um speedup de 40x.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request refactors numeric validation handling by replacing the truncate_number function with new, more flexible decimal validation functions (to_decimal_truncated, check_n_decimals_places, check_two_decimals_places). The main changes optimize the proportionality validator's performance using pandas vectorized operations, improve error reporting for column structure validation in the proportionality model, and add a new configuration constant for decimal precision control.

Key Changes:

  • Replaced truncate_number with to_decimal_truncated for more precise decimal handling
  • Optimized _check_sum_equals_one method using pandas vectorized operations instead of row-by-row iteration
  • Enhanced proportionality model to validate column naming patterns and provide clearer error messages
  • Added PRECISION_DECIMAL_PLACE_TRUNCATE configuration constant

Reviewed changes

Copilot reviewed 12 out of 21 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
data_validate/helpers/common/formatting/number_formatting.py Added three new decimal validation functions; removed old truncate_number function
tests/unit/helpers/common/formatting/test_number_formatting.py Removed tests for truncate_number but did not add tests for new functions
data_validate/validators/spreadsheets/proportionality/proportionality_validator.py Refactored validation logic to use vectorized operations; improved performance and error reporting
data_validate/models/sp_proportionality.py Enhanced column structure validation with pattern matching and improved error messages
data_validate/helpers/common/validation/legend_processing.py Updated to use check_two_decimals_places instead of has_excessive_decimals
data_validate/helpers/common/validation/value_data_validation.py Updated to use check_two_decimals_places instead of has_excessive_decimals
data_validate/config/config.py Added PRECISION_DECIMAL_PLACE_TRUNCATE constant
data_validate/helpers/base/metadata_info.py Updated serial number from 667 to 671
README.md Updated version badge
Output files (HTML reports, PDFs, badges) Reflect updated error counts and test results
Comments suppressed due to low confidence (2)

data_validate/helpers/common/formatting/number_formatting.py:20

  • Except block directly handles BaseException.
    except:

data_validate/validators/spreadsheets/proportionality/proportionality_validator.py:230

  • Except block directly handles BaseException.
                            except:

Comment on lines +23 to +27
def check_n_decimals_places(value_number, value_to_ignore, number_decimal_places):
if pd.isna(value_number) or value_number == value_to_ignore:
return False
decimal_value = Decimal(str(value_number).replace(",", "."))
return decimal_value.as_tuple().exponent < -number_decimal_places
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check_n_decimals_places function lacks parameter type hints and a docstring. This function is used in critical validation logic and should have clear documentation about its purpose, parameters (especially what value_to_ignore represents), and return value.

Copilot uses AI. Check for mistakes.
errors.append(
f"{self.sp_name_proportionality}: A soma de fatores influenciadores para o ID '{row_id}' no pai '{col}' é 0 (zero). Na planilha {self.sp_name_value}, existe(m) valor(es) para os filhos do indicador '{col}', no mesmo ID, que não é (são) zero ou DI (Dado Indisponível)."
)
except:
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bare except: clause should be replaced with specific exception handling. This catches all exceptions including SystemExit and KeyboardInterrupt. Consider catching specific exceptions like (ValueError, TypeError) instead to handle only the expected conversion errors.

Suggested change
except:
except (ValueError, TypeError):

Copilot uses AI. Check for mistakes.
# Test large numbers
result = truncate_number(123.456789, 1)
assert result == 123.4
assert msg == "O valor '3.0' é menor que o valor mínimo permitido (5)." No newline at end of file
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The removal of the test_truncate_number test method reduces test coverage for the module. While the truncate_number function has been replaced with to_decimal_truncated, the new function is not being tested in this file. According to the project guidelines, 100% coverage is mandatory for any new test file created, and changes that reduce coverage should include corresponding new tests for replacement functions.

Copilot uses AI. Check for mistakes.

__, level_2_codes_not_matched_by_pattern = categorize_strings_by_id_pattern_from_list(unique_columns_level_2, self.scenarios_list)

if level_2_codes_not_matched_by_pattern and not level_1_codes_not_matched_by_pattern:
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic at line 63 has a redundant condition. The check and not level_1_codes_not_matched_by_pattern is unnecessary because this block is already inside the else clause of if level_1_codes_not_matched_by_pattern: (line 53), which guarantees that level_1_codes_not_matched_by_pattern is empty or falsy. This condition can be simplified to just if level_2_codes_not_matched_by_pattern:.

Suggested change
if level_2_codes_not_matched_by_pattern and not level_1_codes_not_matched_by_pattern:
if level_2_codes_not_matched_by_pattern:

Copilot uses AI. Check for mistakes.
Comment on lines +20 to +21
except:
return Decimal("0")
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The to_decimal_truncated function uses a bare except: clause which is considered a bad practice. This catches all exceptions including SystemExit and KeyboardInterrupt, which should not be suppressed. Consider catching specific exceptions like (ValueError, InvalidOperation, TypeError) instead.

Copilot uses AI. Check for mistakes.
Comment on lines +8 to +21
def to_decimal_truncated(value_number, value_to_ignore, precision):
if pd.isna(value_number) or value_number == value_to_ignore:
return Decimal("0")

def has_excessive_decimals(value) -> bool:
if value in [float("-inf"), float("inf")] or pd.isna(value):
s_val = str(value_number).replace(",", ".")
try:
if "." in s_val:
integer_part, decimal_part = s_val.split(".")
truncated_val = f"{integer_part}.{decimal_part[:precision]}"
else:
truncated_val = s_val
return Decimal(truncated_val)
except:
return Decimal("0")
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The to_decimal_truncated function lacks parameter type hints and a docstring. According to the project's coding guidelines, all functions should have proper documentation explaining their purpose, parameters, and return values. This is especially important for utility functions that are used across multiple modules.

Copilot uses AI. Check for mistakes.
@@ -1,5 +1,5 @@
# Copyright (c) 2025 Mário Carvalho (https://github.com/MarioCarvalhoBr).
from decimal import Decimal
from decimal import Decimal, ROUND_DOWN
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'ROUND_DOWN' is not used.

Suggested change
from decimal import Decimal, ROUND_DOWN
from decimal import Decimal

Copilot uses AI. Check for mistakes.
errors.append(
f"{self.sp_name_proportionality}: A soma de fatores influenciadores para o ID '{row_id}' no pai '{col}' é 0 (zero). Na planilha {self.sp_name_value}, existe(m) valor(es) para os filhos do indicador '{col}', no mesmo ID, que não é (são) zero ou DI (Dado Indisponível)."
)
except:
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'except' clause does nothing but pass and there is no explanatory comment.

Suggested change
except:
except (ValueError, TypeError):
# Ignore values that cannot be converted to float (e.g., malformed or non-numeric data)

Copilot uses AI. Check for mistakes.
@pedro-andrade-inpe pedro-andrade-inpe merged commit d37b91b into AdaptaBrasil:main Dec 5, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants