feat: add tool calling argument validation #364
Conversation
Merge Protections
Your pull request matches the following merge protections and will not be merged until they are valid.
🟢 Enforce conventional commit
Wonderful, this rule succeeded. Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
The PR description has been updated. Please fill out the template for your PR to be reviewed.

@akihikokuroda looks like there's a test failure (and needs a rebase on main)

otherwise, I think this LGTM
98465c2 to 3337332
Saw that #380 was opened yesterday, so should confirm with @HendrikStrobelt that the approach here is compatible with that PR

@psschwei Minor merges are necessary because the same files are changed, but the two PRs look compatible.
jakelorocco left a comment
Looks good! A few questions / concerns.
```python
@pytest.mark.integration
class TestToolValidationIntegration:
    """Integration tests that would use actual validation function."""

    @pytest.mark.skip(reason="Validation function not yet implemented")
    def test_validation_with_coercion(self):
        """Test validation with type coercion enabled."""
        # This test will be enabled once validation is implemented
        pass

    @pytest.mark.skip(reason="Validation function not yet implemented")
    def test_validation_strict_mode(self):
        """Test validation in strict mode."""
        # This test will be enabled once validation is implemented
        pass
```
Can you expand on what this means for these two tests? Is there additional validation logic that will be enabled later?
I can take them out for now. The intention is that these two skipped tests are placeholders for future integration tests that would verify the validate_tool_arguments function works correctly in real-world scenarios (see the sketch after this list):

- test_validation_with_coercion
  - Purpose: test that the validation function properly coerces types when coerce_types=True is enabled.
  - What it would test: when LLMs return tool arguments as strings (e.g., "30" instead of 30), this test would verify that the validation function automatically converts them to the correct types (int, float, bool, etc.) before passing them to the tool function.
  - Example scenario: an LLM calls a tool with {"age": "30", "score": "95.5"} but the tool expects int and float. The validation should coerce these strings to proper numeric types.
- test_validation_strict_mode
  - Purpose: test that strict validation mode properly raises errors when arguments don't match expected types.
  - What it would test: when strict=True, the validation function should raise ValidationError for type mismatches or missing required parameters, rather than silently returning the original arguments.
  - Example scenario: an LLM provides {"age": "not_a_number"} when an int is required. In strict mode, this should raise an error immediately rather than passing invalid data to the tool.

Current status: both tests are skipped because the validate_tool_arguments function implementation was not yet complete when they were written. However, the file contains extensive working tests (lines 91-356) that already cover these scenarios in detail, suggesting the validation function may actually be implemented. The skipped tests at the top appear to be legacy placeholders that should either be:

- removed (if the functionality is already tested elsewhere), or
- updated to test specific integration scenarios not covered by existing tests.
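For concreteness, here is a minimal sketch of what the two tests might assert once enabled; the import path and the exact exception type are assumptions, not the PR's actual API:

```python
import pytest

# Assumed import path; the PR may expose the function elsewhere.
from mellea.backends.tools import validate_tool_arguments


def greet(name: str, age: int, score: float) -> str:
    return f"{name}, age {age}, score {score}"


def test_validation_with_coercion():
    # LLM-style string arguments should be coerced to the annotated types.
    args = {"name": "Ada", "age": "30", "score": "95.5"}
    validated = validate_tool_arguments(greet, args, coerce_types=True)
    assert validated == {"name": "Ada", "age": 30, "score": 95.5}


def test_validation_strict_mode():
    # An uncoercible value should raise in strict mode rather than being
    # passed through to the tool unchanged.
    args = {"name": "Ada", "age": "not_a_number", "score": 1.0}
    with pytest.raises(Exception):  # concrete error type is an assumption
        validate_tool_arguments(greet, args, strict=True)
```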
Okay, I think we should go ahead and remove these for now then. I believe our tests that check for tool call requests from a model also call those tools to ensure they function properly.
```python
# Validate and coerce argument types
validated_args = validate_tool_arguments(func, args, strict=False)
model_tool_calls[tool_name] = ModelToolCall(
    tool_name, func, validated_args
)
```
Were you able to verify that these backends can generate incorrect parameters? I know huggingface has some issues with its tool requests that could benefit from this, but I'm not sure the other backends need any validation (but that was an open question I wasn't certain on).
I don't remember whether I saw a backend generate incorrect parameters, but it is good to validate them before making tool calls. We cannot assume all backends work correctly :-)
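As a concrete illustration of the failure mode being guarded against, here is a small sketch; the tool and payload are made up, and the validate_tool_arguments call shape follows the diff above:

```python
import json


def book_flight(passengers: int, refundable: bool) -> str:
    return f"{passengers} seat(s), refundable={refundable}"


# Some backends deliver tool-call arguments as JSON in which the model
# quoted every value, so numbers and booleans arrive as strings.
raw_args = json.loads('{"passengers": "2", "refundable": "true"}')

# Without validation the tool receives the wrong types:
#   book_flight(**raw_args)  ->  passengers == "2" (str), refundable == "true" (str)
#
# With the validation added in this PR, arguments are checked (and, in
# non-strict mode, coerced) before the call:
#   args = validate_tool_arguments(book_flight, raw_args, strict=False)
#   book_flight(**args)      ->  passengers == 2, refundable is True
```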
```python
def validate_tool_arguments(
    func: Callable,
    args: Mapping[str, Any],
    *,
    coerce_types: bool = True,
    strict: bool = False,
) -> dict[str, Any]:
```
I looked at the changes for this PR and Hendrik's; if Hendrik's gets merged, first you'll just have to change this func: Callable to be a func: MelleaTool since not all of our tools going forward will be simple functions. Then the validation becomes a bit trickier but MelleaTool.as_json_tool should give you the parameter names and types that matter.
I saw the PR. Right now, the parameter information comes from the function signature only. I understand that some refactoring will be necessary.
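For reference, since the PR's function body is not shown here, a purely signature-based version of this function might look roughly like the sketch below, built on inspect.signature and typing.get_type_hints. This is an illustration of the approach being discussed, not the PR's actual implementation:

```python
import inspect
from collections.abc import Callable, Mapping
from typing import Any, get_type_hints


def validate_tool_arguments(
    func: Callable,
    args: Mapping[str, Any],
    *,
    coerce_types: bool = True,
    strict: bool = False,
) -> dict[str, Any]:
    """Validate tool-call arguments against func's signature, coercing simple types."""
    sig = inspect.signature(func)
    hints = get_type_hints(func)
    validated: dict[str, Any] = {}

    for name, value in args.items():
        if name not in sig.parameters:
            if strict:
                raise TypeError(f"unexpected argument: {name!r}")
            continue  # lenient mode: drop arguments the tool does not accept
        expected = hints.get(name)
        if coerce_types and expected in (int, float, bool, str) and not isinstance(value, expected):
            try:
                if expected is bool and isinstance(value, str):
                    value = value.strip().lower() in ("true", "1", "yes")
                else:
                    value = expected(value)
            except (TypeError, ValueError):
                if strict:
                    raise
        validated[name] = value

    # In strict mode, missing required parameters are an error too.
    if strict:
        for name, param in sig.parameters.items():
            if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
                continue
            if param.default is inspect.Parameter.empty and name not in validated:
                raise TypeError(f"missing required argument: {name!r}")
    return validated
```

If Hendrik's PR lands, the same loop could presumably read parameter names and types from the MelleaTool.as_json_tool() output instead of the raw signature.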
b15c99c to 36745d8
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
7522672 to 2a0b2aa