features: Manually setting the validation set for multi-output task #1302
base: main
Conversation
@microsoft-github-policy-service agree
prdai left a comment:
LGTM!
Pull request overview
This pull request adds support for manually setting a validation set for multi-output tasks when using the "holdout" evaluation method. Previously, users could not manually specify a validation set for multi-output regression tasks. The new multioutput_train_size parameter allows users to concatenate training and validation data and specify where to split them.
Changes:
- Added multioutput_train_size parameter to the AutoML class for manual validation set specification
- Implemented _train_val_split method to split concatenated training/validation data (a sketch follows this list)
- Added a test case demonstrating the new functionality with MultiOutputRegressor
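The split logic itself is not reproduced in this overview. A minimal, hypothetical sketch of what a _train_val_split over concatenated data could look like, assuming NumPy arrays and that multioutput_train_size counts the leading rows that belong to the training set:

```python
import numpy as np


def _train_val_split(X, y, multioutput_train_size):
    """Sketch only: split concatenated data back into train and validation parts.

    Assumes the first `multioutput_train_size` rows are the training data and the
    remaining rows are the manually supplied validation set.
    """
    X, y = np.asarray(X), np.asarray(y)
    X_train, X_val = X[:multioutput_train_size], X[multioutput_train_size:]
    y_train, y_val = y[:multioutput_train_size], y[multioutput_train_size:]
    return X_train, y_train, X_val, y_val
```

The real method in flaml/automl/automl.py would also have to cope with pandas DataFrames and other input types; the sketch only illustrates the split-position semantics.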
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| flaml/automl/automl.py | Added documentation and implementation for the multioutput_train_size parameter, including the split logic in the fit method |
| test/automl/test_regression.py | Added test_multioutput_train_size function to demonstrate usage of the new feature |
The added test in test/automl/test_regression.py:

```python
def test_multioutput_train_size():
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split
    from sklearn.multioutput import MultiOutputRegressor, RegressorChain

    from flaml import AutoML

    # create regression data
    X, y = make_regression(n_targets=3)

    # split into train and test data
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
    X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.1, random_state=42)

    # train the model
    model = MultiOutputRegressor(
        AutoML(task="regression", time_budget=1, eval_method="holdout", multioutput_train_size=len(X_train))
    )
    model.fit(np.concatenate([X_train, X_val], axis=0), np.concatenate([y_train, y_val], axis=0))

    # predict
    print(model.predict(X_test))
```
Copilot AI · Jan 19, 2026
The test function lacks assertions to verify the new multioutput_train_size feature works as expected. Consider adding assertions to validate that the model was trained successfully and that the validation split was performed correctly. For example, you could check that the model produces reasonable predictions or verify internal state that confirms the train/validation split occurred.
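One hypothetical way to act on this suggestion, assuming the data setup from the test above (scikit-learn's MultiOutputRegressor exposes the fitted per-target models via estimators_):

```python
# Possible assertions to append to test_multioutput_train_size (sketch, not part of the PR).
y_pred = model.predict(X_test)
assert y_pred.shape == y_test.shape  # one prediction column per target
assert np.all(np.isfinite(y_pred))  # predictions contain no NaN or inf
assert len(model.estimators_) == y_train.shape[1]  # one fitted AutoML per target
```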
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Why are these changes needed?
Originally, for multi-output tasks where eval_method is "holdout", it was not possible to set the validation set manually. This PR introduces a new feature that allows manually setting the validation set for multi-output tasks.
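For context, a sketch of the gap this closes, assuming AutoML.fit's existing X_val/y_val keyword arguments for single-output tasks; the multi-output path mirrors the test shown earlier:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor

from flaml import AutoML

X, y = make_regression(n_samples=200, n_targets=3, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.1, random_state=0)

# Single-output: the validation set can be handed to fit directly.
automl = AutoML()
automl.fit(
    X_train,
    y_train[:, 0],
    X_val=X_val,
    y_val=y_val[:, 0],
    task="regression",
    time_budget=1,
    eval_method="holdout",
)

# Multi-output via MultiOutputRegressor: scikit-learn forwards only X and y to each
# wrapped estimator, so X_val/y_val cannot be passed through. The validation rows are
# therefore appended after the training rows, and multioutput_train_size tells each
# AutoML instance where to split them.
model = MultiOutputRegressor(
    AutoML(task="regression", time_budget=1, eval_method="holdout", multioutput_train_size=len(X_train))
)
model.fit(np.concatenate([X_train, X_val]), np.concatenate([y_train, y_val]))
```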
Related issue number
Checks