TextPredictor #486
Qiaochu-Song
commented
Mar 17, 2022
- Add an estimator for TextPredictor.
- Add a test for TextPredictor estimator.
liususan091219
left a comment
Why is the training data passed through kwargs? It’s supposed to be passed from X_train
liususan091219
left a comment
Rename your estimator to MultiModalEstimator
flaml/ml.py
Outdated
    ARIMA,
    SARIMAX,
    TransformersEstimator,
    AGTextPredictorEstimator,
Suggested change:
    - AGTextPredictorEstimator,
    + MultiModalEstimator,
Please update all occurrences
fixed
Please update the commit.
flaml/model.py
Outdated
    from autogluon.text import TextPredictor

    super().__init__(task, **params)
    self.estimator_class = TextPredictor
Why?
I can remove this and initialize the model with TextPredictor instead. Is that better?
yes
flaml/model.py
Outdated
    }
    return search_space_dict

    def _init_fix_args(self, automl_fit_kwargs: dict=None):
Why do we need this function? Can we simply remove it?
If we have an AGArgs dataclass in utils and just use the default settings, we can remove this function and set self.ag_args = AGArgs() in MultimodalEstimator.fit(). Does that make sense?
Yes, you can implement this, and define a similar init_hf_args if you need to check user input validity.
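A minimal sketch of the dataclass approach discussed in this thread. The field names come from the keys read from `fix_args` elsewhere in this PR (`output_dir`, `label_column`, `dataset_name`); the default values and the `__post_init__` check are hypothetical illustrations, not the final implementation.

```python
from dataclasses import dataclass


@dataclass
class AGArgs:
    """Fixed (non-tuned) settings for the AutoGluon-backed estimator.

    Field names follow the keys read from `fix_args` in this PR; the
    defaults are placeholders chosen for illustration only.
    """

    output_dir: str = "data/mm_output/"  # hypothetical default
    label_column: str = "label"          # hypothetical default
    dataset_name: str = "dataset"        # hypothetical default

    def __post_init__(self):
        # Basic validity check on user input, in the spirit of the
        # suggested init_hf_args-style validation.
        for name in ("output_dir", "label_column", "dataset_name"):
            value = getattr(self, name)
            if not isinstance(value, str) or not value:
                raise ValueError(f"{name} must be a non-empty string, got {value!r}")


# With defaults only, the estimator would just do: self.ag_args = AGArgs()
ag_args = AGArgs()
custom = AGArgs(output_dir="out/", label_column="Sentiment")
```

With this in place `_init_fix_args` has nothing left to do: defaults live on the dataclass, and bad user input fails at construction time rather than deep inside `fit()`.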
test/nlp/test_agtextpredictor.py
Outdated
    score = automl.model.estimator.evaluate(test_dataset)
    print(f"Inference on test set complete, {metric}: {score}")
    del automl
    gc.collect()
    (No newline at end of file)
Add a newline at the end of the file.
fixed
test/nlp/test_agtextpredictor.py
Outdated
    "gpu_per_trial": 0,
    "max_iter": 2,
    "time_budget": 50,
    "task": "mm_multi",
rename mm_multi -> multimodal-classification
flaml/model.py
Outdated
    # train_data = self._kwargs["train_data"]
    import pandas as pd
    train_data = pd.concat([X_train, y_train], axis=1)
    tuning_data = pd.concat([X_train, y_train], axis=1)
You mean X_val, y_val?
I will remove this line since the tuning data is not necessary anymore.
flaml/model.py
Outdated
    self.fix_args = fix_args

    def _init_hp_config(self, text_backbone: str, multimodal_fusion_strategy: str):
Please define cfg in a method of class AGArgs in flaml/nlp/utils.py, then remove this function.
This _init_hp_config uses AGArgs and self.params to build the hyperparameters dictionary for the TextPredictor. If it is removed, the dictionary still has to be assembled inside MultimodalEstimator.fit(). Do you think it is better to drop this function and do that part inside .fit()?
Make this function a method of AGArgs, because AGArgs is for managing the config for AG.
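One way the request in this thread could look, as a rough sketch. The method name `hp_config`, the dictionary keys, the default learning rate, and the example values `"electra_base"` and `"fuse_late"` are all hypothetical; the real keys must match whatever TextPredictor expects.

```python
from dataclasses import dataclass


@dataclass
class AGArgs:
    """Holds AG-related settings and assembles the hyperparameter dict."""

    output_dir: str = "data/mm_output/"  # hypothetical default
    label_column: str = "label"          # hypothetical default

    def hp_config(self, params: dict) -> dict:
        """Build the hyperparameter dict from the tuned values in `params`.

        The keys below are illustrative placeholders, not AutoGluon's
        actual configuration keys.
        """
        return {
            "text_backbone": params["text_backbone"],
            "multimodal_fusion_strategy": params["multimodal_fusion_strategy"],
            "learning_rate": params.get("learning_rate", 1e-4),  # hypothetical
        }


# Inside fit(), this call would take the place of self._init_hp_config(...):
ag_args = AGArgs()
hyperparameters = ag_args.hp_config(
    {"text_backbone": "electra_base", "multimodal_fusion_strategy": "fuse_late"}
)
```

Keeping the assembly on `AGArgs` means the estimator's `fit()` only passes `self.params` through, and everything config-related stays in one class.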
flaml/data.py
Outdated
    )
    SEQREGRESSION = "seq-regression"
    REGRESSION = ("regression", SEQREGRESSION)
    REGRESSION = ("regression", "mm_regression", SEQREGRESSION)
Rename "mm_regression" -> "multimodal-regression", define a static variable for it
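A sketch of the requested rename, following the existing `SEQREGRESSION` pattern. The constant names `MM_REGRESSION`, `MM_CLASSIFICATION`, and `MM_TASKS`, and the helper `is_multimodal`, are hypothetical; only the string values come from the review comments.

```python
# Task-name constants, following the existing SEQREGRESSION pattern.
SEQREGRESSION = "seq-regression"
MM_REGRESSION = "multimodal-regression"          # hypothetical constant name
MM_CLASSIFICATION = "multimodal-classification"  # hypothetical constant name

# The literal "mm_regression" is replaced by the named constant.
REGRESSION = ("regression", MM_REGRESSION, SEQREGRESSION)
MM_TASKS = (MM_CLASSIFICATION, MM_REGRESSION)


def is_multimodal(task: str) -> bool:
    """Return True for the multimodal task names defined above."""
    return task in MM_TASKS
```

Referring to the constant everywhere means a later rename touches one line instead of every string literal.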
flaml/data.py
Outdated
    SEQCLASSIFICATION,
    MULTICHOICECLASSIFICATION,
    TOKENCLASSIFICATION,
    "mm_multi",
can you automatically detect "mm_multi" and "mm_binary" so we don't need these two values anymore?
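A possible way to detect the binary versus multiclass case from the labels, so that a single classification task name is enough. The function name and the returned strings are hypothetical, and this sketch assumes the labels are hashable and contain no missing values.

```python
def infer_classification_type(y_train) -> str:
    """Return "binary" or "multiclass" based on the number of distinct labels."""
    num_classes = len(set(y_train))
    if num_classes < 2:
        raise ValueError(f"need at least 2 distinct labels, found {num_classes}")
    return "binary" if num_classes == 2 else "multiclass"


binary_kind = infer_classification_type([0, 1, 1, 0])
multi_kind = infer_classification_type(["a", "b", "c", "a"])
```

The estimator could call this once in `fit()` and pass the result on, instead of asking users to choose between "mm_binary" and "mm_multi".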
flaml/model.py
Outdated
    # train_data = self._kwargs["train_data"]
    import pandas as pd
    train_data = pd.concat([X_train, y_train], axis=1)
please use estimator._join method. See TransformersEstimator._join
fixed
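For context, a sketch of what a join helper of this kind might look like. This is not the actual `TransformersEstimator._join` code; the function name and the default label column name are assumptions. The point is that `pd.concat(..., axis=1)` aligns on the index, so labels with a different index than `X_train` would produce NaN rows, whereas re-indexing the labels first keeps rows matched by position.

```python
import pandas as pd


def join_features_and_label(X_train: pd.DataFrame, y_train, label_column: str = "label") -> pd.DataFrame:
    """Attach y_train to X_train as one column, matched by position."""
    # Rebuild the labels on X_train's index so the join cannot misalign rows.
    labels = pd.Series(list(y_train), index=X_train.index, name=label_column)
    return X_train.join(labels)


X = pd.DataFrame({"text": ["good", "bad"], "price": [1.0, 2.0]}, index=[10, 11])
y = pd.Series([1, 0])  # default index (0, 1) differs from X's index (10, 11)
train_data = join_features_and_label(X, y)
```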
flaml/model.py
Outdated
    save_dir = self.fix_args["output_dir"]
    label_column = self.fix_args["label_column"]
    dataset_name = self.fix_args["dataset_name"]
    ag_model_save_dir = os.path.join(save_dir, f"{dataset_name}_ag_text_multimodal_{text_backbone}\
ok. Can you use the original directory save_dir instead of the modified directory ag_model_save_dir so users know where to find the saved model?