Description
Describe the bug
Hi everyone,
I’m trying AutoML for time series forecasting for the first time and I’m stuck on an error. Any help would be greatly appreciated!
I trained two models for two different target variables and now want to predict the next three months for each.
Because I couldn’t train on daily data, I aggregated dates to the first day of each month.
Both the training and prediction tables come from the same file, and their column names match.
The model with the Extra Trees learner runs predictions without issues.
However, the XGBoost model fails with:
ValueError: The feature names should match those that were passed during fit.
Feature names unseen at fit time: - date
What’s confusing:
The MLflow model signature shows date as an expected input.
I tried both date-only and datetime formats for the column, but the error persists.
Any ideas on how to resolve this feature-name mismatch for XGBoost (especially around the date column) would be amazing.
Thank you!
Steps to reproduce
from synapse.ml.predict import MLFlowTransformer

df = spark.read.format("delta").load(
    "abfss://@onelake.dfs.fabric.microsoft.com//Tables/"
)
model = MLFlowTransformer(
    inputCols=["date", "feature_1", "feature_2", ...],
    outputCol="target_prediction",
    modelName="",
    modelVersion=
)
df = model.transform(df)
df.write.format("delta").mode("overwrite").save(
    "abfss://@onelake.dfs.fabric.microsoft.com//Tables/"
)
Error Message:
File "/home/trusted-service-user/cluster-env/trident_env/lib/python3.11/site-packages/mlflow/pyfunc/__init__.py", line 716, in predict
return self._predict_fn(data, params=params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/trusted-service-user/cluster-env/trident_env/lib/python3.11/site-packages/mlflow/sklearn/__init__.py", line 543, in predict
return self.sklearn_model.predict(data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/trusted-service-user/cluster-env/trident_env/lib/python3.11/site-packages/sklearn/pipeline.py", line 600, in predict
Xt = transform.transform(Xt)
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/trusted-service-user/cluster-env/trident_env/lib/python3.11/site-packages/sklearn/utils/_set_output.py", line 313, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/trusted-service-user/cluster-env/trident_env/lib/python3.11/site-packages/flaml/fabric/autofe.py", line 444, in transform
return self._transform(X)
^^^^^^^^^^^^^^^^^^
File "/home/trusted-service-user/cluster-env/trident_env/lib/python3.11/site-packages/flaml/fabric/autofe.py", line 403, in _transform
raw_res = self.pipeline.transform(X)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/trusted-service-user/cluster-env/trident_env/lib/python3.11/site-packages/sklearn/pipeline.py", line 903, in transform
Xt = transform.transform(Xt, **routed_params[name].transform)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/trusted-service-user/cluster-env/trident_env/lib/python3.11/site-packages/sklearn/utils/_set_output.py", line 313, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/trusted-service-user/cluster-env/trident_env/lib/python3.11/site-packages/sklearn/decomposition/_base.py", line 143, in transform
X = self._validate_data(
^^^^^^^^^^^^^^^^^^^^
File "/home/trusted-service-user/cluster-env/trident_env/lib/python3.11/site-packages/sklearn/base.py", line 608, in _validate_data
self._check_feature_names(X, reset=reset)
File "/home/trusted-service-user/cluster-env/trident_env/lib/python3.11/site-packages/sklearn/base.py", line 535, in _check_feature_names
raise ValueError(message)
ValueError: The feature names should match those that were passed during fit.
Feature names unseen at fit time:
- date
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:572)
at org.apache.spark.sql.execution.python.PythonArrowOutput$$anon$1.read(PythonArrowOutput.scala:118)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:525)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithIterator(FileFormatDataWriter.scala:121)
at org.apache.spark.sql.delta.files.DeltaFileFormatWriter$.$anonfun$executeTask$3(DeltaFileFormatWriter.scala:603)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1397)
at org.apache.spark.sql.delta.files.DeltaFileFormatWriter$.executeTask(DeltaFileFormatWriter.scala:611)
... 12 more
Model Used
No response
Expected Behavior
No response
Screenshots and logs
No response
Additional Information
No response