[Bug]: AutoML time series: XGBoost prediction fails with “feature names should match” (date column) #1480

@carolinsmilie-cmd

Description

Describe the bug

Hi everyone,
I’m trying AutoML for time series forecasting for the first time and I’m stuck on an error. Any help would be greatly appreciated!

I trained two models for two different target variables and now want to predict the next three months for each.
Because I couldn’t train on daily data, I aggregated dates to the first day of each month.
Both the training and prediction tables come from the same file, and their column names match.
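For context, the monthly aggregation was done roughly like this (a minimal pandas sketch with placeholder column names, not my exact code):

```python
import pandas as pd

# Toy daily data; "value" stands in for the real target columns.
daily = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-03"]),
    "value": [10.0, 20.0, 30.0],
})

# Resample to month-start ("MS") so every row's date becomes
# the first day of its month.
monthly = (
    daily.set_index("date")
         .resample("MS")
         .sum()
         .reset_index()
)
```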
The model with the Extra Trees learner runs predictions without issues.
However, the XGBoost model fails with:

ValueError: The feature names should match those that were passed during fit.
Feature names unseen at fit time: - date

What’s confusing:

- The MLflow model signature shows `date` as an expected input.
- I tried both date-only and datetime formats for the column, but the error persists.

Any ideas on how to resolve this feature-name mismatch for XGBoost (especially around the date column) would be amazing.
Thank you!

Steps to reproduce

df = spark.read.format("delta").load(
    "abfss://@onelake.dfs.fabric.microsoft.com//Tables/"
)

model = MLFlowTransformer(
    inputCols=["date", "feature_1", "feature_2", ...],
    outputCol="target_prediction",
    modelName="",
    modelVersion=
)
df = model.transform(df)

df.write.format("delta").mode("overwrite").save(
    "abfss://@onelake.dfs.fabric.microsoft.com//Tables/"
)

Error Message:

File "/home/trusted-service-user/cluster-env/trident_env/lib/python3.11/site-packages/mlflow/pyfunc/__init__.py", line 716, in predict
return self._predict_fn(data, params=params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/trusted-service-user/cluster-env/trident_env/lib/python3.11/site-packages/mlflow/sklearn/__init__.py", line 543, in predict
return self.sklearn_model.predict(data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/trusted-service-user/cluster-env/trident_env/lib/python3.11/site-packages/sklearn/pipeline.py", line 600, in predict
Xt = transform.transform(Xt)
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/trusted-service-user/cluster-env/trident_env/lib/python3.11/site-packages/sklearn/utils/_set_output.py", line 313, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/trusted-service-user/cluster-env/trident_env/lib/python3.11/site-packages/flaml/fabric/autofe.py", line 444, in transform
return self._transform(X)
^^^^^^^^^^^^^^^^^^
File "/home/trusted-service-user/cluster-env/trident_env/lib/python3.11/site-packages/flaml/fabric/autofe.py", line 403, in _transform
raw_res = self.pipeline.transform(X)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/trusted-service-user/cluster-env/trident_env/lib/python3.11/site-packages/sklearn/pipeline.py", line 903, in transform
Xt = transform.transform(Xt, **routed_params[name].transform)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/trusted-service-user/cluster-env/trident_env/lib/python3.11/site-packages/sklearn/utils/_set_output.py", line 313, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/trusted-service-user/cluster-env/trident_env/lib/python3.11/site-packages/sklearn/decomposition/_base.py", line 143, in transform
X = self._validate_data(
^^^^^^^^^^^^^^^^^^^^
File "/home/trusted-service-user/cluster-env/trident_env/lib/python3.11/site-packages/sklearn/base.py", line 608, in _validate_data
self._check_feature_names(X, reset=reset)
File "/home/trusted-service-user/cluster-env/trident_env/lib/python3.11/site-packages/sklearn/base.py", line 535, in _check_feature_names
raise ValueError(message)
ValueError: The feature names should match those that were passed during fit.
Feature names unseen at fit time:
- date

    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:572)
    at org.apache.spark.sql.execution.python.PythonArrowOutput$$anon$1.read(PythonArrowOutput.scala:118)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:525)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
    at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithIterator(FileFormatDataWriter.scala:121)
    at org.apache.spark.sql.delta.files.DeltaFileFormatWriter$.$anonfun$executeTask$3(DeltaFileFormatWriter.scala:603)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1397)
    at org.apache.spark.sql.delta.files.DeltaFileFormatWriter$.executeTask(DeltaFileFormatWriter.scala:611)
    ... 12 more

Model Used

No response

Expected Behavior

No response

Screenshots and logs

No response

Additional Information

No response

Metadata

Assignees

No one assigned

Labels

bug (Something isn't working), need more info (Can't address without more information)
