Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2023/01/09 12:50:00 UTC
[jira] [Created] (SPARK-41950) mlflow doctest fails for pandas API on Spark
Hyukjin Kwon created SPARK-41950:
------------------------------------
Summary: mlflow doctest fails for pandas API on Spark
Key: SPARK-41950
URL: https://issues.apache.org/jira/browse/SPARK-41950
Project: Spark
Issue Type: Bug
Components: Pandas API on Spark
Affects Versions: 3.4.0
Reporter: Hyukjin Kwon
{code}
File "/__w/spark/spark/python/pyspark/pandas/mlflow.py", line 172, in pyspark.pandas.mlflow.load_model
Failed example:
    prediction_df
Exception raised:
    Traceback (most recent call last):
      File "/usr/lib/python3.9/doctest.py", line 1336, in __run
        exec(compile(example.source, filename, "single",
      File "<doctest pyspark.pandas.mlflow.load_model[18]>", line 1, in <module>
        prediction_df
      File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 13322, in __repr__
        pdf = cast("DataFrame", self._get_or_create_repr_pandas_cache(max_display_count))
      File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 13313, in _get_or_create_repr_pandas_cache
        self, "_repr_pandas_cache", {n: self.head(n + 1)._to_internal_pandas()}
      File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 13308, in _to_internal_pandas
        return self._internal.to_pandas_frame
      File "/__w/spark/spark/python/pyspark/pandas/utils.py", line 588, in wrapped_lazy_property
        setattr(self, attr_name, fn(self))
      File "/__w/spark/spark/python/pyspark/pandas/internal.py", line 1056, in to_pandas_frame
        pdf = sdf.toPandas()
      File "/__w/spark/spark/python/pyspark/sql/pandas/conversion.py", line 208, in toPandas
        pdf = pd.DataFrame.from_records(self.collect(), columns=self.columns)
      File "/__w/spark/spark/python/pyspark/sql/dataframe.py", line 1197, in collect
        sock_info = self._jdf.collectToPython()
      File "/__w/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
        return_value = get_return_value(
      File "/__w/spark/spark/python/pyspark/sql/utils.py", line 209, in deco
        raise converted from None
    pyspark.sql.utils.PythonException:
      An exception was thrown from the Python worker. Please see the stack trace below.
    Traceback (most recent call last):
      File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 829, in main
        process()
      File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 821, in process
        serializer.dump_stream(out_iter, outfile)
      File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 345, in dump_stream
        return ArrowStreamSerializer.dump_stream(self, init_stream_yield_batches(), stream)
      File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 86, in dump_stream
        for batch in iterator:
      File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", line 338, in init_stream_yield_batches
        for series in iterator:
      File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", line 519, in func
        for result_batch, result_type in result_iter:
      File "/usr/local/lib/python3.9/dist-packages/mlflow/pyfunc/__init__.py", line 1253, in udf
        yield _predict_row_batch(batch_predict_fn, row_batch_args)
      File "/usr/local/lib/python3.9/dist-packages/mlflow/pyfunc/__init__.py", line 1057, in _predict_row_batch
        result = predict_fn(pdf)
      File "/usr/local/lib/python3.9/dist-packages/mlflow/pyfunc/__init__.py", line 1237, in batch_predict_fn
        return loaded_model.predict(pdf)
      File "/usr/local/lib/python3.9/dist-packages/mlflow/pyfunc/__init__.py", line 413, in predict
        return self._predict_fn(data)
      File "/usr/local/lib/python3.9/dist-packages/sklearn/linear_model/_base.py", line 355, in predict
        return self._decision_function(X)
      File "/usr/local/lib/python3.9/dist-packages/sklearn/linear_model/_base.py", line 338, in _decision_function
        X = self._validate_data(X, accept_sparse=["csr", "csc", "coo"], reset=False)
      File "/usr/local/lib/python3.9/dist-packages/sklearn/base.py", line 518, in _validate_data
        self._check_feature_names(X, reset=reset)
      File "/usr/local/lib/python3.9/dist-packages/sklearn/base.py", line 451, in _check_feature_names
        raise ValueError(message)
    ValueError: The feature names should match those that were passed during fit.
    Feature names unseen at fit time:
    - 0
    - 1
    Feature names seen at fit time, yet now missing:
    - x1
    - x2
{code}
https://github.com/apache/spark/actions/runs/3871715040/jobs/6600578830
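For readers unfamiliar with the final ValueError: the two lists in the message read like a set difference between the column names the model saw at fit time (x1, x2) and the names on the batch that reached predict() (0, 1). The sketch below is only a simplified, standard-library illustration of that comparison. The helper name diff_feature_names is hypothetical, and this is not scikit-learn's actual implementation.

```python
def diff_feature_names(fitted_names, input_names):
    """Return (unseen, missing) feature names, compared as strings.

    Hypothetical helper for illustration only; it mimics the shape of the
    error message above and is not scikit-learn's implementation.
    """
    fitted = [str(name) for name in fitted_names]
    given = [str(name) for name in input_names]
    unseen = sorted(set(given) - set(fitted))    # present now, absent at fit time
    missing = sorted(set(fitted) - set(given))   # present at fit time, absent now
    return unseen, missing


# Names taken from the error message: the model was fit on x1/x2, but the
# batch that reached predict() carried the positional labels 0 and 1.
unseen, missing = diff_feature_names(["x1", "x2"], [0, 1])
print("Feature names unseen at fit time:", unseen)
print("Feature names seen at fit time, yet now missing:", missing)
```

When both lists come back empty the names agree. In this report neither is empty, which suggests the original column names were replaced by positional labels somewhere between the pandas-on-Spark frame and the model's predict() call.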