You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Junichi Koizumi (Jira)" <ji...@apache.org> on 2019/09/10 22:13:00 UTC

[jira] [Issue Comment Deleted] (SPARK-28902) Spark ML Pipeline with nested Pipelines fails to load when saved from Python

     [ https://issues.apache.org/jira/browse/SPARK-28902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junichi  Koizumi  updated SPARK-28902:
--------------------------------------
    Comment: was deleted

(was:   Since, versions aren't the main concern here should I create a PR ? )

> Spark ML Pipeline with nested Pipelines fails to load when saved from Python
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-28902
>                 URL: https://issues.apache.org/jira/browse/SPARK-28902
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.4.3
>            Reporter: Saif Addin
>            Priority: Minor
>
> Hi, this error is affecting a bunch of our nested use cases.
> Saving a *PipelineModel* with one of its stages being another *PipelineModel*, fails when loading it from Scala if it is saved in Python.
> *Python side:*
>  
> {code:java}
> from pyspark.ml import Pipeline
> from pyspark.ml.feature import Tokenizer
> t = Tokenizer()
> p = Pipeline().setStages([t])
> d = spark.createDataFrame([["Hello Peter Parker"]])
> pm = p.fit(d)
> np = Pipeline().setStages([pm])
> npm = np.fit(d)
> npm.write().save('./npm_test')
> {code}
>  
>  
> *Scala side:*
>  
> {code:java}
> scala> import org.apache.spark.ml.PipelineModel
> scala> val pp = PipelineModel.load("./npm_test")
> java.lang.IllegalArgumentException: requirement failed: Error loading metadata: Expected class name org.apache.spark.ml.PipelineModel but found class name pyspark.ml.pipeline.PipelineModel
>  at scala.Predef$.require(Predef.scala:224)
>  at org.apache.spark.ml.util.DefaultParamsReader$.parseMetadata(ReadWrite.scala:638)
>  at org.apache.spark.ml.util.DefaultParamsReader$.loadMetadata(ReadWrite.scala:616)
>  at org.apache.spark.ml.Pipeline$SharedReadWrite$.load(Pipeline.scala:267)
>  at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:348)
>  at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:342)
>  at org.apache.spark.ml.util.MLReadable$class.load(ReadWrite.scala:380)
>  at org.apache.spark.ml.PipelineModel$.load(Pipeline.scala:332)
>  ... 50 elided
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org