Posted to issues@spark.apache.org by "Junichi Koizumi (Jira)" <ji...@apache.org> on 2019/09/01 18:15:00 UTC
[jira] [Commented] (SPARK-28902) Spark ML Pipeline with nested Pipelines fails to load when saved from Python
[ https://issues.apache.org/jira/browse/SPARK-28902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920478#comment-16920478 ]
Junichi Koizumi commented on SPARK-28902:
-------------------------------------------
Could you tell us a little more about the workaround? It turns out to work fine on my version.
PySpark side:
>>> from pyspark.ml import Pipeline
>>> from pyspark.ml.feature import Tokenizer
>>> t = Tokenizer()
>>> p = Pipeline().setStages([t])
>>> d = spark.createDataFrame([["Apache spark logistic regression "]])
>>> pm = p.fit(d)
>>> np = Pipeline().setStages([pm])
>>> npm = np.fit(d)
>>> npm.write().save('./npm_test')
Scala side:
scala> import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.PipelineModel
scala> val pp = PipelineModel.load("./npm_test")
pp: org.apache.spark.ml.PipelineModel = PipelineModel_4d879f6b2b02c8d3d467
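In case it helps while this is open: assuming the failure comes from the nested stage's metadata recording the Python class name (pyspark.ml.pipeline.PipelineModel) where the Scala reader expects org.apache.spark.ml.PipelineModel, one possible workaround is to flatten the nested stages before saving, so the persisted pipeline only contains leaf transformers. A rough sketch below; flatten_stages is my own helper, not part of the Spark API, and the duck-typed hasattr check stands in for isinstance(stage, PipelineModel) so the sketch runs even without Spark installed:

```python
from types import SimpleNamespace

def flatten_stages(model):
    """Return a flat list of leaf stages from a possibly nested pipeline model."""
    flat = []
    for stage in getattr(model, "stages", []):
        if hasattr(stage, "stages"):  # a nested PipelineModel exposes .stages
            flat.extend(flatten_stages(stage))
        else:
            flat.append(stage)
    return flat

# Quick structural check with plain objects (no Spark needed):
leaf = object()
nested = SimpleNamespace(stages=[leaf, SimpleNamespace(stages=[leaf, leaf])])
print(len(flatten_stages(nested)))  # 3
```

With PySpark, the flat list could then be rewrapped as PipelineModel(stages=flatten_stages(npm)) and saved in place of the nested model; I have not verified this on the affected 2.4.3 version.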
> Spark ML Pipeline with nested Pipelines fails to load when saved from Python
> ----------------------------------------------------------------------------
>
> Key: SPARK-28902
> URL: https://issues.apache.org/jira/browse/SPARK-28902
> Project: Spark
> Issue Type: Bug
> Components: ML
> Affects Versions: 2.4.3
> Reporter: Saif Addin
> Priority: Minor
>
> Hi, this error affects a number of our nested use cases.
> Saving a *PipelineModel* that has another *PipelineModel* as one of its stages from Python causes loading to fail on the Scala side.
> *Python side:*
>
> {code:java}
> from pyspark.ml import Pipeline
> from pyspark.ml.feature import Tokenizer
> t = Tokenizer()
> p = Pipeline().setStages([t])
> d = spark.createDataFrame([["Hello Peter Parker"]])
> pm = p.fit(d)
> np = Pipeline().setStages([pm])
> npm = np.fit(d)
> npm.write().save('./npm_test')
> {code}
>
>
> *Scala side:*
>
> {code:java}
> scala> import org.apache.spark.ml.PipelineModel
> scala> val pp = PipelineModel.load("./npm_test")
> java.lang.IllegalArgumentException: requirement failed: Error loading metadata: Expected class name org.apache.spark.ml.PipelineModel but found class name pyspark.ml.pipeline.PipelineModel
> at scala.Predef$.require(Predef.scala:224)
> at org.apache.spark.ml.util.DefaultParamsReader$.parseMetadata(ReadWrite.scala:638)
> at org.apache.spark.ml.util.DefaultParamsReader$.loadMetadata(ReadWrite.scala:616)
> at org.apache.spark.ml.Pipeline$SharedReadWrite$.load(Pipeline.scala:267)
> at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:348)
> at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:342)
> at org.apache.spark.ml.util.MLReadable$class.load(ReadWrite.scala:380)
> at org.apache.spark.ml.PipelineModel$.load(Pipeline.scala:332)
> ... 50 elided
> {code}
>
--
This message was sent by Atlassian Jira
(v8.3.2#803003)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org