Posted to issues@spark.apache.org by "zhengruifeng (JIRA)" <ji...@apache.org> on 2018/01/05 10:23:00 UTC
[jira] [Created] (SPARK-22971) OneVsRestModel should use temporary RawPredictionCol
zhengruifeng created SPARK-22971:
------------------------------------
Summary: OneVsRestModel should use temporary RawPredictionCol
Key: SPARK-22971
URL: https://issues.apache.org/jira/browse/SPARK-22971
Project: Spark
Issue Type: Bug
Components: ML
Affects Versions: 2.3.0
Reporter: zhengruifeng
Priority: Minor
The issue occurs when I transform a single DataFrame with two different classification models: first a {{RandomForestClassificationModel}}, then a {{OneVsRestModel}}.
The first transform generates a new column, "rawPrediction". {{OneVsRestModel#transform}} uses a column of the same name internally, so the second transform fails.
{code}
scala> val df = spark.read.format("libsvm").load("/Users/zrf/Dev/OpenSource/spark/data/mllib/sample_multiclass_classification_data.txt")
18/01/05 17:08:18 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
df: org.apache.spark.sql.DataFrame = [label: double, features: vector]
scala> val rf = new RandomForestClassifier()
rf: org.apache.spark.ml.classification.RandomForestClassifier = rfc_c11b1e1e1f7f
scala> val rfm = rf.fit(df)
rfm: org.apache.spark.ml.classification.RandomForestClassificationModel = RandomForestClassificationModel (uid=rfc_c11b1e1e1f7f) with 20 trees
scala> val lr = new LogisticRegression().setMaxIter(1)
lr: org.apache.spark.ml.classification.LogisticRegression = logreg_f5a5285eba06
scala> val ovr = new OneVsRest().setClassifier(lr)
ovr: org.apache.spark.ml.classification.OneVsRest = oneVsRest_8f5584190634
scala> val ovrModel = ovr.fit(df)
ovrModel: org.apache.spark.ml.classification.OneVsRestModel = oneVsRest_8f5584190634
scala> val df2 = rfm.setPredictionCol("rfPred").transform(df)
df2: org.apache.spark.sql.DataFrame = [label: double, features: vector ... 3 more fields]
scala> val df3 = ovrModel.setPredictionCol("ovrPred").transform(df2)
java.lang.IllegalArgumentException: requirement failed: Column rawPrediction already exists.
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.ml.util.SchemaUtils$.appendColumn(SchemaUtils.scala:101)
at org.apache.spark.ml.util.SchemaUtils$.appendColumn(SchemaUtils.scala:91)
at org.apache.spark.ml.classification.ClassifierParams$class.validateAndTransformSchema(Classifier.scala:43)
at org.apache.spark.ml.classification.ProbabilisticClassificationModel.org$apache$spark$ml$classification$ProbabilisticClassifierParams$$super$validateAndTransformSchema(ProbabilisticClassifier.scala:77)
at org.apache.spark.ml.classification.ProbabilisticClassifierParams$class.validateAndTransformSchema(ProbabilisticClassifier.scala:37)
at org.apache.spark.ml.classification.LogisticRegressionModel.org$apache$spark$ml$classification$LogisticRegressionParams$$super$validateAndTransformSchema(LogisticRegression.scala:904)
at org.apache.spark.ml.classification.LogisticRegressionParams$class.validateAndTransformSchema(LogisticRegression.scala:265)
at org.apache.spark.ml.classification.LogisticRegressionModel.validateAndTransformSchema(LogisticRegression.scala:904)
at org.apache.spark.ml.PredictionModel.transformSchema(Predictor.scala:192)
at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:74)
at org.apache.spark.ml.classification.ProbabilisticClassificationModel.transform(ProbabilisticClassifier.scala:104)
at org.apache.spark.ml.classification.OneVsRestModel$$anonfun$7.apply(OneVsRest.scala:184)
at org.apache.spark.ml.classification.OneVsRestModel$$anonfun$7.apply(OneVsRest.scala:173)
at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186)
at org.apache.spark.ml.classification.OneVsRestModel.transform(OneVsRest.scala:173)
... 50 elided
{code}
{{OneVsRestModel#transform}} only adds a new prediction column to its output, so it should not fail because of other columns already present in the input.
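Until this is fixed, a possible workaround is to remove the conflicting columns before calling {{OneVsRestModel#transform}}. The following is only a sketch that continues the spark-shell session above ({{df}}, {{rfm}} and {{ovrModel}} are the values defined there). It assumes the random forest's "rawPrediction" and "probability" columns are not needed afterwards; "probability" is dropped as well because I expect it to collide in the same way, which I have not verified separately. The names {{rfOut}} and {{ovrOut}} are made up for illustration.
{code}
// Assumption: rfm, ovrModel and df come from the session shown above.
// Drop the auxiliary columns written by the random forest so that the
// default column names are free again for the inner binary classifiers.
val rfOut = rfm.setPredictionCol("rfPred").transform(df).drop("rawPrediction", "probability")

// The input now only carries label, features and rfPred.
val ovrOut = ovrModel.setPredictionCol("ovrPred").transform(rfOut)
{code}
If those columns must be kept, giving them non-default names on the first model with {{setRawPredictionCol}} and {{setProbabilityCol}} should avoid the clash in the same way.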