Posted to issues@spark.apache.org by "William Zhang (JIRA)" <ji...@apache.org> on 2018/01/05 18:40:00 UTC
[jira] [Created] (SPARK-22974) CountVectorModel does not attach
attributes to output column
William Zhang created SPARK-22974:
-------------------------------------
Summary: CountVectorModel does not attach attributes to output column
Key: SPARK-22974
URL: https://issues.apache.org/jira/browse/SPARK-22974
Project: Spark
Issue Type: Bug
Components: ML
Affects Versions: 2.2.1
Reporter: William Zhang
When a CountVectorizerModel transforms a column, the output column does not have ML attributes attached to it. If that column is later used as input to the Interaction transformer, an exception is thrown:
{quote}"org.apache.spark.SparkException: Vector attributes must be defined for interaction."
{quote}
To reproduce it:
{{import org.apache.spark.ml.feature._
import org.apache.spark.sql.functions._
import org.apache.spark.ml.linalg.{SparseVector, Vector}

val df = spark.createDataFrame(Seq(
  (0, Array("a", "b", "c"), Array("1", "2")),
  (1, Array("a", "b", "b", "c", "a", "d"), Array("1", "2", "3"))
)).toDF("id", "words", "nums")

// Model fitted on the "nums" column
val cvModel: CountVectorizerModel = new CountVectorizer()
  .setInputCol("nums")
  .setOutputCol("features2")
  .setVocabSize(4)
  .setMinDF(0)
  .fit(df)

// Model built directly from a fixed vocabulary
val cvm = new CountVectorizerModel(Array("a", "b", "c"))
  .setInputCol("words")
  .setOutputCol("features1")

val df1 = cvm.transform(df)
val df2 = cvModel.transform(df1)

// The transform below throws the SparkException quoted above
val interaction = new Interaction()
  .setInputCols(Array("features1", "features2"))
  .setOutputCol("features")
val df3 = interaction.transform(df2)}}
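
Until the model attaches attributes itself, one possible workaround is to add them to the output column's metadata by hand before calling Interaction. The following is only a sketch, not part of the original report: the helper names vocabAttributeGroup and withVocabAttributes are hypothetical, and it assumes that one unnamed-type numeric attribute per vocabulary term is enough to satisfy Interaction.

```scala
import org.apache.spark.ml.attribute.{Attribute, AttributeGroup, NumericAttribute}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper: build one numeric attribute per vocabulary term.
// The ascription to Attribute makes map produce an Array[Attribute].
def vocabAttributeGroup(colName: String, vocab: Array[String]): AttributeGroup = {
  val attrs: Array[Attribute] =
    vocab.map(term => NumericAttribute.defaultAttr.withName(term): Attribute)
  new AttributeGroup(colName, attrs)
}

// Hypothetical helper: re-alias the vector column with the attribute
// group stored in its metadata; the data itself is unchanged.
def withVocabAttributes(df: DataFrame, colName: String, vocab: Array[String]): DataFrame =
  df.withColumn(colName, col(colName).as(colName, vocabAttributeGroup(colName, vocab).toMetadata()))

// Possible usage with the names from the reproduction above:
//   val df1Fixed = withVocabAttributes(cvm.transform(df), "features1", cvm.vocabulary)
//   val df2Fixed = withVocabAttributes(cvModel.transform(df1Fixed), "features2", cvModel.vocabulary)
//   val df3 = interaction.transform(df2Fixed)
```

The vocabulary passed in should be the model's own (for example cvm.vocabulary), so that the number of attributes matches the length of the output vectors.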