Posted to issues@spark.apache.org by "William Zhang (JIRA)" <ji...@apache.org> on 2018/01/05 18:40:00 UTC

[jira] [Created] (SPARK-22974) CountVectorModel does not attach attributes to output column

William Zhang created SPARK-22974:
-------------------------------------

             Summary: CountVectorModel does not attach attributes to output column
                 Key: SPARK-22974
                 URL: https://issues.apache.org/jira/browse/SPARK-22974
             Project: Spark
          Issue Type: Bug
          Components: ML
    Affects Versions: 2.2.1
            Reporter: William Zhang


When a CountVectorizerModel transforms a column, the output column does not have ML attributes attached to it. If that column is later used as an input to the Interaction transformer, an exception is thrown:
{quote}"org.apache.spark.SparkException: Vector attributes must be defined for interaction."
{quote}

To reproduce it:
{{import org.apache.spark.ml.feature._
import org.apache.spark.sql.functions._
import org.apache.spark.ml.linalg.{SparseVector, Vector}

val df = spark.createDataFrame(Seq(
  (0, Array("a", "b", "c"), Array("1", "2")),
  (1, Array("a", "b", "b", "c", "a", "d"),  Array("1", "2", "3"))
)).toDF("id", "words", "nums")

val cvModel: CountVectorizerModel = new CountVectorizer()
  .setInputCol("nums")
  .setOutputCol("features2")
  .setVocabSize(4)
  .setMinDF(0)
  .fit(df)

val cvm = new CountVectorizerModel(Array("a", "b", "c"))
  .setInputCol("words")
  .setOutputCol("features1")
  

val df1 = cvm.transform(df)
val df2 = cvModel.transform(df1)

val interaction = new Interaction()
  .setInputCols(Array("features1", "features2"))
  .setOutputCol("features")
val df3 = interaction.transform(df2)}}
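
Until the model attaches attributes itself, one possible workaround is to attach them by hand before calling Interaction. The sketch below is an assumption, not a fix taken from the Spark code base: withVocabAttributes is a hypothetical helper, and it assumes one numeric attribute per vocabulary term is an acceptable description of the count vector. For brevity both models are built from explicit vocabularies instead of fitting a CountVectorizer.

```scala
import org.apache.spark.ml.attribute.{Attribute, AttributeGroup, NumericAttribute}
import org.apache.spark.ml.feature.{CountVectorizerModel, Interaction}
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Reuses the active session when run inside spark-shell.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("SPARK-22974-sketch")
  .getOrCreate()

val df = spark.createDataFrame(Seq(
  (0, Array("a", "b", "c"), Array("1", "2")),
  (1, Array("a", "b", "b", "c", "a", "d"), Array("1", "2", "3"))
)).toDF("id", "words", "nums")

val cvm1 = new CountVectorizerModel(Array("a", "b", "c"))
  .setInputCol("words")
  .setOutputCol("features1")
val cvm2 = new CountVectorizerModel(Array("1", "2", "3"))
  .setInputCol("nums")
  .setOutputCol("features2")

// Hypothetical helper (not part of Spark): re-alias a vector column so that
// its metadata carries one numeric attribute per vocabulary term.
def withVocabAttributes(data: DataFrame, colName: String, vocabulary: Array[String]): DataFrame = {
  val attrs = vocabulary.map(term => (NumericAttribute.defaultAttr.withName(term): Attribute))
  val metadata = new AttributeGroup(colName, attrs).toMetadata()
  data.withColumn(colName, col(colName).as(colName, metadata))
}

val transformed = cvm2.transform(cvm1.transform(df))

// Inspect what the model attached; on affected versions the report implies
// this Option is empty.
val before = AttributeGroup.fromStructField(transformed.schema("features1")).attributes

val patched = withVocabAttributes(
  withVocabAttributes(transformed, "features1", cvm1.vocabulary),
  "features2", cvm2.vocabulary)
val after = AttributeGroup.fromStructField(patched.schema("features1")).attributes

// With attributes defined on both inputs, Interaction can build its encoders.
val interacted = new Interaction()
  .setInputCols(Array("features1", "features2"))
  .setOutputCol("features")
  .transform(patched)
```

Comparing before and after shows whether the attributes were missing on the Spark version in use. The helper only rewrites column metadata, so the vectors themselves are unchanged.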



