Posted to reviews@spark.apache.org by PowerToThePeople111 <gi...@git.apache.org> on 2018/08/09 15:34:08 UTC

[GitHub] spark pull request #20313: [SPARK-22974][ML] Attach attributes to output col...

Github user PowerToThePeople111 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20313#discussion_r208977114
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala ---
    @@ -264,7 +265,9 @@ class CountVectorizerModel(
     
           Vectors.sparse(dictBr.value.size, effectiveCounts)
         }
    -    dataset.withColumn($(outputCol), vectorizer(col($(inputCol))))
    +    val attrs = vocabulary.map(_ => new NumericAttribute).asInstanceOf[Array[Attribute]]
    --- End diff --
    
    I do not think the information is totally useless. For example, it is very helpful if you want to know which feature vector index (created by a CountVectorizer) corresponds to which LR coefficient. In general, it should be easy to get this information for an arbitrary vector created by a properly implemented feature-generation transformer.
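
    To illustrate the use case, here is a minimal sketch of the lookup as it can be done today when you still hold a reference to the fitted CountVectorizerModel. It assumes a binary LogisticRegressionModel trained directly on the CountVectorizer output column; the helper name `termCoefficients` is hypothetical and not part of this PR.

    ```scala
    import org.apache.spark.ml.classification.LogisticRegressionModel
    import org.apache.spark.ml.feature.CountVectorizerModel

    // Hypothetical helper: pairs each vocabulary term with its LR coefficient.
    // Assumes the LR model is binary and was trained on the CountVectorizer
    // output column, so vector index i corresponds to vocabulary(i).
    def termCoefficients(
        cvModel: CountVectorizerModel,
        lrModel: LogisticRegressionModel): Array[(String, Double)] = {
      // `coefficients` is only defined for binary models.
      val coefficients = lrModel.coefficients.toArray
      require(
        coefficients.length == cvModel.vocabulary.length,
        "LR model was not trained on the raw CountVectorizer output")
      cvModel.vocabulary.zip(coefficients)
    }
    ```

    This only works while the fitted model is at hand. With attributes attached to the output column, the same mapping could in principle be recovered from the DataFrame schema alone.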

