Posted to reviews@spark.apache.org by sethah <gi...@git.apache.org> on 2018/02/27 18:44:18 UTC

[GitHub] spark pull request #18998: [SPARK-21748][ML] Migrate the implementation of H...

Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18998#discussion_r171025256
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala ---
    @@ -93,11 +97,21 @@ class HashingTF @Since("1.4.0") (@Since("1.4.0") override val uid: String)
       @Since("2.0.0")
       override def transform(dataset: Dataset[_]): DataFrame = {
         val outputSchema = transformSchema(dataset.schema)
    -    val hashingTF = new feature.HashingTF($(numFeatures)).setBinary($(binary))
    -    // TODO: Make the hashingTF.transform natively in ml framework to avoid extra conversion.
    -    val t = udf { terms: Seq[_] => hashingTF.transform(terms).asML }
    +    val hashUDF = udf { (terms: Seq[_]) =>
    +      val ids = terms.map { term =>
    --- End diff --
    
    Why did you implement this differently than the old one?
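    For context on what both versions compute, here is a framework-free sketch of a hashing term-frequency transform. It maps each term to a bucket index with a non-negative modulo of a hash, then accumulates counts per bucket (or 1.0 per bucket when `binary` is set), and returns sorted (indices, values) for a sparse vector. Several names here are assumptions and are not the code from either version of the PR: `HashingTFSketch`, `indexOf`, and the use of `scala.util.hashing.MurmurHash3.stringHash` as the hash. Spark uses its own Murmur3 variant, so these indices will not match Spark's.

    ```scala
    import scala.collection.mutable
    import scala.util.hashing.MurmurHash3

    // Hypothetical illustration object; not part of Spark.
    object HashingTFSketch {
      // Assumption: Scala's MurmurHash3 on the term's string form stands in for
      // Spark's hash function, so bucket indices differ from Spark's output.
      def indexOf(term: Any, numFeatures: Int): Int = {
        val raw = MurmurHash3.stringHash(term.toString)
        // Non-negative modulo: maps any Int, including negatives, into [0, numFeatures).
        val m = raw % numFeatures
        if (m < 0) m + numFeatures else m
      }

      // Returns (indices, values) sorted by index, describing a sparse vector of size numFeatures.
      def transform(terms: Seq[_], numFeatures: Int, binary: Boolean): (Array[Int], Array[Double]) = {
        val counts = mutable.HashMap.empty[Int, Double]
        terms.foreach { term =>
          val i = indexOf(term, numFeatures)
          // Binary mode records presence only; otherwise colliding or repeated terms add up.
          counts(i) = if (binary) 1.0 else counts.getOrElse(i, 0.0) + 1.0
        }
        val sorted = counts.toArray.sortBy(_._1)
        (sorted.map(_._1), sorted.map(_._2))
      }
    }
    ```

    Calling `HashingTFSketch.transform(Seq("a", "b", "a"), 16, binary = false)` yields values summing to 3.0, whether or not "a" and "b" collide. Accumulating into a map and sorting afterwards produces the sorted index order that a sparse vector constructor expects.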

