Posted to reviews@spark.apache.org by sethah <gi...@git.apache.org> on 2018/02/27 18:44:18 UTC
[GitHub] spark pull request #18998: [SPARK-21748][ML] Migrate the implementation of H...
GitHub user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/18998#discussion_r171025256
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala ---
@@ -93,11 +97,21 @@ class HashingTF @Since("1.4.0") (@Since("1.4.0") override val uid: String)
@Since("2.0.0")
override def transform(dataset: Dataset[_]): DataFrame = {
val outputSchema = transformSchema(dataset.schema)
- val hashingTF = new feature.HashingTF($(numFeatures)).setBinary($(binary))
- // TODO: Make the hashingTF.transform natively in ml framework to avoid extra conversion.
- val t = udf { terms: Seq[_] => hashingTF.transform(terms).asML }
+ val hashUDF = udf { (terms: Seq[_]) =>
+ val ids = terms.map { term =>
--- End diff --
Why did you implement this differently than the old one?