Posted to reviews@spark.apache.org by srowen <gi...@git.apache.org> on 2018/08/08 00:32:26 UTC

[GitHub] spark pull request #22032: [SPARK-25047][ML] Can't assign SerializedLambda t...

GitHub user srowen opened a pull request:

    https://github.com/apache/spark/pull/22032

    [SPARK-25047][ML] Can't assign SerializedLambda to scala.Function1 in deserialization of BucketedRandomProjectionLSHModel

    ## What changes were proposed in this pull request?
    
    Convert two function fields in ML classes to simple functions to avoid odd SerializedLambda deserialization problem
    
    ## How was this patch tested?
    
    Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/srowen/spark SPARK-25047

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22032.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22032
    
----
commit 9fa90804f7216898d31cc83d477a39686df40bde
Author: Sean Owen <sr...@...>
Date:   2018-08-08T00:31:28Z

    Convert two function fields in ML classes to simple functions to avoid odd SerializedLambda deserialization problem

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22032: [SPARK-25047][ML] Can't assign SerializedLambda to scala...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22032
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22032: [SPARK-25047][ML] Can't assign SerializedLambda to scala...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22032
  
    **[Test build #94433 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94433/testReport)** for PR 22032 at commit [`120021c`](https://github.com/apache/spark/commit/120021c5f6cd1d37636770df23f9869b83e533e6).


---



[GitHub] spark pull request #22032: [SPARK-25047][ML] Can't assign SerializedLambda t...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22032#discussion_r208642160
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
    @@ -97,7 +97,8 @@ private[ml] abstract class LSHModel[T <: LSHModel[T]]
     
       override def transform(dataset: Dataset[_]): DataFrame = {
         transformSchema(dataset.schema, logging = true)
    -    val transformUDF = udf(hashFunction, DataTypes.createArrayType(new VectorUDT))
    +    val transformUDF = udf({ v: Vector => hashFunction(v) },
    --- End diff --
    
    Ah I see, let me try that instead. I didn't know you could express a type on `_` placeholder args.


---



[GitHub] spark issue #22032: [SPARK-25047][ML] Can't assign SerializedLambda to scala...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22032
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #22032: [SPARK-25047][ML] Can't assign SerializedLambda to scala...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22032
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1950/
    Test PASSed.


---



[GitHub] spark issue #22032: [SPARK-25047][ML] Can't assign SerializedLambda to scala...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22032
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1930/
    Test PASSed.


---



[GitHub] spark issue #22032: [SPARK-25047][ML] Can't assign SerializedLambda to scala...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22032
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #22032: [SPARK-25047][ML] Can't assign SerializedLambda t...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22032


---



[GitHub] spark issue #22032: [SPARK-25047][ML] Can't assign SerializedLambda to scala...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22032
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94400/
    Test PASSed.


---



[GitHub] spark issue #22032: [SPARK-25047][ML] Can't assign SerializedLambda to scala...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22032
  
    **[Test build #94400 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94400/testReport)** for PR 22032 at commit [`e76cd81`](https://github.com/apache/spark/commit/e76cd810e57a0c7dcee53cc0ed5daa2a5308c59f).


---



[GitHub] spark issue #22032: [SPARK-25047][ML] Can't assign SerializedLambda to scala...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22032
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1932/
    Test PASSed.


---



[GitHub] spark issue #22032: [SPARK-25047][ML] Can't assign SerializedLambda to scala...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22032
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22032: [SPARK-25047][ML] Can't assign SerializedLambda to scala...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/22032
  
    Merged to master


---



[GitHub] spark pull request #22032: [SPARK-25047][ML] Can't assign SerializedLambda t...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22032#discussion_r208425588
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
    @@ -75,7 +75,7 @@ private[ml] abstract class LSHModel[T <: LSHModel[T]]
        * The hash function of LSH, mapping an input feature vector to multiple hash vectors.
        * @return The mapping of LSH function.
        */
    -  protected[ml] val hashFunction: Vector => Array[Vector]
    +  protected[ml] def hashFunction(elems: Vector): Array[Vector]
    --- End diff --
    
    This change does appear to resolve the issue by avoiding whatever is happening in these two cases. That seems reasonable, since the problem appears to lie in the interaction of Scala and Java 8 rather than in Spark.
    
    However, I do wonder whether MiMa will allow this change. It is still exposed as a function in the bytecode, so maybe. If not, it will be a tough call whether this experimental class is allowed to change.
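
For illustration, here is a minimal sketch of the val-to-def change discussed in this comment. It assumes Scala 2.12 or 2.13 and uses hypothetical stand-ins (`Vec`, `ModelWithVal`, `ModelWithDef`, `ValImpl`, `DefImpl`) rather than the real Spark classes. With a `val`, each instance holds a function object as a field, so that object is written out along with the instance when it is serialized; with a `def`, there is no such field.

```scala
// Hypothetical stand-ins for illustration only; these are not Spark classes.
object ValVersusDef {
  type Vec = Seq[Double]

  // Before: the hash function is a field whose value is a Function1 object.
  abstract class ModelWithVal {
    protected val hashFunction: Vec => Array[Vec]
    def transformOne(v: Vec): Array[Vec] = hashFunction(v)
  }

  // After: the hash function is an ordinary method, so no function object
  // is stored in the instance.
  abstract class ModelWithDef {
    protected def hashFunction(elems: Vec): Array[Vec]
    def transformOne(v: Vec): Array[Vec] = hashFunction(v)
  }

  class ValImpl(scale: Double) extends ModelWithVal with Serializable {
    protected val hashFunction: Vec => Array[Vec] =
      v => Array(v.map(_ * scale))
  }

  class DefImpl(scale: Double) extends ModelWithDef with Serializable {
    protected def hashFunction(elems: Vec): Array[Vec] =
      Array(elems.map(_ * scale))
  }
}
```

Code that applies the function, like `transformOne`, reads the same either way; only places that pass it along as a value need an explicit conversion from method to function.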


---



[GitHub] spark pull request #22032: [SPARK-25047][ML] Can't assign SerializedLambda t...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22032#discussion_r208608273
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
    @@ -97,7 +97,8 @@ private[ml] abstract class LSHModel[T <: LSHModel[T]]
     
       override def transform(dataset: Dataset[_]): DataFrame = {
         transformSchema(dataset.schema, logging = true)
    -    val transformUDF = udf(hashFunction, DataTypes.createArrayType(new VectorUDT))
    +    val transformUDF = udf({ v: Vector => hashFunction(v) },
    --- End diff --
    
    Yeah, that's what I had. It didn't compile in Scala 2.11 for some reason (see the "fails to build" build above). This seemed to work though.


---



[GitHub] spark pull request #22032: [SPARK-25047][ML] Can't assign SerializedLambda t...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22032#discussion_r208599920
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
    @@ -97,7 +97,8 @@ private[ml] abstract class LSHModel[T <: LSHModel[T]]
     
       override def transform(dataset: Dataset[_]): DataFrame = {
         transformSchema(dataset.schema, logging = true)
    -    val transformUDF = udf(hashFunction, DataTypes.createArrayType(new VectorUDT))
    +    val transformUDF = udf({ v: Vector => hashFunction(v) },
    --- End diff --
    
    nit: why not `hashFunction _`?


---



[GitHub] spark issue #22032: [SPARK-25047][ML] Can't assign SerializedLambda to scala...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22032
  
    **[Test build #94400 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94400/testReport)** for PR 22032 at commit [`e76cd81`](https://github.com/apache/spark/commit/e76cd810e57a0c7dcee53cc0ed5daa2a5308c59f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22032: [SPARK-25047][ML] Can't assign SerializedLambda to scala...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22032
  
    **[Test build #94433 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94433/testReport)** for PR 22032 at commit [`120021c`](https://github.com/apache/spark/commit/120021c5f6cd1d37636770df23f9869b83e533e6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22032: [SPARK-25047][ML] Can't assign SerializedLambda to scala...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22032
  
    **[Test build #94398 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94398/testReport)** for PR 22032 at commit [`9fa9080`](https://github.com/apache/spark/commit/9fa90804f7216898d31cc83d477a39686df40bde).


---



[GitHub] spark issue #22032: [SPARK-25047][ML] Can't assign SerializedLambda to scala...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22032
  
    **[Test build #94398 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94398/testReport)** for PR 22032 at commit [`9fa9080`](https://github.com/apache/spark/commit/9fa90804f7216898d31cc83d477a39686df40bde).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22032: [SPARK-25047][ML] Can't assign SerializedLambda to scala...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22032
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94398/
    Test FAILed.


---



[GitHub] spark issue #22032: [SPARK-25047][ML] Can't assign SerializedLambda to scala...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22032
  
    LGTM


---



[GitHub] spark issue #22032: [SPARK-25047][ML] Can't assign SerializedLambda to scala...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22032
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22032: [SPARK-25047][ML] Can't assign SerializedLambda to scala...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22032
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #22032: [SPARK-25047][ML] Can't assign SerializedLambda t...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22032#discussion_r208637385
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
    @@ -97,7 +97,8 @@ private[ml] abstract class LSHModel[T <: LSHModel[T]]
     
       override def transform(dataset: Dataset[_]): DataFrame = {
         transformSchema(dataset.schema, logging = true)
    -    val transformUDF = udf(hashFunction, DataTypes.createArrayType(new VectorUDT))
    +    val transformUDF = udf({ v: Vector => hashFunction(v) },
    --- End diff --
    
    not really, you had `hashFunction(_)` which doesn't work. `hashFunction(_: Vector)` or `hashFunction _` should work instead.
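
The forms named in this comment can be compared side by side in a small sketch, assuming Scala 2.12 or 2.13. `Vec`, `register`, and the body of `hashFunction` are hypothetical; `register` takes its argument as `AnyRef` on the assumption that the two-argument `udf` call in the diff does the same, which would leave the compiler no function type from which to infer a bare `_`. The bare `hashFunction(_)` form is left out because the thread reports it failing to build on Scala 2.11.

```scala
// Hypothetical stand-ins for illustration only; these are not Spark APIs.
object PlaceholderForms {
  type Vec = Seq[Double]

  // A method, like hashFunction after this patch (the body is made up).
  def hashFunction(v: Vec): Array[Vec] = Array(v, v.map(_ + 1.0))

  // Accepts any object, so the argument is typed with no expected
  // function type; the cast back is only here so the result can be called.
  def register(f: AnyRef): Vec => Array[Vec] =
    f.asInstanceOf[Vec => Array[Vec]]

  // Explicit lambda with a typed parameter, as in the final diff.
  val viaLambda = register((v: Vec) => hashFunction(v))

  // Placeholder with a type ascription.
  val viaTypedPlaceholder = register(hashFunction(_: Vec))

  // Explicit eta-expansion of the method.
  val viaEtaExpansion = register(hashFunction _)
}
```

All three arguments denote the same kind of value, a `Vec => Array[Vec]` function wrapping the method, so the choice among them is stylistic once the parameter type is known to the compiler.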


---



[GitHub] spark issue #22032: [SPARK-25047][ML] Can't assign SerializedLambda to scala...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22032
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94433/
    Test PASSed.


---
