You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by jkbradley <gi...@git.apache.org> on 2018/05/08 22:03:51 UTC

[GitHub] spark pull request #21274: [SPARK-24213][ML] Fix for Int id type for PowerIt...

GitHub user jkbradley opened a pull request:

    https://github.com/apache/spark/pull/21274

    [SPARK-24213][ML] Fix for Int id type for PowerIterationClustering in spark.ml

    ## What changes were proposed in this pull request?
    
    PIC in spark.ml has tests for "id" type IntegerType, but those tests did not fully check the result.  This patch:
    * fixes the unit test to break for the existing implementation
    * fixes the implementation
    
    ## How was this patch tested?
    
    Existing unit tests, with fixes

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkbradley/spark SPARK-24213

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21274.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21274
    
----
commit cd02df5a995ac7daa2e76dcc13c54b987265b6e4
Author: Joseph K. Bradley <jo...@...>
Date:   2018-05-08T22:02:33Z

    fix for Int id type for PowerIterationClustering in spark.ml

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21274: [SPARK-24213][ML] Fix for Int id type for PowerIteration...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21274
  
    **[Test build #90420 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90420/testReport)** for PR 21274 at commit [`e504c3e`](https://github.com/apache/spark/commit/e504c3e8172f06b9985a9987fa150ec26bde006a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21274: [SPARK-24213][ML] Fix for Int id type for PowerIteration...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21274
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3077/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21274: [SPARK-24213][ML] Fix for Int id type for PowerIteration...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21274
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90419/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21274: [SPARK-24213][ML] Fix for Int id type for PowerIt...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley closed the pull request at:

    https://github.com/apache/spark/pull/21274


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21274: [SPARK-24213][ML] Fix for Int id type for PowerIteration...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21274
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21274: [SPARK-24213][ML] Fix for Int id type for PowerIteration...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21274
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21274: [SPARK-24213][ML] Fix for Int id type for PowerIteration...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21274
  
    **[Test build #90419 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90419/testReport)** for PR 21274 at commit [`e504c3e`](https://github.com/apache/spark/commit/e504c3e8172f06b9985a9987fa150ec26bde006a).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21274: [SPARK-24213][ML] Fix for Int id type for PowerIteration...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21274
  
    **[Test build #90389 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90389/testReport)** for PR 21274 at commit [`cd02df5`](https://github.com/apache/spark/commit/cd02df5a995ac7daa2e76dcc13c54b987265b6e4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21274: [SPARK-24213][ML] Fix for Int id type for PowerIteration...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21274
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90389/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21274: [SPARK-24213][ML] Fix for Int id type for PowerIteration...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21274
  
    **[Test build #90427 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90427/testReport)** for PR 21274 at commit [`3e40a92`](https://github.com/apache/spark/commit/3e40a92f4d80a5a455cd519d29a8f3c229a3f4d0).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21274: [SPARK-24213][ML] Fix for Int id type for PowerIt...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21274#discussion_r187159250
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala ---
    @@ -231,8 +231,12 @@ class PowerIterationClustering private[clustering] (
           dataset.schema($(idCol)).dataType match {
             case _: LongType =>
               uncastPredictions
    +        case _: IntegerType =>
    +          uncastPredictions.withColumn($(idCol), col($(idCol)).cast(LongType))
             case otherType =>
    -          uncastPredictions.select(col($(idCol)).cast(otherType).alias($(idCol)))
    +          throw new IllegalArgumentException(s"PowerIterationClustering had an unexpected error: " +
    +            s"ID col was found to be of type $otherType, despite initial schema checks.  Please " +
    --- End diff --
    
    nit: ${otherType.simpleString}


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21274: [SPARK-24213][ML] Fix for Int id type for PowerIt...

Posted by shahidki31 <gi...@git.apache.org>.
Github user shahidki31 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21274#discussion_r187234165
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala ---
    @@ -231,8 +231,12 @@ class PowerIterationClustering private[clustering] (
           dataset.schema($(idCol)).dataType match {
             case _: LongType =>
               uncastPredictions
    +        case _: IntegerType =>
    +          uncastPredictions.withColumn($(idCol), col($(idCol)).cast(LongType))
    --- End diff --
    
    Shouldn't it be 
    ` case _: IntegerType =>
    +          uncastPredictions.withColumn($(idCol), col($(idCol)).cast(IntegerType))
    `
    Otherwise it is not necessary for casting. right? Because prediction already has id as Long type and dataset has id as IntegerType. So, we need to cast prediction.id to IntegerType. right?
    Correct me if I am wrong.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21274: [SPARK-24213][ML] Fix for Int id type for PowerIteration...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21274
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90427/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21274: [SPARK-24213][ML] Fix for Int id type for PowerIt...

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21274#discussion_r186986006
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala ---
    @@ -232,7 +232,7 @@ class PowerIterationClustering private[clustering] (
             case _: LongType =>
               uncastPredictions
             case otherType =>
    --- End diff --
    
    Why not directly use
    `case intType: IntType: ` so that make a stronger restriction ?
    Or do we need to support other types besides int/long ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21274: [SPARK-24213][ML] Fix for Int id type for PowerIteration...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21274
  
    **[Test build #90389 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90389/testReport)** for PR 21274 at commit [`cd02df5`](https://github.com/apache/spark/commit/cd02df5a995ac7daa2e76dcc13c54b987265b6e4).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21274: [SPARK-24213][ML] Fix for Int id type for PowerIteration...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21274
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21274: [SPARK-24213][ML] Fix for Int id type for PowerIteration...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21274
  
    **[Test build #90420 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90420/testReport)** for PR 21274 at commit [`e504c3e`](https://github.com/apache/spark/commit/e504c3e8172f06b9985a9987fa150ec26bde006a).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21274: [SPARK-24213][ML] Fix for Int id type for PowerIteration...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21274
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3055/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21274: [SPARK-24213][ML] Fix for Int id type for PowerIteration...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21274
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3079/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21274: [SPARK-24213][ML] Fix for Int id type for PowerIteration...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21274
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3085/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21274: [SPARK-24213][ML] Fix for Int id type for PowerIteration...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21274
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21274: [SPARK-24213][ML] Fix for Int id type for PowerIteration...

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on the issue:

    https://github.com/apache/spark/pull/21274
  
    LGTM. !


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21274: [SPARK-24213][ML] Fix for Int id type for PowerIteration...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21274
  
    **[Test build #90427 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90427/testReport)** for PR 21274 at commit [`3e40a92`](https://github.com/apache/spark/commit/3e40a92f4d80a5a455cd519d29a8f3c229a3f4d0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21274: [SPARK-24213][ML] Fix for Int id type for PowerIteration...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/21274
  
    This issue actually brings up a problem with the Transformer approach for PIC.  Just commented more here: https://issues.apache.org/jira/browse/SPARK-15784
    
    Thank you for pushing back!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21274: [SPARK-24213][ML] Fix for Int id type for PowerIteration...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21274
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90420/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21274: [SPARK-24213][ML] Fix for Int id type for PowerIteration...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21274
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21274: [SPARK-24213][ML] Fix for Int id type for PowerIteration...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/21274
  
    test this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21274: [SPARK-24213][ML] Fix for Int id type for PowerIteration...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21274
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21274: [SPARK-24213][ML] Fix for Int id type for PowerIteration...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21274
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21274: [SPARK-24213][ML] Fix for Int id type for PowerIteration...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21274
  
    **[Test build #90419 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90419/testReport)** for PR 21274 at commit [`e504c3e`](https://github.com/apache/spark/commit/e504c3e8172f06b9985a9987fa150ec26bde006a).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21274: [SPARK-24213][ML] Fix for Int id type for PowerIteration...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21274
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org