Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/02/08 16:40:55 UTC

[GitHub] [spark] huaxingao opened a new pull request #27501: [SPARK-30662][ML][PySpark] Put back the API changes for HasBlockSize in ALS/MLP

huaxingao opened a new pull request #27501: [SPARK-30662][ML][PySpark] Put back the API changes for HasBlockSize in ALS/MLP
URL: https://github.com/apache/spark/pull/27501
 
 
   
   ### What changes were proposed in this pull request?
   Add ```HasBlockSize``` to the shared Params in both Scala and Python.
   Make ALS/MLP extend ```HasBlockSize```.
   
   
   ### Why are the changes needed?
   Add ```HasBlockSize``` to ALS so that users can specify the blockSize.
   Make ```HasBlockSize``` a shared param so both ALS and MLP can use it.
   
   
   ### Does this PR introduce any user-facing change?
   Yes
   ```ALS.setBlockSize/getBlockSize```
   ```ALSModel.setBlockSize/getBlockSize```
   
   
   ### How was this patch tested?
   Manually tested. Also added a doctest.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27501: [SPARK-30662][ML][PySpark] Put back the API changes for HasBlockSize in ALS/MLP

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27501: [SPARK-30662][ML][PySpark] Put back the API changes for HasBlockSize in ALS/MLP
URL: https://github.com/apache/spark/pull/27501#issuecomment-583759917
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27501: [SPARK-30662][ML][PySpark] Put back the API changes for HasBlockSize in ALS/MLP

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27501: [SPARK-30662][ML][PySpark] Put back the API changes for HasBlockSize in ALS/MLP
URL: https://github.com/apache/spark/pull/27501#issuecomment-583753822
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/22838/
   Test PASSed.



[GitHub] [spark] zhengruifeng closed pull request #27501: [SPARK-30662][ML][PySpark] Put back the API changes for HasBlockSize in ALS/MLP

Posted by GitBox <gi...@apache.org>.
zhengruifeng closed pull request #27501: [SPARK-30662][ML][PySpark] Put back the API changes for HasBlockSize in ALS/MLP
URL: https://github.com/apache/spark/pull/27501
 
 
   



[GitHub] [spark] huaxingao commented on issue #27501: [SPARK-30662][ML][PySpark] Put back the API changes for HasBlockSize in ALS/MLP

Posted by GitBox <gi...@apache.org>.
huaxingao commented on issue #27501: [SPARK-30662][ML][PySpark] Put back the API changes for HasBlockSize in ALS/MLP
URL: https://github.com/apache/spark/pull/27501#issuecomment-583767572
 
 
   cc @srowen @zhengruifeng 



[GitHub] [spark] huaxingao commented on a change in pull request #27501: [SPARK-30662][ML][PySpark] Put back the API changes for HasBlockSize in ALS/MLP

Posted by GitBox <gi...@apache.org>.
huaxingao commented on a change in pull request #27501: [SPARK-30662][ML][PySpark] Put back the API changes for HasBlockSize in ALS/MLP
URL: https://github.com/apache/spark/pull/27501#discussion_r376720051
 
 

 ##########
 File path: python/pyspark/ml/param/_shared_params_code_gen.py
 ##########
 @@ -164,7 +164,10 @@ def get$Name(self):
          "'euclidean'", "TypeConverters.toString"),
         ("validationIndicatorCol", "name of the column that indicates whether each row is for " +
          "training or for validation. False indicates training; true indicates validation.",
-         None, "TypeConverters.toString")]
+         None, "TypeConverters.toString"),
+        ("blockSize", "block size for stacking input data in matrices. Data is stacked within "
+         "partitions. If block size is more than remaining data in a partition then it is "
+         "adjusted to the size of this data.", None, "TypeConverters.toInt")]
 
 Review comment:
   As in Scala, I will not set a default value here.
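
The param description in the hunk above says that data is stacked into blocks within a partition and that the last block shrinks to whatever data remains. A minimal sketch of that behaviour in plain Python follows. The function name `stack_partition` and the list-of-rows representation are assumptions made for illustration; this is not Spark's implementation.

```python
from typing import Iterable, Iterator, List, Sequence


def stack_partition(
    rows: Iterable[Sequence[float]], block_size: int
) -> Iterator[List[List[float]]]:
    """Group the rows of one partition into blocks of at most block_size rows.

    Hypothetical helper for illustration only; not part of Spark.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    block: List[List[float]] = []
    for row in rows:
        block.append(list(row))
        if len(block) == block_size:
            yield block
            block = []
    # Fewer rows than block_size remain: emit them as a smaller final block.
    if block:
        yield block


partition = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
blocks = list(stack_partition(partition, block_size=2))
block_sizes = [len(b) for b in blocks]
```

With five rows and a block size of 2, the last block holds the single remaining row rather than being padded.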



[GitHub] [spark] AmplabJenkins commented on issue #27501: [SPARK-30662][ML][PySpark] Put back the API changes for HasBlockSize in ALS/MLP

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27501: [SPARK-30662][ML][PySpark] Put back the API changes for HasBlockSize in ALS/MLP
URL: https://github.com/apache/spark/pull/27501#issuecomment-583753821
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27501: [SPARK-30662][ML][PySpark] Put back the API changes for HasBlockSize in ALS/MLP

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27501: [SPARK-30662][ML][PySpark] Put back the API changes for HasBlockSize in ALS/MLP
URL: https://github.com/apache/spark/pull/27501#issuecomment-583759841
 
 
   **[Test build #118073 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118073/testReport)** for PR 27501 at commit [`ac6b55d`](https://github.com/apache/spark/commit/ac6b55d9e344f420eacc3ca6dae7578f3b6301bb).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `trait HasBlockSize extends Params `
     * `class HasBlockSize(Params):`
     * `class _ALSModelParams(HasPredictionCol, HasBlockSize):`



[GitHub] [spark] SparkQA removed a comment on issue #27501: [SPARK-30662][ML][PySpark] Put back the API changes for HasBlockSize in ALS/MLP

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27501: [SPARK-30662][ML][PySpark] Put back the API changes for HasBlockSize in ALS/MLP
URL: https://github.com/apache/spark/pull/27501#issuecomment-583753699
 
 
   **[Test build #118073 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118073/testReport)** for PR 27501 at commit [`ac6b55d`](https://github.com/apache/spark/commit/ac6b55d9e344f420eacc3ca6dae7578f3b6301bb).



[GitHub] [spark] huaxingao commented on a change in pull request #27501: [SPARK-30662][ML][PySpark] Put back the API changes for HasBlockSize in ALS/MLP

Posted by GitBox <gi...@apache.org>.
huaxingao commented on a change in pull request #27501: [SPARK-30662][ML][PySpark] Put back the API changes for HasBlockSize in ALS/MLP
URL: https://github.com/apache/spark/pull/27501#discussion_r376720035
 
 

 ##########
 File path: mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
 ##########
 @@ -104,7 +104,11 @@ private[shared] object SharedParamsCodeGen {
         isValid = "ParamValidators.inArray(Array(\"euclidean\", \"cosine\"))"),
       ParamDesc[String]("validationIndicatorCol", "name of the column that indicates whether " +
         "each row is for training or for validation. False indicates training; true indicates " +
-        "validation.")
+        "validation."),
+      ParamDesc[Int]("blockSize", "block size for stacking input data in matrices. Data is " +
+        "stacked within partitions. If block size is more than remaining data in a partition " +
+        "then it is adjusted to the size of this data.",
 
 Review comment:
   I will not set the default value here. The default value will be set in each class that extends ```HasBlockSize```.
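
The pattern described in this comment (a shared param declared without a default, with each extending class supplying its own) can be sketched roughly as below. This is plain Python written for illustration: the class names `ALSLike` and `MLPLike`, the helper `_set_default`, and the default values are all assumptions, and none of it is the pyspark or Scala code from the PR.

```python
class HasBlockSize:
    """Hypothetical stand-in for a shared param mixin; declares no default."""

    def __init__(self) -> None:
        self._defaults = {}
        self._values = {}

    def _set_default(self, **kwargs) -> None:
        # Called by subclasses to register their own default values.
        self._defaults.update(kwargs)

    def set_block_size(self, value: int) -> "HasBlockSize":
        self._values["blockSize"] = int(value)
        return self

    def get_block_size(self) -> int:
        # An explicitly set value wins; otherwise fall back to the
        # subclass default. Raises KeyError if neither exists.
        if "blockSize" in self._values:
            return self._values["blockSize"]
        return self._defaults["blockSize"]


class ALSLike(HasBlockSize):
    def __init__(self) -> None:
        super().__init__()
        self._set_default(blockSize=4096)  # illustrative default


class MLPLike(HasBlockSize):
    def __init__(self) -> None:
        super().__init__()
        self._set_default(blockSize=128)  # illustrative default


als_default = ALSLike().get_block_size()
mlp_default = MLPLike().get_block_size()
als_custom = ALSLike().set_block_size(1024).get_block_size()
```

Keeping the default out of the shared definition lets two algorithms with very different sensible block sizes reuse the same param without one overriding the other's default.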



[GitHub] [spark] AmplabJenkins commented on issue #27501: [SPARK-30662][ML][PySpark] Put back the API changes for HasBlockSize in ALS/MLP

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27501: [SPARK-30662][ML][PySpark] Put back the API changes for HasBlockSize in ALS/MLP
URL: https://github.com/apache/spark/pull/27501#issuecomment-583759918
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118073/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27501: [SPARK-30662][ML][PySpark] Put back the API changes for HasBlockSize in ALS/MLP

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27501: [SPARK-30662][ML][PySpark] Put back the API changes for HasBlockSize in ALS/MLP
URL: https://github.com/apache/spark/pull/27501#issuecomment-583759918
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118073/
   Test PASSed.



[GitHub] [spark] zhengruifeng commented on issue #27501: [SPARK-30662][ML][PySpark] Put back the API changes for HasBlockSize in ALS/MLP

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on issue #27501: [SPARK-30662][ML][PySpark] Put back the API changes for HasBlockSize in ALS/MLP
URL: https://github.com/apache/spark/pull/27501#issuecomment-583805890
 
 
   Merged to master. Thanks @huaxingao 



[GitHub] [spark] AmplabJenkins removed a comment on issue #27501: [SPARK-30662][ML][PySpark] Put back the API changes for HasBlockSize in ALS/MLP

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27501: [SPARK-30662][ML][PySpark] Put back the API changes for HasBlockSize in ALS/MLP
URL: https://github.com/apache/spark/pull/27501#issuecomment-583753821
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27501: [SPARK-30662][ML][PySpark] Put back the API changes for HasBlockSize in ALS/MLP

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27501: [SPARK-30662][ML][PySpark] Put back the API changes for HasBlockSize in ALS/MLP
URL: https://github.com/apache/spark/pull/27501#issuecomment-583753699
 
 
   **[Test build #118073 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118073/testReport)** for PR 27501 at commit [`ac6b55d`](https://github.com/apache/spark/commit/ac6b55d9e344f420eacc3ca6dae7578f3b6301bb).



[GitHub] [spark] AmplabJenkins commented on issue #27501: [SPARK-30662][ML][PySpark] Put back the API changes for HasBlockSize in ALS/MLP

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27501: [SPARK-30662][ML][PySpark] Put back the API changes for HasBlockSize in ALS/MLP
URL: https://github.com/apache/spark/pull/27501#issuecomment-583759917
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27501: [SPARK-30662][ML][PySpark] Put back the API changes for HasBlockSize in ALS/MLP

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27501: [SPARK-30662][ML][PySpark] Put back the API changes for HasBlockSize in ALS/MLP
URL: https://github.com/apache/spark/pull/27501#issuecomment-583753822
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/22838/
   Test PASSed.
