Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/03/18 07:20:26 UTC

[GitHub] [spark] zhengruifeng opened a new pull request #27944: [SPARK-31180][ML] Implement PowerTransform

zhengruifeng opened a new pull request #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944
 
 
   ### What changes were proposed in this pull request?
   Power transforms are a family of parametric, monotonic transformations that are applied to make data more Gaussian-like. This is useful for modeling issues related to heteroscedasticity (non-constant variance) or in other situations where normality is desired.
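The PR text does not say which power transforms it implements. As a minimal sketch, assuming the two standard families (Box-Cox for strictly positive data and Yeo-Johnson for any real data, the same pair scikit-learn's `PowerTransformer` offers), the transforms and a maximum-likelihood estimate of the parameter λ could look roughly like this in plain Python. All function names here are hypothetical and are not the Spark API proposed in this PR.

```python
import math
from typing import Sequence


def box_cox(x: float, lam: float) -> float:
    """Box-Cox transform; only defined for strictly positive x."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if abs(lam) < 1e-12:
        return math.log(x)  # limit of (x**lam - 1) / lam as lam -> 0
    return (x ** lam - 1.0) / lam


def yeo_johnson(x: float, lam: float) -> float:
    """Yeo-Johnson transform; defined for any real x and monotonic in x."""
    if x >= 0:
        if abs(lam) < 1e-12:
            return math.log1p(x)
        return ((x + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < 1e-12:
        return -math.log1p(-x)
    return -((1.0 - x) ** (2.0 - lam) - 1.0) / (2.0 - lam)


def yeo_johnson_log_likelihood(data: Sequence[float], lam: float) -> float:
    """Profile log-likelihood of lam, assuming the transformed data is Gaussian."""
    n = len(data)
    y = [yeo_johnson(v, lam) for v in data]
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n  # MLE (population) variance
    if var <= 0:
        raise ValueError("data must contain at least two distinct values")
    # Log-Jacobian of the transform: (lam - 1) * sum(sign(x) * log(1 + |x|)).
    log_jacobian = sum(math.copysign(math.log1p(abs(v)), v) for v in data)
    return -0.5 * n * math.log(var) + (lam - 1.0) * log_jacobian


def fit_yeo_johnson_lambda(data: Sequence[float], lo: float = -2.0,
                           hi: float = 2.0, steps: int = 401) -> float:
    """Pick lam from a uniform grid over [lo, hi] that maximizes the log-likelihood."""
    best_lam, best_llf = lo, -math.inf
    for i in range(steps):
        lam = lo + (hi - lo) * i / (steps - 1)
        llf = yeo_johnson_log_likelihood(data, lam)
        if llf > best_llf:
            best_lam, best_llf = lam, llf
    return best_lam


# Usage: fit lam on a right-skewed sample, then transform it.
sample = [math.exp(z) for z in range(7)]
fitted_lam = fit_yeo_johnson_lambda(sample)
transformed = [yeo_johnson(v, fitted_lam) for v in sample]
```

The grid search only keeps the sketch short; a production version would more likely use a bounded one-dimensional optimizer (for example Brent's method) and fit one λ per feature column.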
   
   
   ### Why are the changes needed?
   Power transforms are widely used and are well suited to being implemented on top of Spark.
   
   ### Does this PR introduce any user-facing change?
   Yes
   
   
   ### How was this patch tested?
   Added test suites.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-615236640
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121410/
   Test FAILed.


[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600518846
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119978/
   Test PASSed.


[GitHub] [spark] SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607137242
 
 
   **[Test build #120669 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120669/testReport)** for PR 27944 at commit [`7bf9ec6`](https://github.com/apache/spark/commit/7bf9ec60b11132096b8056b836a0057a9f8d778f).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600489500
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119972/
   Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-615230187
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600518846
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119978/
   Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600483701
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600487155
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600487160
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24701/
   Test PASSed.


[GitHub] [spark] SparkQA removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607173509
 
 
   **[Test build #120675 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120675/testReport)** for PR 27944 at commit [`b19c197`](https://github.com/apache/spark/commit/b19c19706aad1db60e30a2e5d18021eecc382175).


[GitHub] [spark] SparkQA removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607142387
 
 
   **[Test build #120673 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120673/testReport)** for PR 27944 at commit [`c656d0f`](https://github.com/apache/spark/commit/c656d0f1d91e7153e555cfc5391e010ddcd620f9).


[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600483710
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24700/
   Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607137761
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607137768
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120669/
   Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607139233
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-615236629
 
 
   Merged build finished. Test FAILed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607127258
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] SparkQA removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-615229677
 
 
   **[Test build #121410 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121410/testReport)** for PR 27944 at commit [`5c64d90`](https://github.com/apache/spark/commit/5c64d90c3dfbec11738296047474189f71cc8f76).


[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600465505
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24695/
   Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-615236629
 
 
   Merged build finished. Test FAILed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607206293
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120675/
   Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-615230193
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/26093/
   Test PASSed.


[GitHub] [spark] SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607163070
 
 
   **[Test build #120671 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120671/testReport)** for PR 27944 at commit [`66a97c9`](https://github.com/apache/spark/commit/66a97c955d4c10b9a2dfb5a07ce35b266af8b357).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


[GitHub] [spark] SparkQA removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607102508
 
 
   **[Test build #120669 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120669/testReport)** for PR 27944 at commit [`7bf9ec6`](https://github.com/apache/spark/commit/7bf9ec60b11132096b8056b836a0057a9f8d778f).


[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607173854
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25374/
   Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607103073
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25368/
   Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607173842
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607174590
 
 
   **[Test build #120672 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120672/testReport)** for PR 27944 at commit [`c574fcb`](https://github.com/apache/spark/commit/c574fcbe64246aca2aaf7790222a17f1e85b7db4).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600483701
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] SparkQA removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600464979
 
 
   **[Test build #119972 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119972/testReport)** for PR 27944 at commit [`932caff`](https://github.com/apache/spark/commit/932caff9a5e47f3f3f8a99d20599bba4f9554d71).


[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600517425
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607131562
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25370/


[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607127268
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25369/


[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607142878
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25372/


[GitHub] [spark] zhengruifeng commented on a change in pull request #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#discussion_r401524503
 
 

 ##########
 File path: mllib/src/test/scala/org/apache/spark/ml/feature/PowerTransformSuite.scala
 ##########
 @@ -0,0 +1,197 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.feature
+
+import scala.util.Random
+
+import org.apache.spark.ml.linalg._
+import org.apache.spark.ml.param.ParamsSuite
+import org.apache.spark.ml.util.{DefaultReadWriteTest, MLTest}
+import org.apache.spark.ml.util.TestingUtils._
+import org.apache.spark.sql.Row
+
+class PowerTransformSuite extends MLTest with DefaultReadWriteTest {
+
+  import testImplicits._
+
+  @transient var data: Array[Vector] = _
+  @transient var dataWithNoise: Array[Vector] = _
+  @transient var resWithYeoJohnson: Array[Vector] = _
+  @transient var resWithBoxCox: Array[Vector] = _
+
+  private val seed = 42L
+
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+
+    data = Array(
+      Vectors.dense(1.28331718, 1.18092228, 0.84160269),
+      Vectors.dense(0.94293279, 1.60960836, 0.3879099),
+      Vectors.dense(1.35235668, 0.21715673, 1.09977091)
+    )
+
+    val rng = new Random(seed)
+    dataWithNoise = data.flatMap { vector =>
 
 Review comment:
   Up-sample the dataset by adding small Gaussian noise.
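   For illustration, the up-sampling step can be sketched in plain Python. This is a hypothetical stand-alone mirror of the fixture (the name `upsample_with_noise` is made up): each of the 3 rows is replicated 1000 times with N(0, (1e-6)^2) noise added to every value. Python's `random.Random(42)` yields different numbers than `scala.util.Random(42)`, so only the structure matches the Scala test, not the exact values.

```python
import random


def upsample_with_noise(rows, copies=1000, sigma=1e-6, seed=42):
    """Replicate each row `copies` times, adding N(0, sigma^2) noise to every value.

    Hypothetical mirror of the Scala fixture; the random stream differs from
    scala.util.Random, so generated values will not match the test bit-for-bit.
    """
    rng = random.Random(seed)
    out = []
    for row in rows:
        for _ in range(copies):
            out.append([v + rng.gauss(0.0, sigma) for v in row])
    return out


data = [
    [1.28331718, 1.18092228, 0.84160269],
    [0.94293279, 1.60960836, 0.3879099],
    [1.35235668, 0.21715673, 1.09977091],
]
noisy = upsample_with_noise(data)
print(len(noisy))  # 3000
```

   Because the noise is tiny relative to the gaps between the original values, a model fit on `noisy` should recover essentially the same lambdas as one fit on `data`, which is what the down-sampling test relies on.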


[GitHub] [spark] SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600516835
 
 
   **[Test build #119977 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119977/testReport)** for PR 27944 at commit [`de7cc9b`](https://github.com/apache/spark/commit/de7cc9bbdcee0d05eab0fb67ee2584146d529592).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600465505
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24695/


[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607131562
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25370/


[GitHub] [spark] zhengruifeng commented on a change in pull request #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#discussion_r401522135
 
 

 ##########
 File path: mllib/src/test/scala/org/apache/spark/ml/feature/PowerTransformSuite.scala
 ##########
 @@ -0,0 +1,197 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.feature
+
+import scala.util.Random
+
+import org.apache.spark.ml.linalg._
+import org.apache.spark.ml.param.ParamsSuite
+import org.apache.spark.ml.util.{DefaultReadWriteTest, MLTest}
+import org.apache.spark.ml.util.TestingUtils._
+import org.apache.spark.sql.Row
+
+class PowerTransformSuite extends MLTest with DefaultReadWriteTest {
+
+  import testImplicits._
+
+  @transient var data: Array[Vector] = _
+  @transient var dataWithNoise: Array[Vector] = _
+  @transient var resWithYeoJohnson: Array[Vector] = _
+  @transient var resWithBoxCox: Array[Vector] = _
+
+  private val seed = 42L
+
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+
+    data = Array(
+      Vectors.dense(1.28331718, 1.18092228, 0.84160269),
+      Vectors.dense(0.94293279, 1.60960836, 0.3879099),
+      Vectors.dense(1.35235668, 0.21715673, 1.09977091)
+    )
+
+    val rng = new Random(seed)
+    dataWithNoise = data.flatMap { vector =>
+      val values = vector.toArray
+      Iterator.tabulate(1000) { i =>
+        val valuesWithNoise = values.map { v => v + rng.nextGaussian * 1e-6 }
+        Vectors.dense(valuesWithNoise)
+      }
+    }
+
+    resWithYeoJohnson = Array(
+      Vectors.dense(1.88609649e+02, 1.64321816e+00, 1.26875990e+00),
+      Vectors.dense(4.39620682e+01, 2.44852195e+00, 4.76941835e-01),
+      Vectors.dense(2.46706192e+02, 2.33862299e-01, 1.83629062e+00)
+    )
+
+    resWithBoxCox = Array(
+      Vectors.dense(0.49024348, 0.17881995, -0.15637811),
+      Vectors.dense(-0.05102892, 0.58863196, -0.57612414),
+      Vectors.dense(0.69420008, -0.84857822, 0.10051454)
+    )
+  }
+
+  test("params") {
+    ParamsSuite.checkParams(new PowerTransform)
+    val lambda = Vectors.dense(1, 0.5, 3)
+    val model = new PowerTransformModel("ptm", lambda)
+    ParamsSuite.checkParams(model)
+  }
+
+  private def assertResult: Row => Unit = {
+    case Row(vector1: Vector, vector2: Vector) =>
+      assert(vector1 ~== vector2 relTol 1E-5,
+        "The vector value is not correct after transformation.")
+  }
+
+  test("Yeo-Johnson") {
+    /*
+      The following Python code loads the data and fits the model with the
+      scikit-learn package.
+
+      from sklearn.preprocessing import PowerTransformer
+      import numpy as np
+
+      X = np.array([[1.28331718, 1.18092228, 0.84160269],
+                    [0.94293279, 1.60960836, 0.3879099],
+                    [1.35235668, 0.21715673, 1.09977091]], dtype=np.float64)
+      pt = PowerTransformer(standardize=False)
+      ptm = pt.fit(X)
+
+      >>> ptm.lambdas_
+      array([9.00955644, 1.72211468, 2.16092368])
+      >>> ptm.transform(X)
+      array([[1.88609649e+02, 1.64321816e+00, 1.26875990e+00],
+             [4.39620682e+01, 2.44852195e+00, 4.76941835e-01],
+             [2.46706192e+02, 2.33862299e-01, 1.83629062e+00]])
+     */
+
+    val df = data.zip(resWithYeoJohnson).toSeq.toDF("features", "expected")
+    val pt = new PowerTransform()
+      .setInputCol("features")
+      .setOutputCol("transformed")
+      .setModelType("yeo-johnson")
+
+    val ptm = pt.fit(df)
+    assert(ptm.lambda ~== Vectors.dense(9.00955644, 1.72211468, 2.16092368) relTol 1e-5)
+
+    val transformed = ptm.transform(df)
+    checkVectorSizeOnDF(transformed, "transformed", ptm.numFeatures)
+
+    testTransformer[(Vector, Vector)](df, ptm, "transformed", "expected")(
+      assertResult)
+  }
+
+  test("Box-Cox") {
+    /*
+      Python code:
+
+      pt = PowerTransformer(method="box-cox", standardize=False)
+      ptm = pt.fit(X)
+
+      >>> ptm.lambdas_
+      array([4.92011835, 0.86296595, 1.15354434])
+      >>> ptm.transform(X)
+      array([[ 0.49024348,  0.17881995, -0.15637811],
+             [-0.05102892,  0.58863196, -0.57612414],
+             [ 0.69420008, -0.84857822,  0.10051454]])
+     */
+
+    val df = data.zip(resWithBoxCox).toSeq.toDF("features", "expected")
+    val pt = new PowerTransform()
+      .setInputCol("features")
+      .setOutputCol("transformed")
+      .setModelType("box-cox")
+
+    val ptm = pt.fit(df)
+    assert(ptm.lambda ~== Vectors.dense(4.92011835, 0.86296595, 1.15354434) relTol 1e-5)
+
+    val transformed = ptm.transform(df)
+    checkVectorSizeOnDF(transformed, "transformed", ptm.numFeatures)
+
+    testTransformer[(Vector, Vector)](df, ptm, "transformed", "expected")(
+      assertResult)
+  }
+
+  test("Yeo-Johnson with down-sampling") {
+    val df = dataWithNoise.map(Tuple1.apply).toSeq.toDF("features")
+    val pt = new PowerTransform()
+      .setInputCol("features")
+      .setOutputCol("transformed")
+      .setModelType("yeo-johnson")
+
+    val ptm = pt.fit(df)
+    assert(ptm.lambda ~== Vectors.dense(9.00955644, 1.72211468, 2.16092368) relTol 1e-5)
+
+    pt.setNumBins(100)
 
 Review comment:
   About 3000 distinct values and 100 bins, so each bin contains about 30 values.
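   The bin arithmetic can be checked with a small sketch. It assumes, hypothetically, equal-count binning over the sorted values with each bin summarized as (mean, count); the PR's actual grouping (via `combineWithinGroups`) may choose bin boundaries and aggregates differently.

```python
import random


def equal_count_bins(values, num_bins):
    """Split sorted values into `num_bins` contiguous chunks of near-equal size
    and summarize each chunk as (mean, count). Hypothetical helper."""
    xs = sorted(values)
    n = len(xs)
    bins = []
    for b in range(num_bins):
        lo = b * n // num_bins
        hi = (b + 1) * n // num_bins
        chunk = xs[lo:hi]
        if chunk:
            bins.append((sum(chunk) / len(chunk), len(chunk)))
    return bins


# One feature column up-sampled as in the suite: 3 base values x 1000 noisy copies.
rng = random.Random(0)
values = [base + rng.gauss(0.0, 1e-6)
          for base in (1.28331718, 0.94293279, 1.35235668)
          for _ in range(1000)]
bins = equal_count_bins(values, 100)
print(len(bins), {count for _, count in bins})  # 100 {30}
```

   With 3000 values and 100 bins, every bin holds exactly 30 values here; weighting each bin mean by its count keeps the fitting cost proportional to the number of bins rather than the number of rows.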


[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607175232
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607180710
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120673/


[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607180696
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] zhengruifeng commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600470611
 
 
   Differences from sklearn's implementation:
   1. Distinct values may be grouped before fitting;
   2. Standardization, which sklearn enables by default, is not applied; the transformed data can be standardized with `StandardScaler`;
   3. Both use a Brent solver, but the implementations may differ. sklearn uses `scipy.optimize.brent` with bracket (-2, 2), maxiter=500 and tol=1.48e-8, and `scipy.optimize.brent` does not guarantee a solution within the input bracket; I use `org.apache.commons.math3.optim.univariate.BrentOptimizer` with bounds [-10, 10] and maxiter=1000. (I also tried sklearn's default parameters and the results were the same; I widened the bounds and raised the iteration count to be conservative.)
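   A minimal sketch of what both implementations estimate, assuming the same profile log-likelihood that sklearn maximizes for Yeo-Johnson: -n/2 * log(var(y_lambda)) + (lambda - 1) * sum(sign(x) * log1p(|x|)). Neither Commons Math nor SciPy is used: a coarse grid scan followed by golden-section search over [-10, 10] stands in for the Brent optimizer, so iteration counts and tolerances are not comparable. The reference numbers in the checks come from the test suite in this PR.

```python
import math


def yeo_johnson(x, lmbda):
    """Yeo-Johnson transform of a single value."""
    if x >= 0:
        if abs(lmbda) < 1e-12:
            return math.log1p(x)
        return ((x + 1.0) ** lmbda - 1.0) / lmbda
    if abs(lmbda - 2.0) < 1e-12:
        return -math.log1p(-x)
    return -((1.0 - x) ** (2.0 - lmbda) - 1.0) / (2.0 - lmbda)


def box_cox(x, lmbda):
    """Box-Cox transform of a single value; requires x > 0."""
    if abs(lmbda) < 1e-12:
        return math.log(x)
    return (x ** lmbda - 1.0) / lmbda


def yj_log_likelihood(xs, lmbda):
    """Profile log-likelihood of lambda for one column (population variance)."""
    ys = [yeo_johnson(x, lmbda) for x in xs]
    n = len(ys)
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / n
    jacobian = sum(math.copysign(math.log1p(abs(x)), x) for x in xs)
    return -n / 2.0 * math.log(var) + (lmbda - 1.0) * jacobian


def maximize_bounded(f, lo=-10.0, hi=10.0, grid=80, tol=1e-10):
    """Grid scan, then golden-section refinement around the best grid point.
    A stand-in for a bounded Brent optimizer, not the same algorithm."""
    step = (hi - lo) / grid
    best = max((lo + i * step for i in range(grid + 1)), key=f)
    a, b = max(lo, best - step), min(hi, best + step)
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            # Maximum lies in [a, d]: shrink from the right.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Maximum lies in [c, b]: shrink from the left.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0


col0 = [1.28331718, 0.94293279, 1.35235668]  # first feature column of `data`
lam = maximize_bounded(lambda lm: yj_log_likelihood(col0, lm))
print(f"estimated Yeo-Johnson lambda for column 0: {lam:.6f}")
```

   The grid scan is there because golden-section search only finds a local maximum; on these three points the likelihood rises from lambda = -10 to a single peak near 9 and then falls, so the refinement lands on the same value that sklearn reports.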


[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607131552
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600483178
 
 
   **[Test build #119977 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119977/testReport)** for PR 27944 at commit [`de7cc9b`](https://github.com/apache/spark/commit/de7cc9bbdcee0d05eab0fb67ee2584146d529592).


[GitHub] [spark] zhengruifeng commented on a change in pull request #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#discussion_r401523780
 
 

 ##########
 File path: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala
 ##########
 @@ -554,4 +554,46 @@ object MLUtils extends Logging {
       math.log1p(math.exp(x))
     }
   }
+
+  /**
+   * Sequentially split the input elements into groups and aggregate within each group.
+   * Each group contains a single key and has a size no greater than that key's group size.
+   * For example, input keys = [1, 1, 1, 2, 2, 2, 3, 3, 1],
+   * group sizes are: 1->2, 2->5, 3->1,
+   * then the groups are {1, 1}, {1}, {2, 2, 2}, {3}, {3}, {1}.
+   *
+   * @param input input iterator containing (key, value), usually sorted by key
+   * @param getSize group size of each key.
+   * @return aggregated iterator
+   */
+  private[spark] def combineWithinGroups[K, V, U](
 
 Review comment:
   This method should be helpful when implementing algorithms that need down-sampling. It is similar to the down-sampling used for AUC (which handles only one column), but it can be used in multi-column cases with varying group sizes.
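   To make the documented grouping concrete, here is a hypothetical Python mirror of the example in the Scaladoc. The Scala method's full parameter list is cut off in the quoted diff, so the `init`/`merge` aggregation arguments below are assumptions, not its real signature.

```python
from itertools import groupby


def combine_within_groups(pairs, get_size, init, merge):
    """Split each run of equal consecutive keys into chunks of at most
    get_size(key) elements (get_size must return >= 1) and fold each
    chunk's values with `merge`, starting from `init()`."""
    out = []
    for key, run in groupby(pairs, key=lambda kv: kv[0]):
        size = get_size(key)
        acc, count = init(), 0
        for _, value in run:
            acc = merge(acc, value)
            count += 1
            if count == size:
                out.append((key, acc))
                acc, count = init(), 0
        if count > 0:  # flush a partially filled chunk at the end of the run
            out.append((key, acc))
    return out


keys = [1, 1, 1, 2, 2, 2, 3, 3, 1]
sizes = {1: 2, 2: 5, 3: 1}

groups = combine_within_groups(
    [(k, k) for k in keys], sizes.__getitem__, list, lambda acc, v: acc + [v])
print([g for _, g in groups])  # [[1, 1], [1], [2, 2, 2], [3], [3], [1]]

counts = combine_within_groups(
    [(k, 1) for k in keys], sizes.__getitem__, int, lambda acc, v: acc + v)
print(counts)  # [(1, 2), (1, 1), (2, 3), (3, 1), (3, 1), (1, 1)]
```

   Because only consecutive equal keys are merged, the trailing key 1 forms its own group; sorting the input by key first (as the Scaladoc suggests) avoids such split runs.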


[GitHub] [spark] SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607102508
 
 
   **[Test build #120669 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120669/testReport)** for PR 27944 at commit [`7bf9ec6`](https://github.com/apache/spark/commit/7bf9ec60b11132096b8056b836a0057a9f8d778f).


[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607206286
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-615236640
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121410/


[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600487160
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24701/


[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607206293
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120675/


[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600489494
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-615236593
 
 
   **[Test build #121410 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121410/testReport)** for PR 27944 at commit [`5c64d90`](https://github.com/apache/spark/commit/5c64d90c3dfbec11738296047474189f71cc8f76).
    * This patch **fails to generate documentation**.
    * This patch merges cleanly.
    * This patch adds no public classes.


[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607131552
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600489039
 
 
   **[Test build #119972 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119972/testReport)** for PR 27944 at commit [`932caff`](https://github.com/apache/spark/commit/932caff9a5e47f3f3f8a99d20599bba4f9554d71).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class PowerTransform @Since("3.1.0")(@Since("3.1.0") override val uid: String)`



[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607175240
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120672/



[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607127268
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25369/



[GitHub] [spark] SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607130755
 
 
   **[Test build #120671 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120671/testReport)** for PR 27944 at commit [`66a97c9`](https://github.com/apache/spark/commit/66a97c955d4c10b9a2dfb5a07ce35b266af8b357).



[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607180696
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607142868
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607103063
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607137761
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] zhengruifeng commented on a change in pull request #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#discussion_r394149351
 
 

 ##########
 File path: mllib/src/main/scala/org/apache/spark/ml/feature/PowerTransform.scala
 ##########
 @@ -0,0 +1,561 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.feature
+
+import org.apache.commons.math3.analysis._
+import org.apache.commons.math3.optim._
+import org.apache.commons.math3.optim.nonlinear.scalar._
+import org.apache.commons.math3.optim.univariate._
+import org.apache.hadoop.fs.Path
+
+import org.apache.spark.annotation.Since
+import org.apache.spark.ml._
+import org.apache.spark.ml.linalg._
+import org.apache.spark.ml.param._
+import org.apache.spark.ml.param.shared._
+import org.apache.spark.ml.util._
+import org.apache.spark.mllib.util.MLUtils
+import org.apache.spark.sql._
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types._
+
+
+/**
+ * Params for [[PowerTransform]] and [[PowerTransformModel]].
+ */
+private[feature] trait PowerTransformParams extends Params with HasInputCol with HasOutputCol {
+
+  /**
+   * The model type which is a string (case-sensitive).
+   * Supported options: "yeo-johnson", "box-cox".
+   * (default = yeo-johnson)
+   *
+   * @group param
+   */
+  final val modelType: Param[String] = new Param[String](this, "modelType", "The model type " +
+    "which is a string (case-sensitive). Supported options: yeo-johnson (default), and box-cox.",
+    ParamValidators.inArray[String](PowerTransform.supportedModelTypes))
+
+  /** @group getParam */
+  final def getModelType: String = $(modelType)
+
+  setDefault(modelType -> PowerTransform.YeoJohnson)
+
+  /**
+   * param for number of bins to down-sample the curves in statistics computation.
+   * If 0, no down-sampling will occur.
+   * Default: 100,000.
+   * @group expertParam
+   */
+  val numBins: IntParam = new IntParam(this, "numBins", "Number of bins to down-sample " +
+    "the curves in statistics computation. If 0, no down-sampling will occur. Must be >= 0.",
+    ParamValidators.gtEq(0))
+
+  /** @group expertGetParam */
+  def getNumBins: Int = $(numBins)
+
+  setDefault(numBins -> 100000)
+
+  /** Validates and transforms the input schema. */
+  protected def validateAndTransformSchema(schema: StructType): StructType = {
+    SchemaUtils.checkColumnType(schema, $(inputCol), new VectorUDT)
+    require(!schema.fieldNames.contains($(outputCol)),
+      s"Output column ${$(outputCol)} already exists.")
+    SchemaUtils.appendColumn(schema, $(outputCol), new VectorUDT)
+  }
+}
+
+
+/**
+ * Apply a power transform to make data more Gaussian-like.
+ * Currently, PowerTransform supports the Box-Cox transform and the Yeo-Johnson transform.
+ * The optimal parameter for stabilizing variance and minimizing skewness is estimated through
+ * maximum likelihood.
+ * Box-Cox requires input data to be strictly positive, while Yeo-Johnson supports both
+ * positive and negative data.
+ */
+@Since("3.1.0")
+class PowerTransform @Since("3.1.0")(@Since("3.1.0") override val uid: String)
+  extends Estimator[PowerTransformModel] with PowerTransformParams with DefaultParamsWritable {
+
+  import PowerTransform._
+
+  def this() = this(Identifiable.randomUID("power_trans"))
+
+  /** @group setParam */
+  def setInputCol(value: String): this.type = set(inputCol, value)
+
+  /** @group setParam */
+  def setOutputCol(value: String): this.type = set(outputCol, value)
+
+  /** @group setParam */
+  def setModelType(value: String): this.type = set(modelType, value)
+
+  /** @group expertSetParam */
+  def setNumBins(value: Int): this.type = set(numBins, value)
+
+  override def fit(dataset: Dataset[_]): PowerTransformModel = {
+    transformSchema(dataset.schema, logging = true)
+
+    val spark = dataset.sparkSession
+    import spark.implicits._
+
+    val localModelType = $(modelType)
+    val numFeatures = MetadataUtils.getNumFeatures(dataset, $(inputCol))
+    val numRows = dataset.count()
+
+    val validateFunc = $(modelType) match {
+      case BoxCox => vec: Vector => requirePositiveValues(vec)
+      case YeoJohnson => vec: Vector => requireNonNaNValues(vec)
+    }
+
+    var pairCounts = dataset
+      .select($(inputCol))
+      .flatMap { case Row(vec: Vector) =>
+        require(vec.size == numFeatures)
+        validateFunc(vec)
+        vec.nonZeroIterator
+      }.toDF("col", "value")
+      .groupBy("col", "value")
+      .agg(count(lit(0)).as("cnt"))
+      .sort("col", "value")
+
+    val groups = if (0 < $(numBins) && $(numBins) <= numRows) {
+      val localNumBins = $(numBins)
+      pairCounts
+        .groupBy("col")
+        .count()
+        .as[(Int, Long)]
+        .flatMap { case (col, num) =>
+          val group = num / localNumBins
+          if (group >= 2) {
+            Some((col, group))
+          } else {
+            None
+          }
+        }.collect().toMap
+    } else Map.empty[Int, Long]
+
+    if (groups.nonEmpty) {
+      pairCounts = makeBins(pairCounts.as[(Int, Double, Long)], groups)
+        .toDF("col", "value", "cnt")
+    }
+
+    val solutions = pairCounts
+      .groupBy("col")
+      .agg(collect_list(struct("value", "cnt")))
+      .as[(Int, Seq[(Double, Long)])]
+      .map { case (col, seq) =>
+        val nnz = seq.iterator.map(_._2).sum
+        val nz = numRows - nnz
+        val (solution, _) = localModelType match {
+          case BoxCox =>
+            require(nz == 0)
+            val computeIter = () => seq.iterator
+            solveBoxCox(computeIter)
+          case YeoJohnson =>
+            require(nz >= 0)
+            val computeIter = if (nz > 0) {
+              () => seq.iterator ++ Iterator.single((0.0, nz))
+            } else {
+              () => seq.iterator
+            }
+            solveYeoJohnson(computeIter)
+        }
+        (col, solution)
+      }.collect().toMap
+
+    val lambda = Array.ofDim[Double](numFeatures)
+    solutions.foreach { case (col, solution) => lambda(col) = solution }
+
+    if (solutions.size < numFeatures) {
+      localModelType match {
+        case YeoJohnson =>
+          // if some column only contains 0 values in YeoJohnson
+          val computeIter = () => Iterator.single((0.0, numRows))
+          val (zeroSolution, _) = solveYeoJohnson(computeIter)
+          Iterator.range(0, numFeatures)
+            .filterNot(solutions.contains)
+            .foreach { col => lambda(col) = zeroSolution }
+
+        case BoxCox =>
+          // This should never happen.
+          throw new IllegalArgumentException("BoxCox requires positive values")
+      }
+    }
+
+    copyValues(new PowerTransformModel(uid, Vectors.dense(lambda).compressed)
+      .setParent(this))
+  }
+
+  override def copy(extra: ParamMap): PowerTransform = defaultCopy(extra)
+
+  override def transformSchema(schema: StructType): StructType = {
+    validateAndTransformSchema(schema)
+  }
+}
+
+
+@Since("3.1.0")
+object PowerTransform extends DefaultParamsReadable[PowerTransform] {
+
+  override def load(path: String): PowerTransform = super.load(path)
+
+  /** String name for Box-Cox transform model type. */
+  private[feature] val BoxCox: String = "box-cox"
+
+  /** String name for Yeo-Johnson transform model type. */
+  private[feature] val YeoJohnson: String = "yeo-johnson"
+
+  /* Set of modelTypes that PowerTransform supports */
+  private[feature] val supportedModelTypes = Array(BoxCox, YeoJohnson)
+
+  private[feature] def brentSolve(obj: UnivariateFunction): (Double, Double) = {
 
 Review comment:
   Compared with scikit-learn's [implementation](https://github.com/scikit-learn/scikit-learn/blob/b189bf60708af22dde82a00aca7b5a54290b666d/sklearn/preprocessing/_data.py#L3042):
   this PR uses the same `tol` = 1.48E-8;
   sklearn uses the bracket [-2, 2], but the `scipy.optimize.brent` documentation notes that "Providing the pair (xa,xb) does not always mean the obtained solution will satisfy xa<=x<=xb.";
   sklearn uses iters=500.
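   The solver settings compared above (tolerance, search interval, iteration cap) can be illustrated with a small, dependency-free sketch. It is an assumption-laden stand-in, not the PR's `brentSolve`: it uses golden-section search instead of Brent's method (so no commons-math3 is needed), and the object name `BoundedSearch` and its defaults (interval [-10, 10], `tol` = 1.48e-8, 1000 iterations, taken from the values discussed in this thread) are hypothetical. Unlike `scipy.optimize.brent`, every evaluation stays inside `[lo, hi]`.

```scala
// Hypothetical sketch: golden-section search as a dependency-free stand-in
// for commons-math3's BrentOptimizer. It is NOT Brent's method.
object BoundedSearch {
  // 1 / golden ratio, about 0.618
  private val InvPhi: Double = (math.sqrt(5.0) - 1.0) / 2.0

  /** Returns (argmax, max) of a unimodal `f` restricted to [lo, hi]. */
  def maximize(
      f: Double => Double,
      lo: Double = -10.0,
      hi: Double = 10.0,
      tol: Double = 1.48e-8,
      maxIter: Int = 1000): (Double, Double) = {
    require(lo < hi, s"invalid interval [$lo, $hi]")
    var a = lo
    var b = hi
    // two interior probes; every evaluation stays inside [a, b]
    var c = b - InvPhi * (b - a)
    var d = a + InvPhi * (b - a)
    var fc = f(c)
    var fd = f(d)
    var iter = 0
    // stop when the bracket is narrower than tol or the iteration cap is hit
    while (b - a > tol && iter < maxIter) {
      if (fc >= fd) {
        // the maximum is in [a, d]: the old c becomes the new right probe
        b = d; d = c; fd = fc
        c = b - InvPhi * (b - a)
        fc = f(c)
      } else {
        // the maximum is in [c, b]: the old d becomes the new left probe
        a = c; c = d; fc = fd
        d = a + InvPhi * (b - a)
        fd = f(d)
      }
      iter += 1
    }
    val x = (a + b) / 2.0
    (x, f(x))
  }
}
```

   Golden-section search converges more slowly than Brent's parabolic steps but is easy to verify; with the tolerance above it needs fewer than 50 iterations on [-10, 10], so the iteration cap matters only as a safeguard.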
   



[GitHub] [spark] SparkQA removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607138554
 
 
   **[Test build #120672 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120672/testReport)** for PR 27944 at commit [`c574fcb`](https://github.com/apache/spark/commit/c574fcbe64246aca2aaf7790222a17f1e85b7db4).



[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607173854
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25374/



[GitHub] [spark] SparkQA removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600483178
 
 
   **[Test build #119977 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119977/testReport)** for PR 27944 at commit [`de7cc9b`](https://github.com/apache/spark/commit/de7cc9bbdcee0d05eab0fb67ee2584146d529592).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600489500
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119972/



[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607180710
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120673/



[GitHub] [spark] zhengruifeng edited a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
zhengruifeng edited a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600470611
 
 
   Differences from sklearn's implementation:
   1, distinct values may be grouped;
   2, standardization, which sklearn enables by default, is skipped; the data can be standardized afterwards with `StandardScaler`;
   3, both use a Brent solver, but the implementations differ:
      
   - sklearn uses `scipy.optimize.brent` with bracket=[-2,2], iter=500, tol=1.48e-8, and `scipy.optimize.brent` does not guarantee a solution within the given bracket;
   
   - I use `org.apache.commons.math3.optim.univariate.BrentOptimizer` with bound=[-10,10], iter=1000; it also requires a positive `BrentRel` for the relative tolerance. (I also tried sklearn's default parameters and the results were the same; I conservatively widened the bound and raised the iteration count.)
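   Point 1 (grouping distinct values) works because the likelihood depends only on each distinct value and how often it occurs. A minimal Scala sketch, assuming the standard Yeo-Johnson profile log-likelihood (population variance of the transformed values plus a Jacobian term, which is the form we understand scikit-learn to maximize); `YeoJohnsonSketch` and its methods are hypothetical names, not the PR's `solveYeoJohnson`:

```scala
// Hypothetical sketch of the Yeo-Johnson transform and its log-likelihood
// evaluated over (value, count) pairs instead of raw rows.
object YeoJohnsonSketch {
  private val Eps = 1e-12

  /** Yeo-Johnson transform of a single value for parameter lam. */
  def transform(x: Double, lam: Double): Double =
    if (x >= 0.0) {
      if (math.abs(lam) < Eps) math.log1p(x)
      else (math.pow(x + 1.0, lam) - 1.0) / lam
    } else {
      if (math.abs(lam - 2.0) < Eps) -math.log1p(-x)
      else -(math.pow(1.0 - x, 2.0 - lam) - 1.0) / (2.0 - lam)
    }

  /** Log-likelihood where each distinct value is weighted by its count. */
  def logLikelihood(pairs: Seq[(Double, Long)], lam: Double): Double = {
    val n = pairs.iterator.map(_._2).sum.toDouble
    val trans = pairs.map { case (x, c) => (transform(x, lam), c.toDouble) }
    val mean = trans.iterator.map { case (t, c) => t * c }.sum / n
    // population variance (divide by n, not n - 1)
    val variance = trans.iterator.map { case (t, c) => c * (t - mean) * (t - mean) }.sum / n
    // Jacobian term: sum of sign(x) * log(1 + |x|), weighted by counts
    val jacobian = pairs.iterator.map { case (x, c) =>
      c * math.signum(x) * math.log1p(math.abs(x))
    }.sum
    -n / 2.0 * math.log(variance) + (lam - 1.0) * jacobian
  }
}
```

   Because only (value, count) pairs enter the sums, grouping identical values gives exactly the same likelihood as the expanded rows, which is what lets `fit` aggregate with `groupBy("col", "value")` before handing compact per-column sequences to the solver.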



[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600518839
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] zhengruifeng commented on a change in pull request #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#discussion_r394158744
 
 

 ##########
 File path: mllib/src/main/scala/org/apache/spark/ml/feature/PowerTransform.scala
 ##########
 @@ -0,0 +1,561 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.feature
+
+import org.apache.commons.math3.analysis._
+import org.apache.commons.math3.optim._
+import org.apache.commons.math3.optim.nonlinear.scalar._
+import org.apache.commons.math3.optim.univariate._
+import org.apache.hadoop.fs.Path
+
+import org.apache.spark.annotation.Since
+import org.apache.spark.ml._
+import org.apache.spark.ml.linalg._
+import org.apache.spark.ml.param._
+import org.apache.spark.ml.param.shared._
+import org.apache.spark.ml.util._
+import org.apache.spark.mllib.util.MLUtils
+import org.apache.spark.sql._
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types._
+
+
+/**
+ * Params for [[PowerTransform]] and [[PowerTransformModel]].
+ */
+private[feature] trait PowerTransformParams extends Params with HasInputCol with HasOutputCol {
+
+  /**
+   * The model type which is a string (case-sensitive).
+   * Supported options: "yeo-johnson", "box-cox".
+   * (default = yeo-johnson)
+   *
+   * @group param
+   */
+  final val modelType: Param[String] = new Param[String](this, "modelType", "The model type " +
+    "which is a string (case-sensitive). Supported options: yeo-johnson (default), and box-cox.",
+    ParamValidators.inArray[String](PowerTransform.supportedModelTypes))
+
+  /** @group getParam */
+  final def getModelType: String = $(modelType)
+
+  setDefault(modelType -> PowerTransform.YeoJohnson)
+
+  /**
+   * param for number of bins to down-sample the curves in statistics computation.
+   * If 0, no down-sampling will occur.
+   * Default: 100,000.
+   * @group expertParam
+   */
+  val numBins: IntParam = new IntParam(this, "numBins", "Number of bins to down-sample " +
+    "the curves in statistics computation. If 0, no down-sampling will occur. Must be >= 0.",
+    ParamValidators.gtEq(0))
+
+  /** @group expertGetParam */
+  def getNumBins: Int = $(numBins)
+
+  setDefault(numBins -> 100000)
+
+  /** Validates and transforms the input schema. */
+  protected def validateAndTransformSchema(schema: StructType): StructType = {
+    SchemaUtils.checkColumnType(schema, $(inputCol), new VectorUDT)
+    require(!schema.fieldNames.contains($(outputCol)),
+      s"Output column ${$(outputCol)} already exists.")
+    SchemaUtils.appendColumn(schema, $(outputCol), new VectorUDT)
+  }
+}
+
+
+/**
+ * Apply a power transform to make data more Gaussian-like.
+ * Currently, PowerTransform supports the Box-Cox transform and the Yeo-Johnson transform.
+ * The optimal parameter for stabilizing variance and minimizing skewness is estimated through
+ * maximum likelihood.
+ * Box-Cox requires input data to be strictly positive, while Yeo-Johnson supports both
+ * positive and negative data.
+ */
+@Since("3.1.0")
+class PowerTransform @Since("3.1.0")(@Since("3.1.0") override val uid: String)
+  extends Estimator[PowerTransformModel] with PowerTransformParams with DefaultParamsWritable {
+
+  import PowerTransform._
+
+  def this() = this(Identifiable.randomUID("power_trans"))
+
+  /** @group setParam */
+  def setInputCol(value: String): this.type = set(inputCol, value)
+
+  /** @group setParam */
+  def setOutputCol(value: String): this.type = set(outputCol, value)
+
+  /** @group setParam */
+  def setModelType(value: String): this.type = set(modelType, value)
+
+  /** @group expertSetParam */
+  def setNumBins(value: Int): this.type = set(numBins, value)
+
+  override def fit(dataset: Dataset[_]): PowerTransformModel = {
+    transformSchema(dataset.schema, logging = true)
+
+    val spark = dataset.sparkSession
+    import spark.implicits._
+
+    val localModelType = $(modelType)
+    val numFeatures = MetadataUtils.getNumFeatures(dataset, $(inputCol))
+    val numRows = dataset.count()
+
+    val validateFunc = $(modelType) match {
+      case BoxCox => vec: Vector => requirePositiveValues(vec)
+      case YeoJohnson => vec: Vector => requireNonNaNValues(vec)
+    }
+
+    var pairCounts = dataset
+      .select($(inputCol))
+      .flatMap { case Row(vec: Vector) =>
+        require(vec.size == numFeatures)
+        validateFunc(vec)
+        vec.nonZeroIterator
+      }.toDF("col", "value")
+      .groupBy("col", "value")
+      .agg(count(lit(0)).as("cnt"))
+      .sort("col", "value")
+
+    val groups = if (0 < $(numBins) && $(numBins) <= numRows) {
+      val localNumBins = $(numBins)
+      pairCounts
+        .groupBy("col")
+        .count()
+        .as[(Int, Long)]
+        .flatMap { case (col, num) =>
+          val group = num / localNumBins
+          if (group >= 2) {
+            Some((col, group))
+          } else {
+            None
+          }
+        }.collect().toMap
+    } else Map.empty[Int, Long]
+
+    if (groups.nonEmpty) {
+      pairCounts = makeBins(pairCounts.as[(Int, Double, Long)], groups)
+        .toDF("col", "value", "cnt")
+    }
+
+    val solutions = pairCounts
+      .groupBy("col")
+      .agg(collect_list(struct("value", "cnt")))
+      .as[(Int, Seq[(Double, Long)])]
+      .map { case (col, seq) =>
+        val nnz = seq.iterator.map(_._2).sum
+        val nz = numRows - nnz
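+        val (solution, _) = localModelType match {
+          case BoxCox =>
+            // Box-Cox is undefined at 0, so every entry must be a stored positive value.
+            require(nz == 0, s"BoxCox requires positive values, but column $col has $nz zeros")
+            val computeIter = () => seq.iterator
+            solveBoxCox(computeIter)
+          case YeoJohnson =>
+            // Yeo-Johnson accepts 0; add back the implicit zeros skipped by nonZeroIterator.
+            require(nz >= 0)
+            val computeIter = if (nz > 0) {
+              () => seq.iterator ++ Iterator.single((0.0, nz))
+            } else {
+              () => seq.iterator
+            }
+            solveYeoJohnson(computeIter)
+        }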
+        (col, solution)
+      }.collect().toMap
+
+    val lambda = Array.ofDim[Double](numFeatures)
+    solutions.foreach { case (col, solution) => lambda(col) = solution }
+
+    if (solutions.size < numFeatures) {
+      localModelType match {
+        case YeoJohnson =>
+          // if some column only contains 0 values in YeoJohnson
+          val computeIter = () => Iterator.single((0.0, numRows))
+          val (zeroSolution, _) = solveYeoJohnson(computeIter)
+          Iterator.range(0, numFeatures)
+            .filterNot(solutions.contains)
+            .foreach { col => lambda(col) = zeroSolution }
+
+        case BoxCox =>
+          // This should never happen.
+          throw new IllegalArgumentException("BoxCox requires positive values")
+      }
+    }
+
+    copyValues(new PowerTransformModel(uid, Vectors.dense(lambda).compressed)
+      .setParent(this))
+  }
+
+  override def copy(extra: ParamMap): PowerTransform = defaultCopy(extra)
+
+  override def transformSchema(schema: StructType): StructType = {
+    validateAndTransformSchema(schema)
+  }
+}
+
+
+@Since("3.1.0")
+object PowerTransform extends DefaultParamsReadable[PowerTransform] {
+
+  override def load(path: String): PowerTransform = super.load(path)
+
+  /** String name for Box-Cox transform model type. */
+  private[feature] val BoxCox: String = "box-cox"
+
+  /** String name for Yeo-Johnson transform model type. */
+  private[feature] val YeoJohnson: String = "yeo-johnson"
+
+  /** Set of modelTypes that PowerTransform supports */
+  private[feature] val supportedModelTypes = Array(BoxCox, YeoJohnson)
+
+  private[feature] def brentSolve(obj: UnivariateFunction): (Double, Double) = {
+    val BrentLowerBound = -10.0
+    val BrentUpperBound = 10.0
+    val BrentRel = 1E-8
 
 Review comment:
   `org.apache.commons.math3.optim.univariate.BrentOptimizer` requires a positive `BrentRel` as its relative tolerance, which does not exist in `scipy.optimize.brent`.
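   For reference, here is a minimal, self-contained Scala sketch of the two transforms and their profile log-likelihoods, computed over weighted `(value, count)` pairs, which is the shape `fit` passes to `solveBoxCox` and `solveYeoJohnson`. It is not the PR's code: `PowerTransformSketch` and its methods are hypothetical names, and the likelihoods are assumed to follow `scipy.stats.boxcox_llf` and sklearn's Yeo-Johnson objective (a weighted sum of log terms minus n/2 times the log of the population variance of the transformed data).

```scala
import scala.math.{abs, log, log1p, pow, signum}

// Hypothetical helper object; not the PR's PowerTransform / solveBoxCox code.
object PowerTransformSketch {

  // Box-Cox: (x^lambda - 1) / lambda, or log(x) when lambda == 0. Needs x > 0.
  def boxCox(x: Double, lambda: Double): Double = {
    require(x > 0, s"Box-Cox requires strictly positive input, got $x")
    if (abs(lambda) < 1e-12) log(x) else (pow(x, lambda) - 1.0) / lambda
  }

  // Yeo-Johnson: defined for any real x, including 0 and negative values.
  def yeoJohnson(x: Double, lambda: Double): Double =
    if (x >= 0) {
      if (abs(lambda) < 1e-12) log1p(x) else (pow(x + 1.0, lambda) - 1.0) / lambda
    } else {
      if (abs(lambda - 2.0) < 1e-12) -log1p(-x)
      else -(pow(1.0 - x, 2.0 - lambda) - 1.0) / (2.0 - lambda)
    }

  // log of the count-weighted population variance of f(x) over (value, count) pairs.
  private def logWeightedVariance(pairs: Seq[(Double, Long)], f: Double => Double): Double = {
    val n = pairs.map(_._2).sum.toDouble
    val mean = pairs.map { case (x, c) => f(x) * c }.sum / n
    val variance = pairs.map { case (x, c) => val d = f(x) - mean; d * d * c }.sum / n
    log(variance) // -Infinity for a constant column
  }

  // Profile log-likelihood of lambda for Box-Cox (assumed to match scipy.stats.boxcox_llf).
  def boxCoxLogLikelihood(pairs: Seq[(Double, Long)], lambda: Double): Double = {
    val n = pairs.map(_._2).sum.toDouble
    val sumLog = pairs.map { case (x, c) => log(x) * c }.sum
    (lambda - 1.0) * sumLog - n / 2.0 * logWeightedVariance(pairs, boxCox(_, lambda))
  }

  // Profile log-likelihood of lambda for Yeo-Johnson (assumed to match sklearn's objective).
  def yeoJohnsonLogLikelihood(pairs: Seq[(Double, Long)], lambda: Double): Double = {
    val n = pairs.map(_._2).sum.toDouble
    val sumSignedLog = pairs.map { case (x, c) => signum(x) * log1p(abs(x)) * c }.sum
    (lambda - 1.0) * sumSignedLog - n / 2.0 * logWeightedVariance(pairs, yeoJohnson(_, lambda))
  }
}
```

   Maximizing either likelihood over lambda with a bounded 1-D optimizer gives the fitted parameter. Weighting by counts yields the same likelihood as expanding each pair into `count` copies, which is what allows `fit` to work on aggregated distinct values.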



[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600489494
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600465502
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607173842
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-615229677
 
 
   **[Test build #121410 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121410/testReport)** for PR 27944 at commit [`5c64d90`](https://github.com/apache/spark/commit/5c64d90c3dfbec11738296047474189f71cc8f76).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600517432
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119977/



[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607206286
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] zhengruifeng edited a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
zhengruifeng edited a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600470611
 
 
   Differences from sklearn's implementation:
   1. distinct values may be grouped into bins;
   2. standardization, which sklearn enables by default, is skipped; the data can be standardized afterwards with `StandardScaler`;
   3. both use a Brent solver, but the implementations differ:
      
   - sklearn uses `scipy.optimize.brent` with bound=[-2,2], iter=500, tol=1.48e-8, and `scipy.optimize.brent` does not guarantee that the solution lies within the input bounds;
   
   - I use `org.apache.commons.math3.optim.univariate.BrentOptimizer` with bound=[-10,10], iter=1000. (I also tried sklearn's default parameters and got the same results; I conservatively widened the bounds and raised the iteration limit.)
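   To illustrate the bounded-versus-bracketed distinction in point 3: a bounded 1-D minimizer only evaluates points inside [lower, upper], so its answer cannot leave that interval, whereas a bracketing search such as `scipy.optimize.brent` may. The sketch below uses golden-section search as a simpler stand-in for Brent's method (Brent adds parabolic-interpolation steps on top of this kind of interval shrinking); it is not commons-math's `BrentOptimizer`, and `BoundedMinimizerSketch` is a hypothetical name.

```scala
// Hypothetical stand-in: golden-section search, a simpler relative of Brent's method.
object BoundedMinimizerSketch {

  // Minimizes a unimodal f on [lower, upper]. Every probe point lies inside the
  // current interval, so the returned argmin can never leave [lower, upper].
  def goldenSection(
      f: Double => Double,
      lower: Double,
      upper: Double,
      tol: Double = 1e-8,
      maxIter: Int = 1000): Double = {
    require(lower < upper, "lower must be below upper")
    val invPhi = (math.sqrt(5.0) - 1.0) / 2.0 // about 0.618
    var a = lower
    var b = upper
    var c = b - invPhi * (b - a)
    var d = a + invPhi * (b - a)
    var fc = f(c)
    var fd = f(d)
    var iter = 0
    while (b - a > tol && iter < maxIter) {
      if (fc < fd) {
        // Minimum lies in [a, d]: shrink from the right and reuse c as the new d.
        b = d; d = c; fd = fc
        c = b - invPhi * (b - a); fc = f(c)
      } else {
        // Minimum lies in [c, b]: shrink from the left and reuse d as the new c.
        a = c; c = d; fc = fd
        d = a + invPhi * (b - a); fd = f(d)
      }
      iter += 1
    }
    (a + b) / 2.0
  }
}
```

   For example, minimizing `(x - 15)^2` on [-10, 10] returns a value at the upper bound 10, while an unbounded bracketing search started from (-2, 2) would move toward 15.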



[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607175232
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607139233
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-615230193
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/26093/



[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-615230187
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600517425
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607139247
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25371/



[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607163493
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120671/



[GitHub] [spark] SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600518296
 
 
   **[Test build #119978 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119978/testReport)** for PR 27944 at commit [`086c143`](https://github.com/apache/spark/commit/086c1430c5078358124b5d60fbd4e6f9f4b6f854).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607138554
 
 
   **[Test build #120672 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120672/testReport)** for PR 27944 at commit [`c574fcb`](https://github.com/apache/spark/commit/c574fcbe64246aca2aaf7790222a17f1e85b7db4).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607103063
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607205844
 
 
   **[Test build #120675 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120675/testReport)** for PR 27944 at commit [`b19c197`](https://github.com/apache/spark/commit/b19c19706aad1db60e30a2e5d18021eecc382175).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class PowerTransform @Since(\"3.1.0\")(`



[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607139247
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25371/



[GitHub] [spark] SparkQA removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607130755
 
 
   **[Test build #120671 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120671/testReport)** for PR 27944 at commit [`66a97c9`](https://github.com/apache/spark/commit/66a97c955d4c10b9a2dfb5a07ce35b266af8b357).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607163484
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607173509
 
 
   **[Test build #120675 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120675/testReport)** for PR 27944 at commit [`b19c197`](https://github.com/apache/spark/commit/b19c19706aad1db60e30a2e5d18021eecc382175).



[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607142878
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25372/



[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607175240
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120672/



[GitHub] [spark] SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600486528
 
 
   **[Test build #119978 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119978/testReport)** for PR 27944 at commit [`086c143`](https://github.com/apache/spark/commit/086c1430c5078358124b5d60fbd4e6f9f4b6f854).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607163493
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120671/



[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600483710
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24700/



[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607142868
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600464979
 
 
   **[Test build #119972 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119972/testReport)** for PR 27944 at commit [`932caff`](https://github.com/apache/spark/commit/932caff9a5e47f3f3f8a99d20599bba4f9554d71).


[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607103073
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25368/


[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607137768
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120669/


[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607163484
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607180203
 
 
   **[Test build #120673 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120673/testReport)** for PR 27944 at commit [`c656d0f`](https://github.com/apache/spark/commit/c656d0f1d91e7153e555cfc5391e010ddcd620f9).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607127258
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600517432
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119977/


[GitHub] [spark] SparkQA removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600486528
 
 
   **[Test build #119978 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119978/testReport)** for PR 27944 at commit [`086c143`](https://github.com/apache/spark/commit/086c1430c5078358124b5d60fbd4e6f9f4b6f854).


[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600487155
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600518839
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-600465502
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] zhengruifeng commented on a change in pull request #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#discussion_r394149351
 
 

 ##########
 File path: mllib/src/main/scala/org/apache/spark/ml/feature/PowerTransform.scala
 ##########
 @@ -0,0 +1,561 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.feature
+
+import org.apache.commons.math3.analysis._
+import org.apache.commons.math3.optim._
+import org.apache.commons.math3.optim.nonlinear.scalar._
+import org.apache.commons.math3.optim.univariate._
+import org.apache.hadoop.fs.Path
+
+import org.apache.spark.annotation.Since
+import org.apache.spark.ml._
+import org.apache.spark.ml.linalg._
+import org.apache.spark.ml.param._
+import org.apache.spark.ml.param.shared._
+import org.apache.spark.ml.util._
+import org.apache.spark.mllib.util.MLUtils
+import org.apache.spark.sql._
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types._
+
+
+/**
+ * Params for [[PowerTransform]] and [[PowerTransformModel]].
+ */
+private[feature] trait PowerTransformParams extends Params with HasInputCol with HasOutputCol {
+
+  /**
+   * The model type which is a string (case-sensitive).
+   * Supported options: "yeo-johnson", "box-cox".
+   * (default = yeo-johnson)
+   *
+   * @group param
+   */
+  final val modelType: Param[String] = new Param[String](this, "modelType", "The model type " +
+    "which is a string (case-sensitive). Supported options: yeo-johnson (default), and box-cox.",
+    ParamValidators.inArray[String](PowerTransform.supportedModelTypes))
+
+  /** @group getParam */
+  final def getModelType: String = $(modelType)
+
+  setDefault(modelType -> PowerTransform.YeoJohnson)
+
+  /**
+   * param for number of bins to down-sample the curves in statistics computation.
+   * If 0, no down-sampling will occur.
+   * Default: 100,000.
+   * @group expertParam
+   */
+  val numBins: IntParam = new IntParam(this, "numBins", "Number of bins to down-sample " +
+    "the curves in statistics computation. If 0, no down-sampling will occur. Must be >= 0.",
+    ParamValidators.gtEq(0))
+
+  /** @group expertGetParam */
+  def getNumBins: Int = $(numBins)
+
+  setDefault(numBins -> 100000)
+
+  /** Validates and transforms the input schema. */
+  protected def validateAndTransformSchema(schema: StructType): StructType = {
+    SchemaUtils.checkColumnType(schema, $(inputCol), new VectorUDT)
+    require(!schema.fieldNames.contains($(outputCol)),
+      s"Output column ${$(outputCol)} already exists.")
+    SchemaUtils.appendColumn(schema, $(outputCol), new VectorUDT)
+  }
+}
+
+
+/**
+ * Apply a power transform to make data more Gaussian-like.
+ * Currently, PowerTransform supports the Box-Cox transform and the Yeo-Johnson transform.
+ * The optimal parameter for stabilizing variance and minimizing skewness is estimated through
+ * maximum likelihood.
+ * Box-Cox requires input data to be strictly positive, while Yeo-Johnson supports both
+ * positive and negative data.
+ */
+@Since("3.1.0")
+class PowerTransform @Since("3.1.0")(@Since("3.1.0") override val uid: String)
+  extends Estimator[PowerTransformModel] with PowerTransformParams with DefaultParamsWritable {
+
+  import PowerTransform._
+
+  def this() = this(Identifiable.randomUID("power_trans"))
+
+  /** @group setParam */
+  def setInputCol(value: String): this.type = set(inputCol, value)
+
+  /** @group setParam */
+  def setOutputCol(value: String): this.type = set(outputCol, value)
+
+  /** @group setParam */
+  def setModelType(value: String): this.type = set(modelType, value)
+
+  /** @group expertSetParam */
+  def setNumBins(value: Int): this.type = set(numBins, value)
+
+  override def fit(dataset: Dataset[_]): PowerTransformModel = {
+    transformSchema(dataset.schema, logging = true)
+
+    val spark = dataset.sparkSession
+    import spark.implicits._
+
+    val localModelType = $(modelType)
+    val numFeatures = MetadataUtils.getNumFeatures(dataset, $(inputCol))
+    val numRows = dataset.count()
+
+    val validateFunc = $(modelType) match {
+      case BoxCox => vec: Vector => requirePositiveValues(vec)
+      case YeoJohnson => vec: Vector => requireNonNaNValues(vec)
+    }
+
+    var pairCounts = dataset
+      .select($(inputCol))
+      .flatMap { case Row(vec: Vector) =>
+        require(vec.size == numFeatures)
+        validateFunc(vec)
+        vec.nonZeroIterator
+      }.toDF("col", "value")
+      .groupBy("col", "value")
+      .agg(count(lit(0)).as("cnt"))
+      .sort("col", "value")
+
+    val groups = if (0 < $(numBins) && $(numBins) <= numRows) {
+      val localNumBins = $(numBins)
+      pairCounts
+        .groupBy("col")
+        .count()
+        .as[(Int, Long)]
+        .flatMap { case (col, num) =>
+          val group = num / localNumBins
+          if (group >= 2) {
+            Some((col, group))
+          } else {
+            None
+          }
+        }.collect().toMap
+    } else Map.empty[Int, Long]
+
+    if (groups.nonEmpty) {
+      pairCounts = makeBins(pairCounts.as[(Int, Double, Long)], groups)
+        .toDF("col", "value", "cnt")
+    }
+
+    val solutions = pairCounts
+      .groupBy("col")
+      .agg(collect_list(struct("value", "cnt")))
+      .as[(Int, Seq[(Double, Long)])]
+      .map { case (col, seq) =>
+        val nnz = seq.iterator.map(_._2).sum
+        val nz = numRows - nnz
+        val (solution, _) = localModelType match {
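+          case BoxCox =>
+            // Box-Cox requires strictly positive values, so there can be no implicit zeros.
+            require(nz == 0, "BoxCox requires positive values")
+            val computeIter = () => seq.iterator
+            solveBoxCox(computeIter)
+          case YeoJohnson =>
+            require(nz >= 0)
+            // Add back the zeros dropped by nonZeroIterator as a single weighted pair.
+            val computeIter = if (nz > 0) {
+              () => seq.iterator ++ Iterator.single((0.0, nz))
+            } else {
+              () => seq.iterator
+            }
+            solveYeoJohnson(computeIter)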
+        }
+        (col, solution)
+      }.collect().toMap
+
+    val lambda = Array.ofDim[Double](numFeatures)
+    solutions.foreach { case (col, solution) => lambda(col) = solution }
+
+    if (solutions.size < numFeatures) {
+      localModelType match {
+        case YeoJohnson =>
+          // if some column only contains 0 values in YeoJohnson
+          val computeIter = () => Iterator.single((0.0, numRows))
+          val (zeroSolution, _) = solveYeoJohnson(computeIter)
+          Iterator.range(0, numFeatures)
+            .filterNot(solutions.contains)
+            .foreach { col => lambda(col) = zeroSolution }
+
+        case BoxCox =>
+          // This should never happen.
+          throw new IllegalArgumentException("BoxCox requires positive values")
+      }
+    }
+
+    copyValues(new PowerTransformModel(uid, Vectors.dense(lambda).compressed)
+      .setParent(this))
+  }
+
+  override def copy(extra: ParamMap): PowerTransform = defaultCopy(extra)
+
+  override def transformSchema(schema: StructType): StructType = {
+    validateAndTransformSchema(schema)
+  }
+}
+
+
+@Since("3.1.0")
+object PowerTransform extends DefaultParamsReadable[PowerTransform] {
+
+  override def load(path: String): PowerTransform = super.load(path)
+
+  /** String name for Box-Cox transform model type. */
+  private[feature] val BoxCox: String = "box-cox"
+
+  /** String name for Yeo-Johnson transform model type. */
+  private[feature] val YeoJohnson: String = "yeo-johnson"
+
+  /* Set of modelTypes that PowerTransform supports */
+  private[feature] val supportedModelTypes = Array(BoxCox, YeoJohnson)
+
+  private[feature] def brentSolve(obj: UnivariateFunction): (Double, Double) = {
 
 Review comment:
   Compared with scikit-learn's [implementation](https://github.com/scikit-learn/scikit-learn/blob/b189bf60708af22dde82a00aca7b5a54290b666d/sklearn/preprocessing/_data.py#L3042):
   - this uses the same `BrentAbs = 1.48E-8`;
   - sklearn uses the bracket [-2, 2], but the `scipy.optimize.brent` documentation notes that "Providing the pair (xa,xb) does not always mean the obtained solution will satisfy xa<=x<=xb.";
   - sklearn uses iters=500.
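For comparison with sklearn, here is a minimal, self-contained Scala sketch of the objective that `brentSolve` maximizes for Yeo-Johnson, assuming scikit-learn's log-likelihood form `-n/2 * log(var(y)) + (lambda - 1) * sum(sign(x) * log1p(|x|))`, evaluated over weighted `(value, count)` pairs. The names `YeoJohnsonSketch`, `fitLambda` and `goldenMax` are hypothetical and do not come from the PR. It also swaps Brent's method for a plain golden-section search on [-2, 2] with the same 1.48E-8 tolerance and a 500-iteration cap, so unlike scipy's `brack`, the result never leaves the interval. It assumes the column is not constant, since zero variance makes the log-likelihood `-Infinity`.

```scala
import scala.math.{abs, log, log1p, pow, signum, sqrt}

// Hypothetical sketch: names and structure are illustrative, not the PR's code.
object YeoJohnsonSketch {

  /** Yeo-Johnson transform of one value for a given lambda. */
  def yeoJohnson(x: Double, lambda: Double): Double = {
    val eps = 1e-12
    if (x >= 0) {
      if (abs(lambda) < eps) log1p(x) else (pow(x + 1, lambda) - 1) / lambda
    } else {
      if (abs(lambda - 2) < eps) -log1p(-x) else -(pow(1 - x, 2 - lambda) - 1) / (2 - lambda)
    }
  }

  /** sklearn-style log-likelihood of lambda over weighted (value, count) pairs. */
  def logLikelihood(pairs: Seq[(Double, Long)], lambda: Double): Double = {
    val n = pairs.iterator.map(_._2).sum.toDouble
    val ys = pairs.map { case (x, c) => (yeoJohnson(x, lambda), c.toDouble) }
    val mean = ys.iterator.map { case (y, c) => y * c }.sum / n
    // Population (biased) variance of the transformed values, weighted by counts.
    val variance = ys.iterator.map { case (y, c) => c * (y - mean) * (y - mean) }.sum / n
    // Log-Jacobian term: sign(x) * log1p(|x|), weighted by counts.
    val logJacobian = pairs.iterator.map { case (x, c) => c * signum(x) * log1p(abs(x)) }.sum
    -n / 2 * log(variance) + (lambda - 1) * logJacobian
  }

  /** Golden-section search for a maximizer of f on [lo, hi] (assumes f is unimodal there). */
  def goldenMax(f: Double => Double, lo: Double, hi: Double,
                tol: Double = 1.48e-8, maxIter: Int = 500): Double = {
    val invPhi = (sqrt(5.0) - 1) / 2
    var a = lo
    var b = hi
    var c = b - invPhi * (b - a)
    var d = a + invPhi * (b - a)
    var fc = f(c)
    var fd = f(d)
    var iter = 0
    while (b - a > tol && iter < maxIter) {
      if (fc >= fd) {
        // The maximum lies in [a, d]; the old c becomes the new d.
        b = d; d = c; fd = fc
        c = b - invPhi * (b - a); fc = f(c)
      } else {
        // The maximum lies in [c, b]; the old d becomes the new c.
        a = c; c = d; fc = fd
        d = a + invPhi * (b - a); fd = f(d)
      }
      iter += 1
    }
    (a + b) / 2
  }

  /** Fits lambda on [-2, 2], adding implicit zeros back as one (0.0, zeroCount) pair. */
  def fitLambda(pairs: Seq[(Double, Long)], zeroCount: Long): Double = {
    val all = if (zeroCount > 0) pairs :+ ((0.0, zeroCount)) else pairs
    goldenMax(lam => logLikelihood(all, lam), -2.0, 2.0)
  }
}
```

Appending the implicit zeros of sparse vectors as a single `(0.0, zeroCount)` pair is the same trick `fit` uses to restore values dropped by `nonZeroIterator`. Golden-section search needs only function values but gives no protection against multiple local maxima; Brent's method combines it with parabolic interpolation steps, which usually converges in fewer evaluations.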
   


[GitHub] [spark] SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#issuecomment-607142387
 
 
   **[Test build #120673 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120673/testReport)** for PR 27944 at commit [`c656d0f`](https://github.com/apache/spark/commit/c656d0f1d91e7153e555cfc5391e010ddcd620f9).


[GitHub] [spark] zhengruifeng commented on a change in pull request #27944: [SPARK-31180][ML] Implement PowerTransform

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on a change in pull request #27944: [SPARK-31180][ML] Implement PowerTransform
URL: https://github.com/apache/spark/pull/27944#discussion_r394145939
 
 

 ##########
 File path: mllib/src/main/scala/org/apache/spark/ml/feature/PowerTransform.scala
 ##########
 @@ -0,0 +1,561 @@
+  /**
+   * param for number of bins to down-sample the curves in statistics computation.
 
 Review comment:
   This follows the AUC computation: if a column has too many distinct values, we group them into bins before computing the statistics.
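As a hypothetical illustration of that binning, the Scala sketch below collapses each run of `group` consecutive distinct values of one column (already sorted by value) into a single `(count-weighted mean, total count)` pair, and derives `group` from `numDistinct / numBins`, binning only when the result is at least 2, mirroring the rule in `fit`. The PR's own `makeBins` is not shown here, so `BinningSketch`, `groupSize`, and the choice of the weighted mean as each bin's representative value are assumptions.

```scala
// Hypothetical sketch; not the PR's makeBins.
object BinningSketch {

  /**
   * Collapses runs of `group` consecutive distinct values (sorted by value)
   * into one (count-weighted mean, total count) pair per run.
   */
  def makeBins(sorted: Seq[(Double, Long)], group: Int): Seq[(Double, Long)] = {
    require(group >= 1, "group must be positive")
    sorted.grouped(group).map { chunk =>
      val total = chunk.iterator.map(_._2).sum
      val weightedSum = chunk.iterator.map { case (v, c) => v * c }.sum
      (weightedSum / total, total)
    }.toVector
  }

  /** Distinct values per bin, or None when binning is disabled or would not help. */
  def groupSize(numDistinct: Long, numBins: Int): Option[Int] = {
    if (numBins <= 0) None
    else {
      val group = numDistinct / numBins
      if (group >= 2) Some(group.toInt) else None
    }
  }
}
```

Every bin keeps its total count, so the weighted likelihood still accounts for all rows; only the distinct values are coarsened, which bounds the size of each column's collected `(value, cnt)` list.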
