Posted to reviews@spark.apache.org by viirya <gi...@git.apache.org> on 2018/02/10 09:16:24 UTC

[GitHub] spark pull request #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple ...

GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/20566

    [SPARK-23377][ML] Fixes Bucketizer with multiple columns persistence bug

    ## What changes were proposed in this pull request?
    
    Since 2.3, `Bucketizer` supports multiple input/output columns. During transformation we check whether mutually exclusive params are set. E.g., if `inputCols` and `outputCol` are both set, an error is thrown.
    
    However, when a `Bucketizer` is written, the default params and user-supplied params appear to be merged. All saved params are then loaded back and set on the newly created instance. As a result, the default `outputCol` param from the `HasOutputCol` trait ends up in `paramMap` and becomes a user-supplied param, which makes the exclusive-params check fail.
    
    This patch changes `DefaultParamsWriter` to save only user-supplied params.
    
    ## How was this patch tested?
    
    Modified test.
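
The failure mode described in this pull request can be illustrated with a toy model. This is only a minimal sketch and does not use Spark: `ToyStage`, `ToyPersistence`, `BucketizerBugDemo`, and the column values are hypothetical stand-ins for `Bucketizer`, the default params writer/reader, and real params.

```scala
import scala.collection.mutable

// Toy stand-in for an ML stage; not a Spark class.
class ToyStage {
  // Assumed default, mimicking a default `outputCol`.
  val defaults: mutable.Map[String, String] = mutable.Map("outputCol" -> "toy__output")
  // Params explicitly set by the user.
  val userSet: mutable.Map[String, String] = mutable.Map.empty

  def set(name: String, value: String): this.type = {
    userSet(name) = value
    this
  }

  def isSet(name: String): Boolean = userSet.contains(name)

  // Defaults overlaid with user-supplied values.
  def merged: Map[String, String] = (defaults ++ userSet).toMap

  // Exclusive-param check: fails if both params were explicitly set.
  def validate(): Unit =
    require(
      !(isSet("inputCols") && isSet("outputCol")),
      "inputCols and outputCol are mutually exclusive")
}

object ToyPersistence {
  // Loading calls `set` for every saved entry, so each one becomes user-supplied.
  def load(saved: Map[String, String]): ToyStage = {
    val stage = new ToyStage
    saved.foreach { case (name, value) => stage.set(name, value) }
    stage
  }
}

object BucketizerBugDemo {
  // Returns (merged save fails validation, user-only save passes validation).
  def run(): (Boolean, Boolean) = {
    val original = new ToyStage().set("inputCols", "a,b")
    original.validate() // passes: only inputCols is user-supplied

    val reloadedMerged = ToyPersistence.load(original.merged)
    val reloadedUserOnly = ToyPersistence.load(original.userSet.toMap)

    (scala.util.Try(reloadedMerged.validate()).isFailure,
      scala.util.Try(reloadedUserOnly.validate()).isSuccess)
  }
}
```

In this sketch, saving the merged map turns the default `outputCol` into a user-supplied value on reload, so the check fails; saving only `userSet` keeps the reloaded stage valid.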


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-23377

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20566.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20566
    
----
commit 7785cacee8dd4a6e9938c3c99dad3ad3117655d3
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-02-10T08:52:17Z

    Only save user-supplied params.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by MrBago <gi...@git.apache.org>.
Github user MrBago commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    I believe this will break persistence for LogisticRegression. The issue is that the `threshold` param on LogisticRegressionModel doesn't get a default directly; it only gets one during the call to `fit` on LogisticRegression. This is currently fine because the Model can only be created by fitting or by being read from disk, and in both cases some value gets set for threshold. With this change that's no longer the case. Here's a test to confirm: https://github.com/apache/spark/commit/5db2108224accdf848b41ef0d8d1c312b49f49c6
    
    I believe LinearRegression may have a similar issue.
    
    Our current tests don't seem to cover this kind of thing so I think we should improve test coverage if we want to make this kind of change.
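
The concern raised in this comment can be sketched with a toy estimator/model pair. This is a simplified illustration that does not use Spark: `ToyEstimator`, `ToyModel`, `ThresholdDemo`, and the value 0.5 are assumptions made for the example, not the actual `LogisticRegression` code.

```scala
import scala.collection.mutable

// Toy model; the class itself declares no default for `threshold`.
class ToyModel {
  val defaults: mutable.Map[String, Double] = mutable.Map.empty
  val userSet: mutable.Map[String, Double] = mutable.Map.empty

  // User-supplied value first, then default, otherwise fail.
  def getThreshold: Double =
    userSet.get("threshold")
      .orElse(defaults.get("threshold"))
      .getOrElse(throw new NoSuchElementException("no value or default for threshold"))
}

// Toy estimator; the default lives here (0.5 is an illustrative value).
class ToyEstimator {
  val defaults: Map[String, Double] = Map("threshold" -> 0.5)

  // `fit` hands the estimator's defaults to the model it creates.
  def fit(): ToyModel = {
    val model = new ToyModel
    model.defaults ++= defaults
    model
  }
}

object ThresholdDemo {
  // Save only user-supplied params, then load them into a fresh model.
  def saveUserOnlyAndLoad(model: ToyModel): ToyModel = {
    val saved = model.userSet.toMap
    val loaded = new ToyModel
    loaded.userSet ++= saved
    loaded
  }
}
```

A fitted `ToyModel` can resolve `threshold`, but a model reloaded from a user-only save cannot, because the fresh instance never received the estimator's default.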


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    **[Test build #87285 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87285/testReport)** for PR 20566 at commit [`3b5e7c6`](https://github.com/apache/spark/commit/3b5e7c64742f7eeaf2fe9d3cb95bbbcef1f15abc).


---



[GitHub] spark pull request #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya closed the pull request at:

    https://github.com/apache/spark/pull/20566


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/771/
    Test PASSed.


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/785/
    Test PASSed.


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple ...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20566#discussion_r167394494
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
    @@ -864,6 +864,11 @@ trait Params extends Identifiable with Serializable {
         extractParamMap(ParamMap.empty)
       }
     
    +  /**
    +   * Extracts the user-supplied params.
    +   */
    +  final def extractUserParamMap(): ParamMap = paramMap
    --- End diff --
    
    In this way I think you can also avoid the MiMa failure...


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    **[Test build #87299 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87299/testReport)** for PR 20566 at commit [`daceafe`](https://github.com/apache/spark/commit/daceafee5e6eed11cbfed91d1f72e0477ff0ec68).


---



[GitHub] spark pull request #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20566#discussion_r167398469
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
    @@ -864,6 +864,11 @@ trait Params extends Identifiable with Serializable {
         extractParamMap(ParamMap.empty)
       }
     
    +  /**
    +   * Extracts the user-supplied params.
    +   */
    +  final def extractUserParamMap(): ParamMap = paramMap
    --- End diff --
    
    Looks like it still can't avoid the MiMa failure.


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87289/
    Test FAILed.


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    retest this please.


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    **[Test build #87283 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87283/testReport)** for PR 20566 at commit [`7785cac`](https://github.com/apache/spark/commit/7785cacee8dd4a6e9938c3c99dad3ad3117655d3).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/769/
    Test PASSed.


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    **[Test build #87302 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87302/testReport)** for PR 20566 at commit [`c1fb657`](https://github.com/apache/spark/commit/c1fb6577d950b5c17c47d40b6baf0b86fc45a71a).


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    @jkbradley Thanks! I will post the problem and proposed design on the JIRA.


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    Yeah, IMHO, when a user loads a model saved by an old version into a new version, it is reasonable to run it with the current default value: the param was not explicitly set, so it should use the "default" value of the current system.
    
    Thanks for your comment. Let's wait for others' opinions.


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/788/
    Test PASSed.


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    **[Test build #87289 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87289/testReport)** for PR 20566 at commit [`6228006`](https://github.com/apache/spark/commit/6228006fdb62ca25ffda21dab3e88cfe406a9e0b).


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87301/
    Test FAILed.


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    Thanks for the patch @viirya 
    As always, I'll request that we put design decisions & long discussions in JIRA so that they are easier to uncover.  It can also be good to get quick feedback about design before implementation.  I'll comment in JIRA.


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    cc @MLnick @jkbradley 


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    **[Test build #87301 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87301/testReport)** for PR 20566 at commit [`c1fb657`](https://github.com/apache/spark/commit/c1fb6577d950b5c17c47d40b6baf0b86fc45a71a).


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87299/
    Test FAILed.


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/787/
    Test PASSed.


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    @mgaido91 I also considered the issue of default values changing across versions. I'm not sure which is more reasonable: using the old version's default value or the current version's.
    



---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87302/
    Test PASSed.


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    @viirya that's a good question. Honestly, my view is that if users don't set a value, they don't care about it, so using the new version's default is fine IMHO. But it is also true that changing a default may cause unexpected behavior in user code.
    
    So, it LGTM, but I'd like to hear others' opinions on this too.
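
The trade-off discussed here can be made concrete with a small sketch. It assumes a hypothetical param `p` whose default changed from 1.0 in the old version to 2.0 in the new one; `DefaultPolicyDemo` and both values are illustrative only and do not correspond to any real Spark param.

```scala
object DefaultPolicyDemo {
  // What the old version writes to disk for a model:
  // either defaults merged with user-set values, or user-set values only.
  def savedParams(
      userSet: Map[String, Double],
      oldDefaults: Map[String, Double],
      mergeDefaults: Boolean): Map[String, Double] =
    if (mergeDefaults) oldDefaults ++ userSet else userSet

  // What the new version resolves after loading:
  // saved values win, otherwise the current default applies.
  def resolve(
      saved: Map[String, Double],
      currentDefaults: Map[String, Double],
      name: String): Option[Double] =
    saved.get(name).orElse(currentDefaults.get(name))
}
```

With `mergeDefaults = true` an unset param keeps the old version's default after loading; with `mergeDefaults = false` it silently picks up the new version's default, which is the behavior change being weighed in this thread.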


---



[GitHub] spark pull request #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple ...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20566#discussion_r167394216
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
    @@ -864,6 +864,11 @@ trait Params extends Identifiable with Serializable {
         extractParamMap(ParamMap.empty)
       }
     
    +  /**
    +   * Extracts the user-supplied params.
    +   */
    +  final def extractUserParamMap(): ParamMap = paramMap
    --- End diff --
    
    can't we just make `paramMap` `private[ml]`?
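
As a rough illustration of this suggestion, the sketch below shows how a qualified-private member can be read by a writer in the same scope without adding a public accessor. It is a single-file toy: `ml` is an enclosing object here rather than the real `org.apache.spark.ml` package, and `ToyParams` and `ToyWriter` are hypothetical names.

```scala
object ml {
  class ToyParams(initial: Map[String, String]) {
    // Readable from anywhere inside the enclosing `ml` scope, hidden outside it.
    private[ml] val paramMap: Map[String, String] = initial
  }

  object ToyWriter {
    // Compiles because ToyWriter is also defined inside `ml`.
    def userSuppliedParams(p: ToyParams): Map[String, String] = p.paramMap
  }
}
```

Code outside `ml` can call `ml.ToyWriter.userSuppliedParams(...)`, but a direct `p.paramMap` access from outside would not compile.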


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    **[Test build #87299 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87299/testReport)** for PR 20566 at commit [`daceafe`](https://github.com/apache/spark/commit/daceafee5e6eed11cbfed91d1f72e0477ff0ec68).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    **[Test build #87301 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87301/testReport)** for PR 20566 at commit [`c1fb657`](https://github.com/apache/spark/commit/c1fb6577d950b5c17c47d40b6baf0b86fc45a71a).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    I'll close this in favor of the quick fix #20594, based on the discussion in JIRA. I will reopen it if it is needed later.


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    **[Test build #87289 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87289/testReport)** for PR 20566 at commit [`6228006`](https://github.com/apache/spark/commit/6228006fdb62ca25ffda21dab3e88cfe406a9e0b).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87285/
    Test FAILed.


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    **[Test build #87283 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87283/testReport)** for PR 20566 at commit [`7785cac`](https://github.com/apache/spark/commit/7785cacee8dd4a6e9938c3c99dad3ad3117655d3).


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    **[Test build #87285 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87285/testReport)** for PR 20566 at commit [`3b5e7c6`](https://github.com/apache/spark/commit/3b5e7c64742f7eeaf2fe9d3cb95bbbcef1f15abc).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87283/
    Test FAILed.


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    It's not only `threshold`: the default params of `NaiveBayes` and `LogisticRegression` (maybe more, I'm still checking) are all set in the estimator, not in their models. The models receive the default values at the end of `fit`.
    



---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    **[Test build #87302 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87302/testReport)** for PR 20566 at commit [`c1fb657`](https://github.com/apache/spark/commit/c1fb6577d950b5c17c47d40b6baf0b86fc45a71a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/775/
    Test PASSed.


---



[GitHub] spark pull request #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20566#discussion_r167394422
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
    @@ -864,6 +864,11 @@ trait Params extends Identifiable with Serializable {
         extractParamMap(ParamMap.empty)
       }
     
    +  /**
    +   * Extracts the user-supplied params.
    +   */
    +  final def extractUserParamMap(): ParamMap = paramMap
    --- End diff --
    
    Either way is fine with me.


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20566
  
    Merged build finished. Test FAILed.


---
