You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by yanboliang <gi...@git.apache.org> on 2017/03/03 14:01:31 UTC

[GitHub] spark pull request #17151: [ML][Minor] Separate estimator and model params f...

GitHub user yanboliang opened a pull request:

    https://github.com/apache/spark/pull/17151

    [ML][Minor] Separate estimator and model params for read/write test.

    ## What changes were proposed in this pull request?
    Since we allow ```Estimator``` and ```Model``` not always share same params (see ```ALSParams``` and ```ALSModelParams```), we should pass in test params for estimator and model separately for ```testEstimatorAndModelReadWrite```.
     
    ## How was this patch tested?
    Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yanboliang/spark test-rw

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17151.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17151
    
----
commit efff28b802c6a08058b49745d0e73783302c817f
Author: Yanbo Liang <yb...@gmail.com>
Date:   2017-03-03T13:55:34Z

    Separate estimator and model params for read/write test.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17151: [ML][Minor] Separate estimator and model params for read...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17151
  
    **[Test build #73847 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73847/testReport)** for PR 17151 at commit [`8f4f87e`](https://github.com/apache/spark/commit/8f4f87e93ae77086fa2be82e84c0f8275a549352).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17151: [ML][Minor] Separate estimator and model params f...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17151#discussion_r104594062
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala ---
    @@ -372,16 +372,18 @@ class DecisionTreeClassifierSuite
         // Categorical splits with tree depth 2
         val categoricalData: DataFrame =
           TreeTests.setMetadata(rdd, Map(0 -> 2, 1 -> 3), numClasses = 2)
    -    testEstimatorAndModelReadWrite(dt, categoricalData, allParamSettings, checkModelData)
    +    testEstimatorAndModelReadWrite(dt, categoricalData, allParamSettings,
    --- End diff --
    
    Good points. I still think it's better to just add the extra constructor, but I don't feel strongly about it. So we can proceed with whatever you feel is best. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17151: [ML][Minor] Separate estimator and model params for read...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17151
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17151: [ML][Minor] Separate estimator and model params f...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17151


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17151: [ML][Minor] Separate estimator and model params f...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17151#discussion_r104825308
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala ---
    @@ -372,16 +372,18 @@ class DecisionTreeClassifierSuite
         // Categorical splits with tree depth 2
         val categoricalData: DataFrame =
           TreeTests.setMetadata(rdd, Map(0 -> 2, 1 -> 3), numClasses = 2)
    -    testEstimatorAndModelReadWrite(dt, categoricalData, allParamSettings, checkModelData)
    +    testEstimatorAndModelReadWrite(dt, categoricalData, allParamSettings,
    --- End diff --
    
    I prefer to let any refactoring of these tests happen as-needed. If there are specific cases that need to be done now, we should create JIRAs to track them.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17151: [ML][Minor] Separate estimator and model params for read...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the issue:

    https://github.com/apache/spark/pull/17151
  
    Merged into master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17151: [ML][Minor] Separate estimator and model params for read...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17151
  
    **[Test build #73847 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73847/testReport)** for PR 17151 at commit [`8f4f87e`](https://github.com/apache/spark/commit/8f4f87e93ae77086fa2be82e84c0f8275a549352).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17151: [ML][Minor] Separate estimator and model params f...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17151#discussion_r104274732
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala ---
    @@ -372,16 +372,18 @@ class DecisionTreeClassifierSuite
         // Categorical splits with tree depth 2
         val categoricalData: DataFrame =
           TreeTests.setMetadata(rdd, Map(0 -> 2, 1 -> 3), numClasses = 2)
    -    testEstimatorAndModelReadWrite(dt, categoricalData, allParamSettings, checkModelData)
    +    testEstimatorAndModelReadWrite(dt, categoricalData, allParamSettings,
    --- End diff --
    
    @sethah Thanks for your kindly remind. I was planing to write as your suggestion before I'm thinking:
    * Actually all ```Model``` only need to extends a fraction of ```Params``` from ```Estimator```, so all ```testEstimatorAndModelReadWrite(estimator, dataset, testParams, checkModelData)``` should be changed to ```testEstimatorAndModelReadWrite(estimator, dataset, testEstimatorParams, testModelParams, checkModelData)``` eventually. I explicitly write with the later way in test suites to remind developers should separate their estimator and model params when adding new algorithms' read/write test. I'm afraid that developers are not aware of the separation if they refer other test suites and find almost all test cases only pass in ```testParams```.
    *  Though in the currently change, we pass in ```allParamSettings``` to both ```testEstimatorParams``` and ```testModelParams```, this is because they share the same params set. Others like ```ALS``` will be pass in separate params. I think we should push forward to refactor ```***``` and ```***Model``` to separate their params, which could make models more succinct.
    * If this is a public API, I totally agree with you. However, this is an internal auxiliary function, I think all test cases will need to pass in separate params eventually, so I settle a matter at one go.
    
    This is my two cents, I'm still open to hear your thoughts. If you have strongly opinion, I can update according your suggestion. Thanks. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17151: [ML][Minor] Separate estimator and model params for read...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17151
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73847/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17151: [ML][Minor] Separate estimator and model params for read...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17151
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17151: [ML][Minor] Separate estimator and model params for read...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17151
  
    **[Test build #73846 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73846/testReport)** for PR 17151 at commit [`efff28b`](https://github.com/apache/spark/commit/efff28b802c6a08058b49745d0e73783302c817f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17151: [ML][Minor] Separate estimator and model params for read...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17151
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73846/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17151: [ML][Minor] Separate estimator and model params f...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17151#discussion_r104883184
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala ---
    @@ -372,16 +372,18 @@ class DecisionTreeClassifierSuite
         // Categorical splits with tree depth 2
         val categoricalData: DataFrame =
           TreeTests.setMetadata(rdd, Map(0 -> 2, 1 -> 3), numClasses = 2)
    -    testEstimatorAndModelReadWrite(dt, categoricalData, allParamSettings, checkModelData)
    +    testEstimatorAndModelReadWrite(dt, categoricalData, allParamSettings,
    --- End diff --
    
    Yeah, actually almost all models are extends lots of params which are not necessary, I'd like to remove these params for models as todo list. I'll create JIRAs to track them. I'll merge this first since it blocks #17117. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17151: [ML][Minor] Separate estimator and model params f...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17151#discussion_r104187472
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala ---
    @@ -372,16 +372,18 @@ class DecisionTreeClassifierSuite
         // Categorical splits with tree depth 2
         val categoricalData: DataFrame =
           TreeTests.setMetadata(rdd, Map(0 -> 2, 1 -> 3), numClasses = 2)
    -    testEstimatorAndModelReadWrite(dt, categoricalData, allParamSettings, checkModelData)
    +    testEstimatorAndModelReadWrite(dt, categoricalData, allParamSettings,
    --- End diff --
    
    It would seriously reduce the amount of code changed to just add an extra constructor:
    
    ````scala
    def testEstimatorAndModelReadWrite[
        E <: Estimator[M] with MLWritable, M <: Model[M] with MLWritable](
          estimator: E,
          dataset: Dataset[_],
          testParams: Map[String, Any],
          checkModelData: (M, M) => Unit): Unit = {
        testEstimatorAndModelReadWrite(estimator, dataset, testParams, testParams, checkModelData)
    }
    ````


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17151: [ML][Minor] Separate estimator and model params for read...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17151
  
    **[Test build #73846 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73846/testReport)** for PR 17151 at commit [`efff28b`](https://github.com/apache/spark/commit/efff28b802c6a08058b49745d0e73783302c817f).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org