You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by mengxr <gi...@git.apache.org> on 2015/04/30 22:49:39 UTC

[GitHub] spark pull request: [SPARK-5956][MLLIB] Pipeline components should...

GitHub user mengxr opened a pull request:

    https://github.com/apache/spark/pull/5820

    [SPARK-5956][MLLIB] Pipeline components should be copyable.

    This PR added `copy(extra: ParamMap): Params` to `Params`, which makes a copy of the current instance with a randomly generated uid and some extra param values. With this change, we can only implement `fit` and `transform` without extra param values given the default implementation of `fit(dataset, extra)`:
    
    ~~~scala
    def fit(dataset: DataFrame, extra: ParamMap): Model = {
      copy(extra).fit(dataset)
    }
    ~~~
    
    Inside `fit` and `transform`, since only the embedded values are used, I added `$` as an alias for `getOrDefault` to make the code easier to read. For example, in `LinearRegression.fit` we have:
    
    ~~~scala
    val effectiveRegParam = $(regParam) / yStd
    val effectiveL1RegParam = $(elasticNetParam) * effectiveRegParam
    val effectiveL2RegParam = (1.0 - $(elasticNetParam)) * effectiveRegParam
    ~~~
    
    Meta-algorithm like `Pipeline` implements its own `copy(extra)`. So the fitted pipeline model stored all copied stages (no matter whether it is a transformer or a model).
    
    Other changes:
    * `Params$.inheritValues` is moved to `Params!.copyValues` and returns the target instance.
    * `fittingParamMap` was removed because the `parent` carries this information.
    * `validate` was renamed to `validateParams` to be more precise.
    
    TODOs:
    * [ ] add tests for newly added methods
    * [ ] update documentation
    
    @jkbradley

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mengxr/spark SPARK-5956

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5820.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5820
    
----
commit f082a310794dcbecf17ec3d44ab770beed84f956
Author: Xiangrui Meng <me...@databricks.com>
Date:   2015-04-30T08:30:57Z

    make Params copyable and simply handling of extra params in all spark.ml components

commit d882afc0956c59da78b5e0172f86e9c1f5432028
Author: Xiangrui Meng <me...@databricks.com>
Date:   2015-04-30T08:48:04Z

    test compile

commit 9ee004e7c534db7a6ab0434eec97feca86c7daad
Author: Xiangrui Meng <me...@databricks.com>
Date:   2015-04-30T16:52:58Z

    merge copy and copyWith; rename validate to validateParams

commit 53e097373e9c9707bb7486adc3c20fee384356c1
Author: Xiangrui Meng <me...@databricks.com>
Date:   2015-04-30T17:34:50Z

    move inheritValues to Params and rename it to copyValues

commit 9286a228c8e141dbcd66eaf21e9cfb29bc1d344a
Author: Xiangrui Meng <me...@databricks.com>
Date:   2015-04-30T17:40:02Z

    copyValues to trained models

commit 0f4fd64740763792087405f8921d4cd2bf00217a
Author: Xiangrui Meng <me...@databricks.com>
Date:   2015-04-30T18:07:38Z

    fix some tests

commit c76b4d120da94350efcb4638c8a24e4fbeabb177
Author: Xiangrui Meng <me...@databricks.com>
Date:   2015-04-30T20:02:24Z

    fix all unit tests

commit 5a67779b20aeb0db7c824a96f5b1cc2e7718f7c5
Author: Xiangrui Meng <me...@databricks.com>
Date:   2015-04-30T20:16:32Z

    examples compile

commit b642872be33a250ef1cefbbb51c3bd8a3d37b7ba
Author: Xiangrui Meng <me...@databricks.com>
Date:   2015-04-30T20:35:31Z

    example code runs

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5956][MLLIB] Pipeline components should...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5820#issuecomment-97963486
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5956][MLLIB] Pipeline components should...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5820#issuecomment-97963469
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5956][MLLIB] Pipeline components should...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5820#issuecomment-97983230
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31451/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5956][MLLIB] Pipeline components should...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5820#issuecomment-97983227
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5956][MLLIB] Pipeline components should...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5820#issuecomment-97983217
  
      [Test build #31451 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31451/consoleFull) for   PR 5820 at commit [`b642872`](https://github.com/apache/spark/commit/b642872be33a250ef1cefbbb51c3bd8a3d37b7ba).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `abstract class Evaluator extends Params `
      * `abstract class PipelineStage extends Params with Logging `
      * `class BinaryClassificationEvaluator extends Evaluator with HasRawPredictionCol with HasLabelCol `
    
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5956][MLLIB] Pipeline components should...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5820#issuecomment-97963766
  
      [Test build #31451 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31451/consoleFull) for   PR 5820 at commit [`b642872`](https://github.com/apache/spark/commit/b642872be33a250ef1cefbbb51c3bd8a3d37b7ba).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org