Posted to reviews@spark.apache.org by facaiy <gi...@git.apache.org> on 2017/05/26 08:54:38 UTC

[GitHub] spark pull request #18120: [SPARK-20498][PYSPARK][ML] Expose getMaxDepth for...

GitHub user facaiy opened a pull request:

    https://github.com/apache/spark/pull/18120

    [SPARK-20498][PYSPARK][ML] Expose getMaxDepth for ensemble tree model in PySpark

    ## What changes were proposed in this pull request?
    
    add `getMaxDepth` method for ensemble tree models:
    + `RandomForestClassifierModel`
    + `RandomForestRegressionModel`
    + `GBTClassificationModel`
    + `GBTRegressionModel`
    
    
    ## How was this patch tested?
    
    + [ ] pass all unit tests.
    + [ ] add new doctest.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/facaiy/spark ENH/pyspark_rf_add_get_max_depth

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18120.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18120
    
----
commit 96e797f79000964ea226efced7318f24a1722535
Author: Yan Facai (颜发才) <fa...@gmail.com>
Date:   2017-05-26T07:48:02Z

    ENH: add maxDepth for ensemble tree model

commit 50815928f070f89ae71e235d51e833cef596a361
Author: Yan Facai (颜发才) <fa...@gmail.com>
Date:   2017-05-26T08:45:11Z

    TST: add doctest

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18120: [SPARK-20498][PYSPARK][ML] Expose getMaxDepth for ensemb...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the issue:

    https://github.com/apache/spark/pull/18120
  
    cc @BryanCutler.
    
    Bryan did some work on https://github.com/apache/spark/pull/17849. It seems that even with that patch we would still need to add methods like these; hoping Bryan can confirm. If we're going to add param accessors to the models, it's best to do them all at once, yes? E.g., we could also add `getMaxBins` and others.




[GitHub] spark issue #18120: [SPARK-20498][PYSPARK][ML] Expose getMaxDepth for ensemb...

Posted by facaiy <gi...@git.apache.org>.
Github user facaiy commented on the issue:

    https://github.com/apache/spark/pull/18120
  
    @keypointt Hi, could you help check whether this PR is consistent with your #17207? Thanks.




[GitHub] spark issue #18120: [SPARK-20498][PYSPARK][ML] Expose getMaxDepth for ensemb...

Posted by keypointt <gi...@git.apache.org>.
Github user keypointt commented on the issue:

    https://github.com/apache/spark/pull/18120
  
    Hi Facai, you exposed the API `getMaxDepth()` in `TreeEnsembleModel`, but the rest of the changes are in comments and are not being run.
    
    I'd suggest you put the tests in the test module and run them to verify.




[GitHub] spark issue #18120: [SPARK-20498][PYSPARK][ML] Expose getMaxDepth for ensemb...

Posted by facaiy <gi...@git.apache.org>.
Github user facaiy commented on the issue:

    https://github.com/apache/spark/pull/18120
  
    Hi, @keypointt. This is a feature of Python: a doctest serves as both documentation and a unit test.
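
As a minimal illustration of the point (the function and class below are hypothetical, not part of the PR), a doctest is an example embedded in a docstring that the standard `doctest` module executes and checks:

```python
def max_depth_of(model):
    """Return the maxDepth value of a fitted tree model.

    The example below is executed by doctest.testmod(), so it is
    documentation and a unit test at the same time.

    >>> class FakeModel:
    ...     def getMaxDepth(self):
    ...         return 5
    >>> max_depth_of(FakeModel())
    5
    """
    return model.getMaxDepth()

if __name__ == "__main__":
    import doctest
    doctest.testmod()  # silent when every example passes
```

Running the file directly (or via `python -m doctest file.py`) verifies the example, which is why doctests count as unit tests.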




[GitHub] spark pull request #18120: [SPARK-20498][PYSPARK][ML] Expose getMaxDepth for...

Posted by facaiy <gi...@git.apache.org>.
Github user facaiy closed the pull request at:

    https://github.com/apache/spark/pull/18120




[GitHub] spark issue #18120: [SPARK-20498][PYSPARK][ML] Expose getMaxDepth for ensemb...

Posted by keypointt <gi...@git.apache.org>.
Github user keypointt commented on the issue:

    https://github.com/apache/spark/pull/18120
  
    Oh sorry, I missed it. Thanks for the heads up.
    
    On Sat, May 27, 2017 at 2:16 PM Yan Facai (颜发才) <no...@github.com>
    wrote:
    
    > Hi, @keypointt <https://github.com/keypointt> . It's the feature of
    > Python. The doctest is both document and unit test.
    >
    > —
    > You are receiving this because you were mentioned.
    > Reply to this email directly, view it on GitHub
    > <https://github.com/apache/spark/pull/18120#issuecomment-304477003>, or mute
    > the thread
    > <https://github.com/notifications/unsubscribe-auth/ADvmieVs45e2uCq-g3NV2ViMyc8AbiDEks5r-JJ4gaJpZM4NnVv4>
    > .
    >





[GitHub] spark issue #18120: [SPARK-20498][PYSPARK][ML] Expose getMaxDepth for ensemb...

Posted by facaiy <gi...@git.apache.org>.
Github user facaiy commented on the issue:

    https://github.com/apache/spark/pull/18120
  
    Thanks, @BryanCutler.
    It seems that #17849 copies `Params` from `Estimator` to `Model` automatically, which is pretty useful. However, the getter methods are still missing and need to be added manually, as in this PR.
    
    It would be even better, in my opinion, if the model could inherit both the params and their getter methods.
    
    Anyway, it is fine whether or not this PR gets merged, as it is trivial.
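
The inheritance facaiy wishes for could, in principle, look like a shared mixin that declares the param and its accessor once, so that both the estimator and the model own them. This is only a hypothetical sketch, not how `pyspark.ml` is actually structured:

```python
class Param:
    """A named, documented parameter (simplified stand-in)."""
    def __init__(self, name, doc):
        self.name, self.doc = name, doc

class HasMaxDepth:
    """Mixin owning both the param and its getter."""
    maxDepth = Param("maxDepth", "Maximum depth of the tree. (>= 0)")

    def getMaxDepth(self):
        return self._paramMap.get("maxDepth", 5)

class RandomForestClassifier(HasMaxDepth):
    def __init__(self, maxDepth=5):
        self._paramMap = {"maxDepth": maxDepth}

class RandomForestClassificationModel(HasMaxDepth):
    def __init__(self, paramMap):
        # fitting copies the estimator's param values onto the model
        self._paramMap = dict(paramMap)

est = RandomForestClassifier(maxDepth=8)
model = RandomForestClassificationModel(est._paramMap)
print(model.getMaxDepth())  # 8: getter inherited along with the param
```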





[GitHub] spark issue #18120: [SPARK-20498][PYSPARK][ML] Expose getMaxDepth for ensemb...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/18120
  
    Thanks @facaiy for the PR.  This might be enough to simply retrieve the value from the Java model, but I think the Python model also needs to "own" the param.  For example, if we have a `DecisionTreeRegressor` called `dt` and a `DecisionTreeRegressionModel` called `model` then
    ```
    In [8]: dt.hasParam("maxDepth")
    Out[8]: True
    
    In [9]: model.hasParam("maxDepth")
    Out[9]: False
    ```
    This is because the Python object does not have an instance of the param; it's only getting a value from the Java model. Additionally, many of the methods you would expect to work from the `Params` class raise an error, like
    ```
    In [4]: dt.explainParam("maxDepth")
    Out[4]: 'maxDepth: Maximum depth of the tree. (>= 0) E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes. (default: 5, current: 2)'
    
    In [5]: model.explainParam("maxDepth")
    ...
    AttributeError: 'DecisionTreeRegressionModel' object has no attribute 'maxDepth'
    ```
    
    As @sethah pointed out #17849 has the fix so that the Python models would have an instance of each param, so that should go in first.  Then, the accessor could be written like this:
    ```
    def getMaxDepth(self):
        return self.getOrDefault(self.maxDepth)
    ```
    I'm not sure what the best approach is for adding these accessors: all at once, or one by one as needed, like with `maxDepth`?
    
    cc @holdenk @jkbradley for your input
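
For readers outside the PySpark codebase, the ownership problem Bryan describes can be reproduced with a stripped-down sketch of the `Params` pattern (all class names here are simplified stand-ins, not the actual `pyspark.ml` classes):

```python
class Param:
    def __init__(self, name, doc):
        self.name, self.doc = name, doc

class Params:
    def __init__(self):
        self._defaults, self._values = {}, {}

    def hasParam(self, name):
        # True only if this object itself owns a Param under that name
        return isinstance(getattr(self, name, None), Param)

    def getOrDefault(self, param):
        return self._values.get(param.name, self._defaults[param.name])

class Estimator(Params):
    maxDepth = Param("maxDepth", "Maximum depth of the tree. (>= 0)")

    def __init__(self):
        super().__init__()
        self._defaults["maxDepth"] = 5

class Model(Params):
    # no maxDepth Param declared: the value lives only on the Java side
    pass

est, model = Estimator(), Model()
print(est.hasParam("maxDepth"))    # True
print(model.hasParam("maxDepth"))  # False: the model never owned the param
```

Once the model owns its own copy of each param, as #17849 proposes, an accessor like `self.getOrDefault(self.maxDepth)` works on the model exactly as it does on the estimator.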




[GitHub] spark issue #18120: [SPARK-20498][PYSPARK][ML] Expose getMaxDepth for ensemb...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18120
  
    Can one of the admins verify this patch?

