You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by MLnick <gi...@git.apache.org> on 2016/05/20 19:59:58 UTC

[GitHub] spark pull request: [SPARK-15442][ML][PYSPARK] Add 'relativeError'...

GitHub user MLnick opened a pull request:

    https://github.com/apache/spark/pull/13228

    [SPARK-15442][ML][PYSPARK] Add 'relativeError' param to PySpark QuantileDiscretizer

    This PR adds the `relativeError` param to PySpark's `QuantileDiscretizer` to match Scala.
    
    Also cleaned up a duplication of `numBuckets` where the param is both a class and instance attribute (I removed the instance attr to match the style of params throughout `ml`).
    
    Finally, cleaned up the docs for `QuantileDiscretizer` to reflect that it now uses `approxQuantile`.
    
    ## How was this patch tested?
    
    A little doctest and built API docs locally to check HTML doc generation.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MLnick/spark SPARK-15442-py-relerror-param

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13228.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13228
    
----
commit 907123b8c13b6bd180821c8752e1e9d282a2a9fb
Author: Nick Pentreath <ni...@za.ibm.com>
Date:   2016-05-20T18:23:34Z

    initial work

commit 60339055b11a57eeb25e93ba1429ca6022916609
Author: Nick Pentreath <ni...@za.ibm.com>
Date:   2016-05-20T19:47:26Z

    clean up doc

commit 6417384f3b8a29dadea1089fdccb150a00cf26ad
Author: Nick Pentreath <ni...@za.ibm.com>
Date:   2016-05-20T19:57:51Z

    amend doctest param value

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15442][ML][PYSPARK] Add 'relativeError'...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the pull request:

    https://github.com/apache/spark/pull/13228#issuecomment-220759647
  
    @jkbradley this is a missing param on the python side - I noticed it while
    reviewing the PR for doc for count vectorizer, hashing tf and quantile
    discretizer.
    
    It also contains doc fixes as quantile discretizer doc is incorrect now
    that it uses approx quantile.
    
    I thought we were still trying to get as much of pyspark parity for 2.0 as
    we can?
    
    Otherwise I can just make the doc changes
    On Sat, 21 May 2016 at 00:35, jkbradley <no...@github.com> wrote:
    
    > Is this targeted at 2.0? I'd prefer we focus on finishing documentation &
    > QA checks before working on new features and APIs.
    >
    > —
    > You are receiving this because you authored the thread.
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/spark/pull/13228#issuecomment-220734278>
    >



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15442][ML][PYSPARK] Add 'relativeError'...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/13228#issuecomment-220734278
  
    Is this targeted at 2.0?  I'd prefer we focus on finishing documentation & QA checks before working on new features and APIs.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15442][ML][PYSPARK] Add 'relativeError'...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13228#issuecomment-221192898
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59188/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15442][ML][PYSPARK] Add 'relativeError'...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/13228#issuecomment-221035663
  
    I do want us to push for pyspark parity as much as possible, but I made those QA JIRAs for Python parity checks to do an audit, not to achieve parity for 2.0.  I definitely find it tempting to push as much API parity stuff in as possible, but I've gradually become convinced it's better to just focus on QA & docs to get the release out the door.  As soon as the release is done, we can focus on the newly created Python JIRAs and other new APIs, etc.
    
    But it's your call on this one.  If this is ready, then maybe it's better to just go ahead and merge it to avoid wasted work.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15442][ML][PYSPARK] Add 'relativeError'...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13228#issuecomment-220714826
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15442][ML][PYSPARK] Add 'relativeError'...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13228#discussion_r64270058
  
    --- Diff: python/pyspark/ml/feature.py ---
    @@ -1250,6 +1257,20 @@ def getNumBuckets(self):
             """
             return self.getOrDefault(self.numBuckets)
     
    +    @since("2.0.0")
    +    def setRelativeError(self, value):
    +        """
    +        Sets the value of :py:attr:`relativeError`.
    +        """
    +        return self._set(relativeError=value)
    +
    +    @since("2.0.0")
    +    def getRelativeError(self):
    +        """
    +        Gets the value of relativeError or its default value.
    --- End diff --
    
    @BryanCutler it's a good point - but I've seen that all other docs for the getters through py-ml omit the `:py:attr:`. I'm not certain if this is intended or some oversight that has become "copy-pasted" as de facto standard :) I for one would support updating all of them as it makes sense to me that the doc links match the setters. But it then needs to be done for all Python param setter methods to be consistent.
    
    cc @jkbradley @holdenk @yanboliang 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15442][ML][PYSPARK] Add 'relativeError'...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13228#discussion_r64344703
  
    --- Diff: python/pyspark/ml/feature.py ---
    @@ -1250,6 +1257,20 @@ def getNumBuckets(self):
             """
             return self.getOrDefault(self.numBuckets)
     
    +    @since("2.0.0")
    +    def setRelativeError(self, value):
    +        """
    +        Sets the value of :py:attr:`relativeError`.
    +        """
    +        return self._set(relativeError=value)
    +
    +    @since("2.0.0")
    +    def getRelativeError(self):
    +        """
    +        Gets the value of relativeError or its default value.
    --- End diff --
    
    I think we should probably document the setters at some point, but perhaps 2.1 timeframe.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15442][ML][PYSPARK] Add 'relativeError'...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13228#issuecomment-221192896
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15442][ML][PYSPARK] Add 'relativeError'...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/13228#issuecomment-221037153
  
    Just reviewed, and this LGTM  : )


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15442][ML][PYSPARK] Add 'relativeError'...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the pull request:

    https://github.com/apache/spark/pull/13228#issuecomment-220729967
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15442][ML][PYSPARK] Add 'relativeError'...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the pull request:

    https://github.com/apache/spark/pull/13228#issuecomment-221062894
  
    @jkbradley fair enough - I misinterpreted things thinking we were trying to nail down further Python parity. I do agree the focus should be on docs / QA. As it happens this was trivial to do while working on the doc update :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15442][ML][PYSPARK] Add 'relativeError'...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the pull request:

    https://github.com/apache/spark/pull/13228#issuecomment-221184857
  
    jenkins retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15442][ML][PYSPARK] Add 'relativeError'...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13228#issuecomment-220714827
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59021/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15442][ML][PYSPARK] Add 'relativeError'...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13228#issuecomment-220705009
  
    **[Test build #59021 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59021/consoleFull)** for PR 13228 at commit [`6417384`](https://github.com/apache/spark/commit/6417384f3b8a29dadea1089fdccb150a00cf26ad).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15442][ML][PYSPARK] Add 'relativeError'...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13228#discussion_r64345475
  
    --- Diff: python/pyspark/ml/feature.py ---
    @@ -1250,6 +1257,20 @@ def getNumBuckets(self):
             """
             return self.getOrDefault(self.numBuckets)
     
    +    @since("2.0.0")
    +    def setRelativeError(self, value):
    +        """
    +        Sets the value of :py:attr:`relativeError`.
    +        """
    +        return self._set(relativeError=value)
    +
    +    @since("2.0.0")
    +    def getRelativeError(self):
    +        """
    +        Gets the value of relativeError or its default value.
    --- End diff --
    
    yeah - though it is a large, but actually trivial, change. We could probably do it in one pass right at the end.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15442][ML][PYSPARK] Add 'relativeError'...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/13228


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15442][ML][PYSPARK] Add 'relativeError'...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13228#issuecomment-220714687
  
    **[Test build #59021 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59021/consoleFull)** for PR 13228 at commit [`6417384`](https://github.com/apache/spark/commit/6417384f3b8a29dadea1089fdccb150a00cf26ad).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15442][ML][PYSPARK] Add 'relativeError'...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13228#discussion_r64113992
  
    --- Diff: python/pyspark/ml/feature.py ---
    @@ -1250,6 +1257,20 @@ def getNumBuckets(self):
             """
             return self.getOrDefault(self.numBuckets)
     
    +    @since("2.0.0")
    +    def setRelativeError(self, value):
    +        """
    +        Sets the value of :py:attr:`relativeError`.
    +        """
    +        return self._set(relativeError=value)
    +
    +    @since("2.0.0")
    +    def getRelativeError(self):
    +        """
    +        Gets the value of relativeError or its default value.
    --- End diff --
    
    no biggie, but the setter has a link for the param :py:attr:`relativeError`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15442][ML][PYSPARK] Add 'relativeError'...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13228#issuecomment-221192784
  
    **[Test build #59188 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59188/consoleFull)** for PR 13228 at commit [`6417384`](https://github.com/apache/spark/commit/6417384f3b8a29dadea1089fdccb150a00cf26ad).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15442][ML][PYSPARK] Add 'relativeError'...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13228#issuecomment-221185187
  
    **[Test build #59188 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59188/consoleFull)** for PR 13228 at commit [`6417384`](https://github.com/apache/spark/commit/6417384f3b8a29dadea1089fdccb150a00cf26ad).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org