Posted to reviews@spark.apache.org by HyukjinKwon <gi...@git.apache.org> on 2017/08/19 08:32:08 UTC

[GitHub] spark pull request #18999: [SPARK-21779][PYTHON] Simpler Dataset.sample API ...

GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/18999

    [SPARK-21779][PYTHON] Simpler Dataset.sample API in Python

    ## What changes were proposed in this pull request?
    
    This PR allows `DataFrame.sample(...)` to omit `withReplacement`, which defaults to `False`, consistent with the equivalent Scala / Java API.
    
    In short, the following examples are allowed:
    
    ```python
    >>> df = spark.range(10)
    >>> df.sample(0.5).count()
    7
    >>> df.sample(fraction=0.5).count()
    3
    >>> df.sample(0.5, seed=42).count()
    5
    >>> df.sample(fraction=0.5, seed=42).count()
    5
    ```
    
    In addition, this PR adds type checking, as shown below:
    
    ```python
    >>> df = spark.range(10)
    >>> df.sample(True).count()
    ...
    TypeError: withReplacement (optional), fraction (required) and seed (optional) should be a bool, float and number; however, got <type 'bool'>.
    >>> df.sample(42).count()
    ...
    TypeError: withReplacement (optional), fraction (required) and seed (optional) should be a bool, float and number; however, got <type 'int'>.
    >>> df.sample(fraction=False, seed="a").count()
    ...
    TypeError: withReplacement (optional), fraction (required) and seed (optional) should be a bool, float and number; however, got <type 'bool'>, <type 'str'>.
    >>> df.sample(seed=[1]).count()
    ...
    TypeError: withReplacement (optional), fraction (required) and seed (optional) should be a bool, float and number; however, got <type 'list'>.
    >>> df.sample(withReplacement="a", fraction=0.5, seed=1)
    ...
    TypeError: withReplacement (optional), fraction (required) and seed (optional) should be a bool, float and number; however, got <type 'str'>, <type 'float'>, <type 'int'>.
    ```
    
    ## How was this patch tested?
    
    Manually tested, added unit tests as doc tests, and manually checked the built documentation for Python.
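
A minimal sketch of how the call forms in the description could be told apart, written as a standalone function so that it runs without Spark. The three checks follow the diff quoted later in this thread; the helper name `resolve_sample_args` and the final argument-shifting step are assumptions made for illustration, not necessarily the merged implementation.

```python
def resolve_sample_args(withReplacement=None, fraction=None, seed=None):
    """Hypothetical helper: normalize to (withReplacement, fraction, seed)."""
    # sample(True, 0.5 [, seed]) or sample(withReplacement=False, fraction=0.5 [, seed])
    is_set = type(withReplacement) == bool and isinstance(fraction, float)
    # sample(fraction=0.5 [, seed])
    is_omitted_kwargs = withReplacement is None and isinstance(fraction, float)
    # sample(0.5 [, seed]): the float landed in the first positional slot
    is_omitted_args = isinstance(withReplacement, float)

    if not (is_set or is_omitted_kwargs or is_omitted_args):
        argtypes = [
            str(type(arg)) for arg in (withReplacement, fraction, seed)
            if arg is not None]
        raise TypeError(
            "withReplacement (optional), fraction (required) and seed (optional)"
            " should be a bool, float and number; however, "
            "got %s." % ", ".join(argtypes))

    if is_omitted_args:
        # Assumption: shift the positional values one slot to the right.
        if fraction is not None:
            seed = fraction
        fraction = withReplacement
        withReplacement = None

    return withReplacement, fraction, seed
```

Under these assumptions, `resolve_sample_args(0.5, 3)` yields `(None, 0.5, 3)`, while `resolve_sample_args(True)` raises `TypeError` because no fraction was supplied.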

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-21779

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18999.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18999
    
----
commit 5de97d1c8e0717315797661a47c367b721ee3aa8
Author: hyukjinkwon <gu...@gmail.com>
Date:   2017-08-19T08:18:51Z

    Simpler Dataset.sample API in Python

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample API in Py...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18999
  
    Merged to master.



[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler Dataset.sample API in Pyth...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18999
  
    **[Test build #80870 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80870/testReport)** for PR 18999 at commit [`5de97d1`](https://github.com/apache/spark/commit/5de97d1c8e0717315797661a47c367b721ee3aa8).



[GitHub] spark pull request #18999: [SPARK-21779][PYTHON] Simpler Dataset.sample API ...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18999#discussion_r134086691
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -659,19 +659,77 @@ def distinct(self):
             return DataFrame(self._jdf.distinct(), self.sql_ctx)
     
         @since(1.3)
    -    def sample(self, withReplacement, fraction, seed=None):
    +    def sample(self, withReplacement=None, fraction=None, seed=None):
             """Returns a sampled subset of this :class:`DataFrame`.
     
    +        :param withReplacement: Sample with replacement or not (default False).
    +        :param fraction: Fraction of rows to generate, range [0.0, 1.0].
    +        :param seed: Seed for sampling (default a random seed).
    +
             .. note:: This is not guaranteed to provide exactly the fraction specified of the total
                 count of the given :class:`DataFrame`.
     
    -        >>> df.sample(False, 0.5, 42).count()
    -        2
    -        """
    -        assert fraction >= 0.0, "Negative fraction value: %s" % fraction
    -        seed = seed if seed is not None else random.randint(0, sys.maxsize)
    --- End diff --
    
    I also removed `random.randint(0, sys.maxsize)` and instead try to call the Scala / Java side directly.
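
One way to read this comment: rather than generating a seed in Python, the wrapper forwards only the arguments the caller supplied and leaves the seed choice to the JVM. The sketch below assumes the JVM side offers overloads without a seed; `FakeJavaDataset` and `forward_sample` are hypothetical stand-ins for the Py4J object and the wrapper, not code from the patch.

```python
class FakeJavaDataset(object):
    """Hypothetical stand-in for the JVM Dataset; echoes the arguments it receives."""

    def sample(self, *args):
        return args


def forward_sample(jdf, withReplacement=None, fraction=None, seed=None):
    """Forward only the supplied arguments instead of inventing a seed in Python."""
    seed = int(seed) if seed is not None else None
    # False is a real value and is kept; only None (omitted) is dropped.
    args = [arg for arg in [withReplacement, fraction, seed] if arg is not None]
    return jdf.sample(*args)
```

With this shape, `forward_sample(FakeJavaDataset(), fraction=0.5)` passes a single argument, so a real JVM object would be free to pick its own random seed.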



[GitHub] spark pull request #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample AP...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18999



[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler Dataset.sample API in Pyth...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18999
  
    **[Test build #80870 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80870/testReport)** for PR 18999 at commit [`5de97d1`](https://github.com/apache/spark/commit/5de97d1c8e0717315797661a47c367b721ee3aa8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample API in Py...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18999
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample API in Py...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18999
  
    **[Test build #80898 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80898/testReport)** for PR 18999 at commit [`0328446`](https://github.com/apache/spark/commit/0328446bfb4825acc3e0c7620e72eee1c831d5db).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request #18999: [SPARK-21779][PYTHON] Simpler Dataset.sample API ...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18999#discussion_r134086684
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -659,19 +659,77 @@ def distinct(self):
             return DataFrame(self._jdf.distinct(), self.sql_ctx)
     
         @since(1.3)
    -    def sample(self, withReplacement, fraction, seed=None):
    +    def sample(self, withReplacement=None, fraction=None, seed=None):
             """Returns a sampled subset of this :class:`DataFrame`.
     
    +        :param withReplacement: Sample with replacement or not (default False).
    +        :param fraction: Fraction of rows to generate, range [0.0, 1.0].
    +        :param seed: Seed for sampling (default a random seed).
    +
             .. note:: This is not guaranteed to provide exactly the fraction specified of the total
                 count of the given :class:`DataFrame`.
     
    -        >>> df.sample(False, 0.5, 42).count()
    -        2
    -        """
    -        assert fraction >= 0.0, "Negative fraction value: %s" % fraction
    --- End diff --
    
    I removed this because it appears to be checked on the Scala / Java side:
    
    ```python
    >>> df.sample(fraction=-0.1).count()
    ...
    pyspark.sql.utils.IllegalArgumentException: u'requirement failed: Sampling fraction (-0.1) must be on interval [0, 1] without replacement'
    ```



[GitHub] spark pull request #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample AP...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18999#discussion_r134123916
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -659,19 +659,77 @@ def distinct(self):
             return DataFrame(self._jdf.distinct(), self.sql_ctx)
     
         @since(1.3)
    -    def sample(self, withReplacement, fraction, seed=None):
    +    def sample(self, withReplacement=None, fraction=None, seed=None):
             """Returns a sampled subset of this :class:`DataFrame`.
     
    +        :param withReplacement: Sample with replacement or not (default False).
    +        :param fraction: Fraction of rows to generate, range [0.0, 1.0].
    +        :param seed: Seed for sampling (default a random seed).
    +
             .. note:: This is not guaranteed to provide exactly the fraction specified of the total
                 count of the given :class:`DataFrame`.
     
    -        >>> df.sample(False, 0.5, 42).count()
    -        2
    -        """
    -        assert fraction >= 0.0, "Negative fraction value: %s" % fraction
    -        seed = seed if seed is not None else random.randint(0, sys.maxsize)
    -        rdd = self._jdf.sample(withReplacement, fraction, long(seed))
    -        return DataFrame(rdd, self.sql_ctx)
    +        .. note:: `fraction` is required and, `withReplacement` and `seed` are optional.
    +
    +        >>> df = spark.range(10)
    +        >>> df.sample(0.5, 3).count()
    +        4
    +        >>> df.sample(fraction=0.5, seed=3).count()
    +        4
    +        >>> df.sample(withReplacement=True, fraction=0.5, seed=3).count()
    +        1
    +        >>> df.sample(1.0).count()
    +        10
    +        >>> df.sample(fraction=1.0).count()
    +        10
    +        >>> df.sample(False, fraction=1.0).count()
    +        10
    +        >>> df.sample("a").count()
    +        Traceback (most recent call last):
    +            ...
    +        TypeError:...
    +        >>> df.sample(seed="abc").count()
    +        Traceback (most recent call last):
    +            ...
    +        TypeError:...
    --- End diff --
    
    that makes sense! doc tests are examples users can follow
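
For readers unfamiliar with the `TypeError:...` form in the quoted doc tests: with the standard library's `doctest.ELLIPSIS` option, `...` in the expected output matches any text, so the exact exception message does not need to be spelled out. A small sketch, using a hypothetical `sample_fraction` helper in place of `DataFrame.sample` so that it runs without Spark:

```python
import doctest


def sample_fraction(fraction):
    """Return ``fraction`` unchanged if it is a float (hypothetical helper).

    >>> sample_fraction(0.5)
    0.5
    >>> sample_fraction("a")
    Traceback (most recent call last):
        ...
    TypeError:...
    """
    if not isinstance(fraction, float):
        raise TypeError(
            "fraction should be a float; however, got %s." % type(fraction))
    return fraction


def run_sample_fraction_doctest():
    """Run the docstring examples above with ELLIPSIS enabled."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        sample_fraction.__doc__,
        {"sample_fraction": sample_fraction},
        "sample_fraction", None, 0)
    runner = doctest.DocTestRunner(verbose=False, optionflags=doctest.ELLIPSIS)
    return runner.run(test)
```

`run_sample_fraction_doctest()` returns a `TestResults(failed, attempted)` tuple, so a doc test written this way doubles as a usage example and as a regression check.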



[GitHub] spark pull request #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample AP...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18999#discussion_r135132622
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -659,19 +659,77 @@ def distinct(self):
             return DataFrame(self._jdf.distinct(), self.sql_ctx)
     
         @since(1.3)
    -    def sample(self, withReplacement, fraction, seed=None):
    +    def sample(self, withReplacement=None, fraction=None, seed=None):
             """Returns a sampled subset of this :class:`DataFrame`.
     
    +        :param withReplacement: Sample with replacement or not (default False).
    +        :param fraction: Fraction of rows to generate, range [0.0, 1.0].
    +        :param seed: Seed for sampling (default a random seed).
    +
             .. note:: This is not guaranteed to provide exactly the fraction specified of the total
                 count of the given :class:`DataFrame`.
     
    -        >>> df.sample(False, 0.5, 42).count()
    -        2
    -        """
    -        assert fraction >= 0.0, "Negative fraction value: %s" % fraction
    --- End diff --
    
    yea it'd be better to have python handle the simpler error checking.
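
A sketch of what a Python-side check might look like if the removed assertion were restored as an explicit exception. Only the lower bound is checked, mirroring the removed `assert`; the JVM error message quoted earlier in the thread suggests the upper bound depends on `withReplacement`, so it is left to the JVM here. The name `check_fraction` is hypothetical.

```python
def check_fraction(fraction):
    """Hypothetical helper: reject negative sampling fractions early."""
    if fraction < 0.0:
        # ValueError instead of assert, so the check survives `python -O`.
        raise ValueError("Negative fraction value: %s" % fraction)
    return fraction
```

Raising in Python keeps the traceback short and avoids a round trip to the JVM for an error that can be detected locally.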



[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample API in Py...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18999
  
    https://github.com/apache/spark/pull/18999#discussion_r134131441 looks hidden. I addressed the other comment.



[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample API in Py...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18999
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80912/
    Test PASSed.



[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler Dataset.sample API in Pyth...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18999
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80870/
    Test PASSed.



[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample API in Py...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18999
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80911/
    Test PASSed.



[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample API in Py...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18999
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample API in Py...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18999
  
    **[Test build #80911 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80911/testReport)** for PR 18999 at commit [`24525bc`](https://github.com/apache/spark/commit/24525bc75c886ead4c88a2b6d899c6f9a3947420).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample AP...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18999#discussion_r134092119
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -659,19 +659,77 @@ def distinct(self):
             return DataFrame(self._jdf.distinct(), self.sql_ctx)
     
         @since(1.3)
    -    def sample(self, withReplacement, fraction, seed=None):
    +    def sample(self, withReplacement=None, fraction=None, seed=None):
             """Returns a sampled subset of this :class:`DataFrame`.
     
    +        :param withReplacement: Sample with replacement or not (default False).
    +        :param fraction: Fraction of rows to generate, range [0.0, 1.0].
    +        :param seed: Seed for sampling (default a random seed).
    +
             .. note:: This is not guaranteed to provide exactly the fraction specified of the total
                 count of the given :class:`DataFrame`.
     
    -        >>> df.sample(False, 0.5, 42).count()
    -        2
    -        """
    -        assert fraction >= 0.0, "Negative fraction value: %s" % fraction
    -        seed = seed if seed is not None else random.randint(0, sys.maxsize)
    -        rdd = self._jdf.sample(withReplacement, fraction, long(seed))
    -        return DataFrame(rdd, self.sql_ctx)
    +        .. note:: `fraction` is required and, `withReplacement` and `seed` are optional.
    +
    +        >>> df = spark.range(10)
    +        >>> df.sample(0.5, 3).count()
    +        4
    +        >>> df.sample(fraction=0.5, seed=3).count()
    +        4
    +        >>> df.sample(withReplacement=True, fraction=0.5, seed=3).count()
    +        1
    +        >>> df.sample(1.0).count()
    +        10
    +        >>> df.sample(fraction=1.0).count()
    +        10
    +        >>> df.sample(False, fraction=1.0).count()
    +        10
    +        >>> df.sample("a").count()
    +        Traceback (most recent call last):
    +            ...
    +        TypeError:...
    +        >>> df.sample(seed="abc").count()
    +        Traceback (most recent call last):
    +            ...
    +        TypeError:...
    +        """
    +
    +        # For the cases below:
    +        #   sample(True, 0.5 [, seed])
    +        #   sample(True, fraction=0.5 [, seed])
    +        #   sample(withReplacement=False, fraction=0.5 [, seed])
    +        is_withReplacement_set = \
    +            type(withReplacement) == bool and isinstance(fraction, float)
    +
    +        # For the case below:
    +        #   sample(fraction=0.5 [, seed])
    +        is_withReplacement_omitted_kwargs = \
    +            withReplacement is None and isinstance(fraction, float)
    +
    +        # For the case below:
    +        #   sample(0.5 [, seed])
    +        is_withReplacement_omitted_args = isinstance(withReplacement, float)
    +
    +        if not (is_withReplacement_set
    +                or is_withReplacement_omitted_kwargs
    +                or is_withReplacement_omitted_args):
    +            argtypes = [
    +                str(type(arg)) for arg in [withReplacement, fraction, seed] if arg is not None]
    +            raise TypeError(
    +                "withReplacement (optional), fraction (required) and seed (optional)"
    +                " should be a bool, float and number; however, "
    +                "got %s." % ", ".join(argtypes))
    --- End diff --
    
    With this change, all three parameters can be `None` by default, so `argtypes` would be an empty list here?
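
The concern can be reproduced in isolation. The sketch below copies the message-building expression from the diff into a hypothetical `describe_argtypes` helper, and adds one possible variant that wraps the types in brackets so the empty case stays readable; the variant is an illustration, not necessarily the wording the PR ended up with.

```python
def describe_argtypes(withReplacement=None, fraction=None, seed=None):
    """Tail of the error message, built as in the quoted diff."""
    argtypes = [
        str(type(arg)) for arg in [withReplacement, fraction, seed]
        if arg is not None]
    return "got %s." % ", ".join(argtypes)


def describe_argtypes_bracketed(withReplacement=None, fraction=None, seed=None):
    """Possible variant: brackets make an empty argument list visible."""
    argtypes = [
        str(type(arg)) for arg in [withReplacement, fraction, seed]
        if arg is not None]
    return "got [%s]." % ", ".join(argtypes)
```

Calling `describe_argtypes()` with no arguments returns `"got ."`, which is the confusing message described in the comment, whereas the bracketed variant returns `"got []."`.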



[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample API in Py...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18999
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample API in Py...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18999
  
    retest this please



[GitHub] spark pull request #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample AP...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18999#discussion_r134092696
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -659,19 +659,77 @@ def distinct(self):
             return DataFrame(self._jdf.distinct(), self.sql_ctx)
     
         @since(1.3)
    -    def sample(self, withReplacement, fraction, seed=None):
    +    def sample(self, withReplacement=None, fraction=None, seed=None):
             """Returns a sampled subset of this :class:`DataFrame`.
     
    +        :param withReplacement: Sample with replacement or not (default False).
    +        :param fraction: Fraction of rows to generate, range [0.0, 1.0].
    +        :param seed: Seed for sampling (default a random seed).
    +
             .. note:: This is not guaranteed to provide exactly the fraction specified of the total
                 count of the given :class:`DataFrame`.
     
    -        >>> df.sample(False, 0.5, 42).count()
    -        2
    -        """
    -        assert fraction >= 0.0, "Negative fraction value: %s" % fraction
    -        seed = seed if seed is not None else random.randint(0, sys.maxsize)
    -        rdd = self._jdf.sample(withReplacement, fraction, long(seed))
    -        return DataFrame(rdd, self.sql_ctx)
    +        .. note:: `fraction` is required and, `withReplacement` and `seed` are optional.
    +
    +        >>> df = spark.range(10)
    +        >>> df.sample(0.5, 3).count()
    +        4
    +        >>> df.sample(fraction=0.5, seed=3).count()
    +        4
    +        >>> df.sample(withReplacement=True, fraction=0.5, seed=3).count()
    +        1
    +        >>> df.sample(1.0).count()
    +        10
    +        >>> df.sample(fraction=1.0).count()
    +        10
    +        >>> df.sample(False, fraction=1.0).count()
    +        10
    +        >>> df.sample("a").count()
    +        Traceback (most recent call last):
    +            ...
    +        TypeError:...
    +        >>> df.sample(seed="abc").count()
    +        Traceback (most recent call last):
    +            ...
    +        TypeError:...
    +        """
    +
    +        # For the cases below:
    +        #   sample(True, 0.5 [, seed])
    +        #   sample(True, fraction=0.5 [, seed])
    +        #   sample(withReplacement=False, fraction=0.5 [, seed])
    +        is_withReplacement_set = \
    +            type(withReplacement) == bool and isinstance(fraction, float)
    +
    +        # For the case below:
    +        #   sample(fraction=0.5 [, seed])
    +        is_withReplacement_omitted_kwargs = \
    +            withReplacement is None and isinstance(fraction, float)
    +
    +        # For the case below:
    +        #   sample(0.5 [, seed])
    +        is_withReplacement_omitted_args = isinstance(withReplacement, float)
    +
    +        if not (is_withReplacement_set
    +                or is_withReplacement_omitted_kwargs
    +                or is_withReplacement_omitted_args):
    +            argtypes = [
    +                str(type(arg)) for arg in [withReplacement, fraction, seed] if arg is not None]
    +            raise TypeError(
    +                "withReplacement (optional), fraction (required) and seed (optional)"
    +                " should be a bool, float and number; however, "
    +                "got %s." % ", ".join(argtypes))
    --- End diff --
    
    Yea, it looks so. Let me try to improve this message.
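
For reference, the three accepted call shapes in the diff above can be sketched as a standalone function. This is only a sketch under stated assumptions: `resolve_sample_args` is a hypothetical name, the argument-shifting step at the end is assumed from the doctests (`df.sample(0.5, 3)`) rather than shown in the quoted hunk, and the message reports plain type names as one possible wording.

```python
def resolve_sample_args(withReplacement=None, fraction=None, seed=None):
    """Hypothetical helper mirroring the checks in the diff; not a PySpark API.

    Returns (withReplacement, fraction, seed) with positional arguments
    shifted into place, or raises TypeError for unsupported combinations.
    """
    # sample(True, 0.5 [, seed]) / sample(withReplacement=False, fraction=0.5 [, seed])
    is_set = type(withReplacement) == bool and isinstance(fraction, float)
    # sample(fraction=0.5 [, seed])
    is_omitted_kwargs = withReplacement is None and isinstance(fraction, float)
    # sample(0.5 [, seed]): the fraction arrived in the first positional slot
    is_omitted_args = isinstance(withReplacement, float)

    if not (is_set or is_omitted_kwargs or is_omitted_args):
        # Assumed wording: list the type name of every argument that was passed.
        argtypes = [
            type(arg).__name__
            for arg in (withReplacement, fraction, seed) if arg is not None]
        raise TypeError(
            "withReplacement (optional), fraction (required) and seed (optional)"
            " should be a bool, float and number; however, got [%s]."
            % ", ".join(argtypes))

    if is_omitted_args:
        # Assumed follow-up step: shift sample(0.5, 3) to fraction=0.5, seed=3.
        if fraction is not None:
            seed = fraction
        fraction = withReplacement
        withReplacement = None

    return (False if withReplacement is None else withReplacement), fraction, seed
```

With this sketch, `resolve_sample_args(0.5, 3)` yields `(False, 0.5, 3)`, while `resolve_sample_args(42)` raises `TypeError` because an int in the first slot matches none of the three cases.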


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler Dataset.sample API in Pyth...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18999
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample AP...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18999#discussion_r134131441
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -659,19 +659,77 @@ def distinct(self):
             return DataFrame(self._jdf.distinct(), self.sql_ctx)
     
         @since(1.3)
    -    def sample(self, withReplacement, fraction, seed=None):
    +    def sample(self, withReplacement=None, fraction=None, seed=None):
             """Returns a sampled subset of this :class:`DataFrame`.
     
    +        :param withReplacement: Sample with replacement or not (default False).
    +        :param fraction: Fraction of rows to generate, range [0.0, 1.0].
    +        :param seed: Seed for sampling (default a random seed).
    +
             .. note:: This is not guaranteed to provide exactly the fraction specified of the total
                 count of the given :class:`DataFrame`.
     
    -        >>> df.sample(False, 0.5, 42).count()
    -        2
    -        """
    -        assert fraction >= 0.0, "Negative fraction value: %s" % fraction
    --- End diff --
    
    Hm.. wouldn't it be better to avoid duplicating the requirement expression? It looks like I would have to reproduce:
    
    https://github.com/apache/spark/blob/5ad1796b9fd6bce31bbc1cdc2f607115d2dd0e7d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala#L714-L722
    
    on the Python side. I have been trying to avoid that as long as the JVM error message makes sense to Python users (which is not the case when it exposes non-Pythonic details, for example Java types such as `java.lang.Long`), although I understand it is better to throw an exception up front, before going to the JVM.



[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample API in Py...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18999
  
    **[Test build #81303 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81303/testReport)** for PR 18999 at commit [`f2608ab`](https://github.com/apache/spark/commit/f2608ab0ca1e64ce97d65bffb62a07935e4b3db8).



[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample API in Py...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18999
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample API in Py...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18999
  
    **[Test build #80912 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80912/testReport)** for PR 18999 at commit [`f2608ab`](https://github.com/apache/spark/commit/f2608ab0ca1e64ce97d65bffb62a07935e4b3db8).



[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample API in Py...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18999
  
    @rxin, do you have an opinion on https://github.com/apache/spark/pull/18999#discussion_r134131441 (avoiding the fraction check on the Python side)?



[GitHub] spark pull request #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample AP...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18999#discussion_r134123358
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -659,19 +659,77 @@ def distinct(self):
             return DataFrame(self._jdf.distinct(), self.sql_ctx)
     
         @since(1.3)
    -    def sample(self, withReplacement, fraction, seed=None):
    +    def sample(self, withReplacement=None, fraction=None, seed=None):
             """Returns a sampled subset of this :class:`DataFrame`.
     
    +        :param withReplacement: Sample with replacement or not (default False).
    +        :param fraction: Fraction of rows to generate, range [0.0, 1.0].
    +        :param seed: Seed for sampling (default a random seed).
    +
             .. note:: This is not guaranteed to provide exactly the fraction specified of the total
                 count of the given :class:`DataFrame`.
     
    -        >>> df.sample(False, 0.5, 42).count()
    -        2
    -        """
    -        assert fraction >= 0.0, "Negative fraction value: %s" % fraction
    --- End diff --
    
    I'd do the check in Python, so the error message is clearer.
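
A minimal sketch of what such a Python-side check could look like, assuming it only mirrors the removed `assert fraction >= 0.0` and leaves any upper-bound rule to the JVM. `check_fraction` is a hypothetical helper name, not part of PySpark.

```python
def check_fraction(fraction):
    """Hypothetical Python-side guard; mirrors the removed `assert fraction >= 0.0`."""
    if fraction < 0.0:
        # ValueError instead of assert, so the check survives `python -O`.
        raise ValueError("Negative fraction value: %s" % fraction)
    return fraction
```

Raising before the call into the JVM keeps Java type names out of the message the Python user sees.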



[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample API in Py...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/18999
  
    LGTM.



[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample API in Py...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18999
  
    Thank you @viirya, @felixcheung, @rxin and @ueshin.



[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample API in Py...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18999
  
    **[Test build #80911 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80911/testReport)** for PR 18999 at commit [`24525bc`](https://github.com/apache/spark/commit/24525bc75c886ead4c88a2b6d899c6f9a3947420).



[GitHub] spark pull request #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample AP...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18999#discussion_r134123764
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -659,19 +659,77 @@ def distinct(self):
             return DataFrame(self._jdf.distinct(), self.sql_ctx)
     
         @since(1.3)
    -    def sample(self, withReplacement, fraction, seed=None):
    +    def sample(self, withReplacement=None, fraction=None, seed=None):
             """Returns a sampled subset of this :class:`DataFrame`.
     
    +        :param withReplacement: Sample with replacement or not (default False).
    +        :param fraction: Fraction of rows to generate, range [0.0, 1.0].
    +        :param seed: Seed for sampling (default a random seed).
    +
             .. note:: This is not guaranteed to provide exactly the fraction specified of the total
                 count of the given :class:`DataFrame`.
     
    -        >>> df.sample(False, 0.5, 42).count()
    -        2
    -        """
    -        assert fraction >= 0.0, "Negative fraction value: %s" % fraction
    -        seed = seed if seed is not None else random.randint(0, sys.maxsize)
    -        rdd = self._jdf.sample(withReplacement, fraction, long(seed))
    -        return DataFrame(rdd, self.sql_ctx)
    +        .. note:: `fraction` is required and, `withReplacement` and `seed` are optional.
    +
    +        >>> df = spark.range(10)
    +        >>> df.sample(0.5, 3).count()
    +        4
    +        >>> df.sample(fraction=0.5, seed=3).count()
    +        4
    +        >>> df.sample(withReplacement=True, fraction=0.5, seed=3).count()
    +        1
    +        >>> df.sample(1.0).count()
    +        10
    +        >>> df.sample(fraction=1.0).count()
    +        10
    +        >>> df.sample(False, fraction=1.0).count()
    +        10
    +        >>> df.sample("a").count()
    +        Traceback (most recent call last):
    +            ...
    +        TypeError:...
    +        >>> df.sample(seed="abc").count()
    +        Traceback (most recent call last):
    +            ...
    +        TypeError:...
    --- End diff --
    
    Maybe we shouldn't put the error cases here in the doctest, but move them to a unit test instead?
    Also, as a user, these cases don't look meaningfully different to me:
    ```
    >>> df.sample(0.5, 3).count()
    4
    >>> df.sample(fraction=0.5, seed=3).count()
    4
    >>> df.sample(1.0).count()
    10
    >>> df.sample(fraction=1.0).count()
    10
    >>> df.sample(False, fraction=1.0).count()
    10
    ```
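
One way the error cases could look once moved out of the doctest is sketched below. It is illustrative only: `sample` here is a hypothetical stand-in that reproduces just the type check from the diff, so the test runs without a Spark session, and the class and method names are assumptions.

```python
import unittest


def sample(withReplacement=None, fraction=None, seed=None):
    """Hypothetical stand-in for DataFrame.sample that only mimics its type check."""
    resolvable = (
        (type(withReplacement) == bool and isinstance(fraction, float))
        or (withReplacement is None and isinstance(fraction, float))
        or isinstance(withReplacement, float))
    if not resolvable:
        raise TypeError("unsupported argument types for sample")


class SampleErrorCases(unittest.TestCase):
    def test_invalid_argument_types(self):
        # The two error cases from the doctest, plus a bare bool and a bare int.
        bad_calls = [
            (("a",), {}),
            ((), {"seed": "abc"}),
            ((True,), {}),
            ((42,), {}),
        ]
        for args, kwargs in bad_calls:
            with self.assertRaises(TypeError):
                sample(*args, **kwargs)

    def test_valid_call_is_accepted(self):
        # One representative valid call stays in the suite as a sanity check.
        self.assertIsNone(sample(fraction=0.5, seed=3))
```

Keeping the failing calls in a unit test leaves the docstring free for the few examples a user would actually copy.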



[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample API in Py...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18999
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80898/
    Test PASSed.



[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample API in Py...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18999
  
    **[Test build #81303 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81303/testReport)** for PR 18999 at commit [`f2608ab`](https://github.com/apache/spark/commit/f2608ab0ca1e64ce97d65bffb62a07935e4b3db8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample API in Py...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18999
  
    cc @holdenk and @ueshin, could you maybe take a look when you have some time?



[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample API in Py...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18999
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81303/
    Test PASSed.



[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample API in Py...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18999
  
    **[Test build #80898 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80898/testReport)** for PR 18999 at commit [`0328446`](https://github.com/apache/spark/commit/0328446bfb4825acc3e0c7620e72eee1c831d5db).



[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample API in Py...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18999
  
    **[Test build #80912 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80912/testReport)** for PR 18999 at commit [`f2608ab`](https://github.com/apache/spark/commit/f2608ab0ca1e64ce97d65bffb62a07935e4b3db8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler Dataset.sample API in Pyth...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18999
  
    cc @rxin. Does this make sense to you?

