You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by sethah <gi...@git.apache.org> on 2016/05/10 22:58:19 UTC

[GitHub] spark pull request: [SPARK-15243][ML][PYSPARK] Param methods shoul...

GitHub user sethah opened a pull request:

    https://github.com/apache/spark/pull/13036

    [SPARK-15243][ML][PYSPARK] Param methods should use basestring for type checking

    ## What changes were proposed in this pull request?
    
    The following methods used `isinstance(value, str)` for checking string types in Python ML params:
    
    * `_resolveParam(param)`
    * `hasParam(param)`
    
    This causes a `ValueError` in Python 2.x when `param` is a unicode string:
    
    ```python
    >>> from pyspark.ml.classification import LogisticRegression
    >>> lr = LogisticRegression()
    >>> lr.hasParam("threshold")
    True
    >>> lr.hasParam(u"threshold")
    Traceback (most recent call last):
     ...
        raise TypeError("hasParam(): paramName must be a string")
    TypeError: hasParam(): paramName must be a string
    >>> 
    ```
    
    
    ## How was this patch tested?
    
    Unit tests added to `python/ml/tests.py`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sethah/spark SPARK-15243

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13036.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13036
    
----
commit 878fc5fda77ee8824c3f7b66bdde8ea09b6d3274
Author: sethah <se...@gmail.com>
Date:   2016-05-10T22:44:10Z

    check for basestring in param methods

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #13036: [SPARK-15243][ML][SQL][PYSPARK] Param methods sho...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13036#discussion_r84476560
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -799,25 +799,30 @@ def test_first_last_ignorenulls(self):
     
         def test_approxQuantile(self):
             df = self.sc.parallelize([Row(a=i) for i in range(10)]).toDF()
    -        aq = df.stat.approxQuantile("a", [0.1, 0.5, 0.9], 0.1)
    +        aq = df.stat.approxQuantile(u"a", [0.1, 0.5, 0.9], 0.1)
    --- End diff --
    
    Basically in these tests the field names are all ascii characters. Is it possibly to add tests using non-ascii characters so we can make sure it works?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15243][ML][SQL][PYSPARK] Param methods ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13036#issuecomment-218533851
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13036: [SPARK-15243][ML][SQL][PYSPARK] Param methods should use...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the issue:

    https://github.com/apache/spark/pull/13036
  
    @holdenk please feel free to take this over. Can't find time to work on it


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15243][ML][PYSPARK] Param methods shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13036#issuecomment-218318826
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58289/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13036: [SPARK-15243][ML][SQL][PYSPARK] Param methods should use...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13036
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15243][ML][SQL][PYSPARK] Param methods ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13036#issuecomment-218533642
  
    **[Test build #58386 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58386/consoleFull)** for PR 13036 at commit [`b04ac41`](https://github.com/apache/spark/commit/b04ac413642fdfa1243f27f3790ef00019a1b047).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13036: [SPARK-15243][ML][SQL][PYSPARK] Param methods should use...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13036
  
    **[Test build #67032 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67032/consoleFull)** for PR 13036 at commit [`c6a8828`](https://github.com/apache/spark/commit/c6a88280ea59eb6535059c75528c078b2834f42f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #13036: [SPARK-15243][ML][SQL][PYSPARK] Param methods sho...

Posted by sethah <gi...@git.apache.org>.
Github user sethah closed the pull request at:

    https://github.com/apache/spark/pull/13036


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13036: [SPARK-15243][ML][SQL][PYSPARK] Param methods should use...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13036
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15243][ML][SQL][PYSPARK] Param methods ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13036#discussion_r62978732
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -470,7 +470,7 @@ def sampleBy(self, col, fractions, seed=None):
             +---+-----+
     
             """
    -        if not isinstance(col, str):
    +        if not isinstance(col, basestring):
    --- End diff --
    
    ah, got it. I just mean from SQL parser.
    
    Similarly, as the unicode column name will be encoded by `name.encode('utf-8')`, it is now a `str` instance. In other words, the schema still stores column names as `str`. However, this change is allowing unicode input as `col`. I think there will be mismatching.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13036: [SPARK-15243][ML][SQL][PYSPARK] Param methods should use...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/13036
  
    Ok, lets see if maybe @zero323 or @HyukjinKwon are interested in taking this over. Otherwise I'll add this to my backlog.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15243][ML][SQL][PYSPARK] Param methods ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13036#discussion_r62979512
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -470,7 +470,7 @@ def sampleBy(self, col, fractions, seed=None):
             +---+-----+
     
             """
    -        if not isinstance(col, str):
    +        if not isinstance(col, basestring):
    --- End diff --
    
    So I think we don't need to do this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15243][ML][SQL][PYSPARK] Param methods ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13036#discussion_r62962819
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -470,7 +470,7 @@ def sampleBy(self, col, fractions, seed=None):
             +---+-----+
     
             """
    -        if not isinstance(col, str):
    +        if not isinstance(col, basestring):
    --- End diff --
    
    I am not sure if this change is needed. Because I think in SQL the column name is only allowed with alphabet, digit and underline, so it is a question why users will use unicode string as column in particular.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #13036: [SPARK-15243][ML][SQL][PYSPARK] Param methods sho...

Posted by k-yokoshi <gi...@git.apache.org>.
Github user k-yokoshi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13036#discussion_r82816487
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -470,7 +470,7 @@ def sampleBy(self, col, fractions, seed=None):
             +---+-----+
     
             """
    -        if not isinstance(col, str):
    +        if not isinstance(col, basestring):
    --- End diff --
    
    I agree with you. There is no problem caused by allowing unicode here.
    As you mentioned, it's better to handle `'a'` and `u'a'` because there are few cases that unicode is passed. (e.g. when `__future__.unicode_literals` is imported in Python 2.) 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13036: [SPARK-15243][ML][SQL][PYSPARK] Param methods should use...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13036
  
    **[Test build #66542 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66542/consoleFull)** for PR 13036 at commit [`48f0557`](https://github.com/apache/spark/commit/48f05576c44ccfef98391f44c55b91d619fdac81).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13036: [SPARK-15243][ML][SQL][PYSPARK] Param methods should use...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/13036
  
    I am happy to do so. I assume that It seems already almost done except for https://github.com/apache/spark/pull/13036#discussion_r84476560?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13036: [SPARK-15243][ML][SQL][PYSPARK] Param methods should use...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13036
  
    **[Test build #66542 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66542/consoleFull)** for PR 13036 at commit [`48f0557`](https://github.com/apache/spark/commit/48f05576c44ccfef98391f44c55b91d619fdac81).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15243][ML][SQL][PYSPARK] Param methods ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13036#issuecomment-218529020
  
    **[Test build #58386 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58386/consoleFull)** for PR 13036 at commit [`b04ac41`](https://github.com/apache/spark/commit/b04ac413642fdfa1243f27f3790ef00019a1b047).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #13036: [SPARK-15243][ML][SQL][PYSPARK] Param methods sho...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13036#discussion_r82480308
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -470,7 +470,7 @@ def sampleBy(self, col, fractions, seed=None):
             +---+-----+
     
             """
    -        if not isinstance(col, str):
    +        if not isinstance(col, basestring):
    --- End diff --
    
    Is there some harm in allowing unicode here though? If my column is `'a'` and I call `sampleBy(u'a')` it will work after this change, otherwise it will throw an error. I think it's better to treat `'a'` and `u'a'` as equivalent...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13036: [SPARK-15243][ML][SQL][PYSPARK] Param methods should use...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13036
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66542/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15243][ML][SQL][PYSPARK] Param methods ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13036#issuecomment-218533853
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58386/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13036: [SPARK-15243][ML][SQL][PYSPARK] Param methods should use...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13036
  
    **[Test build #67093 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67093/consoleFull)** for PR 13036 at commit [`976d682`](https://github.com/apache/spark/commit/976d682780e90151a7d426ca80d38f0a9c317a3b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13036: [SPARK-15243][ML][SQL][PYSPARK] Param methods should use...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13036
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67093/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13036: [SPARK-15243][ML][SQL][PYSPARK] Param methods should use...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13036
  
    **[Test build #67032 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67032/consoleFull)** for PR 13036 at commit [`c6a8828`](https://github.com/apache/spark/commit/c6a88280ea59eb6535059c75528c078b2834f42f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13036: [SPARK-15243][ML][SQL][PYSPARK] Param methods should use...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13036
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67032/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13036: [SPARK-15243][ML][SQL][PYSPARK] Param methods should use...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the issue:

    https://github.com/apache/spark/pull/13036
  
    ping @holdenk @viirya 
    
    I think this is ready now :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13036: [SPARK-15243][ML][SQL][PYSPARK] Param methods should use...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/13036
  
    Gentle ping, whats the status of this PR?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13036: [SPARK-15243][ML][SQL][PYSPARK] Param methods should use...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13036
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13036: [SPARK-15243][ML][SQL][PYSPARK] Param methods should use...

Posted by MechCoder <gi...@git.apache.org>.
Github user MechCoder commented on the issue:

    https://github.com/apache/spark/pull/13036
  
    lgtm


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13036: [SPARK-15243][ML][SQL][PYSPARK] Param methods should use...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/13036
  
    Checking old PRs---is this active still?  The ML parts look good to me.  I haven't checked the SQL ones carefully.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13036: [SPARK-15243][ML][SQL][PYSPARK] Param methods should use...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13036
  
    **[Test build #67093 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67093/consoleFull)** for PR 13036 at commit [`976d682`](https://github.com/apache/spark/commit/976d682780e90151a7d426ca80d38f0a9c317a3b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15243][ML][PYSPARK] Param methods shoul...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the pull request:

    https://github.com/apache/spark/pull/13036#issuecomment-218492143
  
    Currently investigating some other usages of `isinstance(obj, str)` for this PR. Will update soon.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15243][ML][PYSPARK] Param methods shoul...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13036#issuecomment-218317162
  
    **[Test build #58289 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58289/consoleFull)** for PR 13036 at commit [`878fc5f`](https://github.com/apache/spark/commit/878fc5fda77ee8824c3f7b66bdde8ea09b6d3274).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15243][ML][SQL][PYSPARK] Param methods ...

Posted by k-yokoshi <gi...@git.apache.org>.
Github user k-yokoshi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13036#discussion_r63000521
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -470,7 +470,7 @@ def sampleBy(self, col, fractions, seed=None):
             +---+-----+
     
             """
    -        if not isinstance(col, str):
    +        if not isinstance(col, basestring):
    --- End diff --
    
    Thank you for answering. I understood why `isinstance(col, basestring)` is not needed here.
    
    Although column name is basically stored as `str`, it is stored as `unicode` in a certain case.
    See [SPARK-15244](https://issues.apache.org/jira/browse/SPARK-15244) for details.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15243][ML][PYSPARK] Param methods shoul...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13036#issuecomment-218318825
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15243][ML][PYSPARK] Param methods shoul...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13036#issuecomment-218318743
  
    **[Test build #58289 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58289/consoleFull)** for PR 13036 at commit [`878fc5f`](https://github.com/apache/spark/commit/878fc5fda77ee8824c3f7b66bdde8ea09b6d3274).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15243][ML][SQL][PYSPARK] Param methods ...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the pull request:

    https://github.com/apache/spark/pull/13036#issuecomment-218530440
  
    I updated instances of similar checks in the sql library as noted on the Jira. I searched and this type of check now only exists [here](https://github.com/apache/spark/blob/master/python/pyspark/sql/conf.py#L59) and [here](https://github.com/apache/spark/blob/master/python/pyspark/sql/types.py#L404). They don't cause problems with unicode though, so I did not change them, but I can do that if needed.
    
    cc @viirya I'm not as familiar with the sql library, could you check those changes? 
    
    Also cc @holdenk @davies 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15243][ML][SQL][PYSPARK] Param methods ...

Posted by k-yokoshi <gi...@git.apache.org>.
Github user k-yokoshi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13036#discussion_r62975214
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -470,7 +470,7 @@ def sampleBy(self, col, fractions, seed=None):
             +---+-----+
     
             """
    -        if not isinstance(col, str):
    +        if not isinstance(col, basestring):
    --- End diff --
    
    According to https://github.com/apache/spark/commit/f958f27e2056f9e380373c2807d8bb5977ecf269, it seems to be possible to use Non-ascii characters in column name.
    I think there are use cases which want to use non-ascii character in column name.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13036: [SPARK-15243][ML][SQL][PYSPARK] Param methods should use...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/13036
  
    Just a quick ping @sethah - I know your pretty busy but I'm assuming this is still active. One minor note is it seems there is another new addition to types.py which maybe should also be changed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org