You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by HyukjinKwon <gi...@git.apache.org> on 2017/02/28 08:59:08 UTC

[GitHub] spark pull request #17096: [SPARK-15243][ML][SQL][PYSPARK] Add missing suppo...

GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/17096

    [SPARK-15243][ML][SQL][PYSPARK] Add missing support for unicode in Param methods/functions in dataframe/types

    ## What changes were proposed in this pull request?
    
    This PR proposes to support unicodes in Param methods in ML, other missed functions in DataFrame and other missed one as a column name in types.
    
    For example, this causes a `ValueError` in Python 2.x when param is a unicode string:
    
    ```python
    >>> from pyspark.ml.classification import LogisticRegression
    >>> lr = LogisticRegression()
    >>> lr.hasParam("threshold")
    True
    >>> lr.hasParam(u"threshold")
    Traceback (most recent call last):
     ...
        raise TypeError("hasParam(): paramName must be a string")
    TypeError: hasParam(): paramName must be a string
    ```
    
    This PR is based on https://github.com/apache/spark/pull/13036
    
    ## How was this patch tested?
    
    Unit tests in `python/pyspark/ml/tests.py` and `python/pyspark/sql/tests.py`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-15243

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17096.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17096
    
----
commit 762997d266c78c16e2e6caaee8daa557ec07820b
Author: sethah <se...@gmail.com>
Date:   2016-05-10T22:44:10Z

    check for basestring in param methods

commit d421a82e1a713b9c7351dfc6a2a6a5fb13eca1e2
Author: hyukjinkwon <gu...@gmail.com>
Date:   2017-02-28T08:46:58Z

    Fix updated approxQuantile to take basestrings and fix the tests

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    I'd really like to see that further test I was talking about, @HyukjinKwon -- it shouldn't be too hard to do in this pr right? Just add a unicode string which doesn't make sense in down converted ascii.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    Thanks for taking the time to review this @viirya :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    @viirya, thank you so much for taking a look and your time. 
    
    So, basically, the second case it compares str to unicode as below:
    
    ```python
    >>> u"\u6e2c\u8a66" == u"\u6e2c\u8a66".encode("utf-8")
    False
    ```
    
    Apparently, it seems we could pass unicode as is? Let me raise another issue for this after testing and looking into this. Actually, the support in `StructType.add` seems not the problem specified in the JIRA. 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    **[Test build #81488 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81488/testReport)** for PR 17096 at commit [`830b4fe`](https://github.com/apache/spark/commit/830b4fe1f71befb97debd9286306b3f872eb1c09).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81488/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    Hey @holdenk. I am willing to close this for now if you are not confident enough for merging it for now. I can re-open this later.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    **[Test build #73688 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73688/testReport)** for PR 17096 at commit [`9ac773c`](https://github.com/apache/spark/commit/9ac773c6f2855edc91f9e06e506499db2ca232d2).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    Thank you @viirya for your sign-off.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing suppor...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17096#discussion_r104698507
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -320,6 +320,13 @@ def test_hasparam(self):
             testParams = TestParams()
             self.assertTrue(all([testParams.hasParam(p.name) for p in testParams.params]))
             self.assertFalse(testParams.hasParam("notAParameter"))
    +        self.assertTrue(testParams.hasParam(u"maxIter"))
    +
    +    def test_resolveparam(self):
    +        testParams = TestParams()
    +        self.assertEqual(testParams._resolveParam(testParams.maxIter), testParams.maxIter)
    +        self.assertEqual(testParams._resolveParam("maxIter"), testParams.maxIter)
    +        self.assertEqual(testParams._resolveParam(u"maxIter"), testParams.maxIter)
    --- End diff --
    
    Would it make sense to also add a param with a unicode name that if it was converted down to ASCII behind the scenes and test it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    Thank you @viirya. I think it is ready now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing suppor...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17096


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing suppor...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17096#discussion_r137369131
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -319,6 +320,20 @@ def test_hasparam(self):
             testParams = TestParams()
             self.assertTrue(all([testParams.hasParam(p.name) for p in testParams.params]))
             self.assertFalse(testParams.hasParam("notAParameter"))
    +        self.assertTrue(testParams.hasParam(u"maxIter"))
    +
    +    def test_resolveparam(self):
    +        testParams = TestParams()
    +        self.assertEqual(testParams._resolveParam(testParams.maxIter), testParams.maxIter)
    +        self.assertEqual(testParams._resolveParam("maxIter"), testParams.maxIter)
    +
    +        self.assertEqual(testParams._resolveParam(u"maxIter"), testParams.maxIter)
    +        if sys.version_info[0] >= 3:
    --- End diff --
    
    Yes! :)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73579/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    **[Test build #81488 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81488/testReport)** for PR 17096 at commit [`830b4fe`](https://github.com/apache/spark/commit/830b4fe1f71befb97debd9286306b3f872eb1c09).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    Thank you @holdenk, @viirya and @ueshin.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73688/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    Thanks, merged to master!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    In `StructType.add`, if the given field name is a `basestring`, we will directly use it as key in `names` property.
    
    If we pass in a `StructField`, we take `StructField.name` as key in `names`. `But StructField` will encode field name as utf-8 if it is not a `str`.
    
    And `StructType.__getitem__` will find matched field by comparing each `StructField.name` with given search key. So it is unable to find the field back with the unicode field name.
    
            from pyspark.sql.types import StructType, StringType, StructField
    
            struct1 = StructType().add(u"a", "string", True)
            struct2 = StructType([StructField(u"a", StringType(), True)])
            self.assertTrue(struct1 == struct2)  # pass
            self.assertTrue(struct1[u"a"] == struct2[u"a"]) #pass
    
            struct1 = StructType().add(u"\u6e2c\u8a66", "string", True)
            struct2 = StructType([StructField(u"\u6e2c\u8a66", StringType(), True)])
            self.assertTrue(struct1 == struct2)  # fail
            self.assertTrue(struct1[u"\u6e2c\u8a66"] == struct2[u"\u6e2c\u8a66"]) # fail, you can't find the field with key u"\u6e2c\u8a66"



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    @HyukjinKwon @holdenk Hi, are you still working on this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    **[Test build #73822 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73822/testReport)** for PR 17096 at commit [`cd235a7`](https://github.com/apache/spark/commit/cd235a7f641da8a350b8ace0e4c0691ccac189f2).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    gentle ping ...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYSPARK] Add missing support for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    **[Test build #73579 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73579/testReport)** for PR 17096 at commit [`d421a82`](https://github.com/apache/spark/commit/d421a82e1a713b9c7351dfc6a2a6a5fb13eca1e2).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    Remaining changes LGTM. cc @holdenk 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73822/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    **[Test build #73579 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73579/testReport)** for PR 17096 at commit [`d421a82`](https://github.com/apache/spark/commit/d421a82e1a713b9c7351dfc6a2a6a5fb13eca1e2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    Jenkins retest this please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    **[Test build #73688 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73688/testReport)** for PR 17096 at commit [`9ac773c`](https://github.com/apache/spark/commit/9ac773c6f2855edc91f9e06e506499db2ca232d2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    @holdenk and @viirya, I got rid of the changes in `types.py` and only left that I am pretty sure.
    
    There are two kind of changes here that look used in the only local scope.
    
    One seems for used `getattr` I guess it is fine as below:
    
    ```python
    >>> getattr("a", u"__str__")
    <method-wrapper '__str__' of str object at 0x10a24e580>
    >>> getattr("a", "__str__")
    <method-wrapper '__str__' of str object at 0x10a24e580>
    ```
    
    and other one seems used for setting an parameter to JVM which seems already used in the code base much more.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    Few minor comments. Overall looks good.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79038/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYSPARK] Add missing support for ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    cc @sethah, @holdenk, @viirya and @k-yokoshi


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing suppor...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17096#discussion_r125173260
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -319,6 +320,20 @@ def test_hasparam(self):
             testParams = TestParams()
             self.assertTrue(all([testParams.hasParam(p.name) for p in testParams.params]))
             self.assertFalse(testParams.hasParam("notAParameter"))
    +        self.assertTrue(testParams.hasParam(u"maxIter"))
    +
    +    def test_resolveparam(self):
    +        testParams = TestParams()
    +        self.assertEqual(testParams._resolveParam(testParams.maxIter), testParams.maxIter)
    +        self.assertEqual(testParams._resolveParam("maxIter"), testParams.maxIter)
    +
    +        self.assertEqual(testParams._resolveParam(u"maxIter"), testParams.maxIter)
    +        if sys.version_info[0] >= 3:
    --- End diff --
    
    @holdenk, would this test address your concern enough?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    Let me check if each is fine for sure.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    **[Test build #79038 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79038/testReport)** for PR 17096 at commit [`830b4fe`](https://github.com/apache/spark/commit/830b4fe1f71befb97debd9286306b3f872eb1c09).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    Let's double check with @viirya to make sure his comment was addressed, but I really appreciate the improved test coverage :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    Sorry for the delay. Lets get Jenkins to retest this and make sure everything is ok but it looks like a good change :)
    
    LGTM pending jenkins/merge issues (if they show up during jenkins). Jenkins retest this please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing suppor...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17096#discussion_r103637509
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -1035,17 +1036,22 @@ def test_approxQuantile(self):
         def test_corr(self):
             import math
             df = self.sc.parallelize([Row(a=i, b=math.sqrt(i)) for i in range(10)]).toDF()
    -        corr = df.stat.corr("a", "b")
    +        corr = df.stat.corr(u"a", "b")
             self.assertTrue(abs(corr - 0.95734012) < 1e-6)
     
    +    def test_sampleby(self):
    +        df = self.sc.parallelize([Row(a=i, b=(i % 3)) for i in range(10)]).toDF()
    +        corr = df.stat.sampleBy(u"b", fractions={0: 0.5, 1: 0.5}, seed=0)
    --- End diff --
    
    nit: `corr` -> `sampled`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    **[Test build #73822 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73822/testReport)** for PR 17096 at commit [`cd235a7`](https://github.com/apache/spark/commit/cd235a7f641da8a350b8ace0e4c0691ccac189f2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    Do you mean a test case such as `self.assertEqual(testParams._resolveParam(u"아"), testParams.아)
    ` ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing suppor...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17096#discussion_r104856634
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -320,6 +320,13 @@ def test_hasparam(self):
             testParams = TestParams()
             self.assertTrue(all([testParams.hasParam(p.name) for p in testParams.params]))
             self.assertFalse(testParams.hasParam("notAParameter"))
    +        self.assertTrue(testParams.hasParam(u"maxIter"))
    +
    +    def test_resolveparam(self):
    +        testParams = TestParams()
    +        self.assertEqual(testParams._resolveParam(testParams.maxIter), testParams.maxIter)
    +        self.assertEqual(testParams._resolveParam("maxIter"), testParams.maxIter)
    +        self.assertEqual(testParams._resolveParam(u"maxIter"), testParams.maxIter)
    --- End diff --
    
    Yes, I think that makes sense, though. It seems that requires more look and tests. I believe the current state resolves the specific JIRA. Maybe, could we merge this as is if you are think either way is fine? I feel It has been dragged by related changes and hope it could be merged if it is okay to you. If it dose not sound good to you, then, let me try to take a look.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    Yea, I was waiting for the feedback. It is also about the unicode vs byte string.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    Thank you for taking this over @HyukjinKwon :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    @HyukjinKwon Pretty much, yes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing suppor...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17096#discussion_r103636686
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1405,18 +1405,18 @@ def approxQuantile(self, col, probabilities, relativeError):
                Added support for multiple columns.
             """
     
    -        if not isinstance(col, (str, list, tuple)):
    +        if not isinstance(col, (basestring, list, tuple)):
                 raise ValueError("col should be a string, list or tuple, but got %r" % type(col))
     
    -        isStr = isinstance(col, str)
    +        isStr = isinstance(col, basestring)
     
             if isinstance(col, tuple):
                 col = list(col)
    -        elif isinstance(col, str):
    +        elif isinstance(col, basestring):
    --- End diff --
    
    nit: can directly use`isStr` here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17096: [SPARK-15243][ML][SQL][PYTHON] Add missing support for u...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17096
  
    **[Test build #79038 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79038/testReport)** for PR 17096 at commit [`830b4fe`](https://github.com/apache/spark/commit/830b4fe1f71befb97debd9286306b3f872eb1c09).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org