Posted to reviews@spark.apache.org by huaxingao <gi...@git.apache.org> on 2018/01/25 00:30:52 UTC

[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

GitHub user huaxingao opened a pull request:

    https://github.com/apache/spark/pull/20390

    [SPARK-23081][PYTHON]Add colRegex API to PySpark

    ## What changes were proposed in this pull request?
    
    Add colRegex API to PySpark
    
    ## How was this patch tested?
    
    Added a test in sql/tests.py.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/huaxingao/spark spark-23081

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20390.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20390
    
----
commit d08ed6b48bb6d9bdb464f886fbce7936f7ecf7e7
Author: Huaxin Gao <hu...@...>
Date:   2018-01-25T00:26:49Z

    [SPARK-23081][PYTHON]Add colRegex API to PySpark

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20390
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20390
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86617/
    Test PASSed.


---



[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20390
  
    **[Test build #86617 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86617/testReport)** for PR 20390 at commit [`d1b4761`](https://github.com/apache/spark/commit/d1b476108ccdf1fae5ccf3f6868e0c0a6427ff17).


---



[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20390
  
    **[Test build #86611 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86611/testReport)** for PR 20390 at commit [`d08ed6b`](https://github.com/apache/spark/commit/d08ed6b48bb6d9bdb464f886fbce7936f7ecf7e7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20390
  
    Since our Spark 2.3 RC2 will fail, we can target this at 2.3.



---



[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20390
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86653/
    Test PASSed.


---



[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20390#discussion_r163755064
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1881,6 +1881,28 @@ def toDF(self, *cols):
             jdf = self._jdf.toDF(self._jseq(cols))
             return DataFrame(jdf, self.sql_ctx)
     
    +    @since(2.4)
    --- End diff --
    
    Could we put this API between `def columns(self):` and `def alias(self, alias):`?


---



[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/20390
  
    Awesome
    LGTM pending test passes
    



---



[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20390#discussion_r163721973
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1881,6 +1881,15 @@ def toDF(self, *cols):
             jdf = self._jdf.toDF(self._jseq(cols))
             return DataFrame(jdf, self.sql_ctx)
     
    +    @since(2.3)
    +    def colRegex(self, colName):
    +        """
    +        Selects column based on the column name specified as a regex and return it
    +        as :class:`Column`.
    +        """
    +        jc = self._jdf.colRegex(colName)
    --- End diff --
    
    Could we add a type check here too?


---



[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20390
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86626/
    Test PASSed.


---



[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20390
  
    **[Test build #86653 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86653/testReport)** for PR 20390 at commit [`4a58e95`](https://github.com/apache/spark/commit/4a58e951876b06f36fd27c795984e50f2acc004b).


---



[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20390
  
    **[Test build #86617 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86617/testReport)** for PR 20390 at commit [`d1b4761`](https://github.com/apache/spark/commit/d1b476108ccdf1fae5ccf3f6868e0c0a6427ff17).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20390
  
    LGTM except the above two comments.


---



[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20390


---



[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20390
  
    **[Test build #86653 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86653/testReport)** for PR 20390 at commit [`4a58e95`](https://github.com/apache/spark/commit/4a58e951876b06f36fd27c795984e50f2acc004b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20390
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20390
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/215/
    Test PASSed.


---



[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20390
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20390#discussion_r163933858
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -819,6 +819,29 @@ def columns(self):
             """
             return [f.name for f in self.schema.fields]
     
    +    @since(2.4)
    +    def colRegex(self, colName):
    +        """
    +        Selects column based on the column name specified as a regex and return it
    --- End diff --
    
    Unfortunately, we have the same issue in `Dataset.colRegex`. Please correct that too.


---



[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20390#discussion_r163933231
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -819,6 +819,29 @@ def columns(self):
             """
             return [f.name for f in self.schema.fields]
     
    +    @since(2.4)
    +    def colRegex(self, colName):
    +        """
    +        Selects column based on the column name specified as a regex and return it
    --- End diff --
    
    Nit: -> `returns`


---



[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20390
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/221/
    Test PASSed.


---



[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20390#discussion_r163756176
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1881,6 +1881,28 @@ def toDF(self, *cols):
             jdf = self._jdf.toDF(self._jseq(cols))
             return DataFrame(jdf, self.sql_ctx)
     
    +    @since(2.4)
    +    def colRegex(self, colName):
    +        """
    +        Selects column based on the column name specified as a regex and return it
    +        as :class:`Column`.
    +
    +        :param colName: string, column name specified as a regex.
    +
    +        >>> df = spark.createDataFrame([("a", 1), ("b", 2), ("c",  3)])
    +        >>> df.select(df.colRegex("`(_1)?+.+`")).show()
    +        +---+
    +        | _2|
    +        +---+
    +        |  1|
    +        |  2|
    +        |  3|
    +        +---+
    +        """
    +        assert isinstance(colName, basestring), "colName should be a string"
    --- End diff --
    
    I think raising a `TypeError` from an `if` check would be more correct.
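
A minimal sketch of the check suggested here, written as a standalone function so it runs without Spark. The helper name `check_col_name` and the error message are assumptions for illustration, and `str` stands in for the Python 2 `basestring` used in the diff; this is not the code that was merged.

```python
def check_col_name(colName):
    """Hypothetical helper: reject non-string column names up front."""
    # An explicit `if` + raise still runs under `python -O`,
    # whereas `assert` statements are stripped in optimized mode.
    if not isinstance(colName, str):
        raise TypeError("colName should be provided as string")
    return colName


# A string (here a backtick-quoted regex) passes through unchanged.
print(check_col_name("`Col.*`"))
```

One reason to prefer this over `assert`: the caller gets a conventional exception type for bad arguments, and the check cannot be disabled by interpreter flags.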


---



[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20390
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20390
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20390
  
    **[Test build #86649 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86649/testReport)** for PR 20390 at commit [`54a26ce`](https://github.com/apache/spark/commit/54a26cef68d1252003f19dc762c39f16048326e6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20390
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20390#discussion_r163933411
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -819,6 +819,29 @@ def columns(self):
             """
             return [f.name for f in self.schema.fields]
     
    +    @since(2.4)
    --- End diff --
    
    -> `2.3`


---



[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20390
  
    **[Test build #86649 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86649/testReport)** for PR 20390 at commit [`54a26ce`](https://github.com/apache/spark/commit/54a26cef68d1252003f19dc762c39f16048326e6).


---



[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20390#discussion_r163940181
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -819,6 +819,29 @@ def columns(self):
             """
             return [f.name for f in self.schema.fields]
     
    +    @since(2.4)
    +    def colRegex(self, colName):
    +        """
    +        Selects column based on the column name specified as a regex and return it
    --- End diff --
    
    @gatorsmile Thanks for your comments. I will make the changes. 


---



[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20390#discussion_r163755876
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -2855,6 +2855,10 @@ def test_create_dataframe_from_old_pandas(self):
                 with self.assertRaisesRegexp(ImportError, 'Pandas >= .* must be installed'):
                     self.spark.createDataFrame(pdf)
     
    +    def test_colRegex(self):
    +        df = self.spark.createDataFrame([("a", 1), ("b", 2), ("c",  3)])
    +        self.assertEqual(df.select(df.colRegex("`(_1)?+.+`")).collect(), df.select("_2").collect())
    --- End diff --
    
    I think this is already covered by the doctest, so it seems we can remove this test.


---



[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20390
  
    **[Test build #86626 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86626/testReport)** for PR 20390 at commit [`92ee53a`](https://github.com/apache/spark/commit/92ee53a8af720cb107dd0da7e1ea6eaaf32f0c06).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20390
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/210/
    Test PASSed.


---



[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20390
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20390
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86611/
    Test PASSed.


---



[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20390
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86649/
    Test PASSed.


---



[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20390
  
    **[Test build #86626 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86626/testReport)** for PR 20390 at commit [`92ee53a`](https://github.com/apache/spark/commit/92ee53a8af720cb107dd0da7e1ea6eaaf32f0c06).


---



[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20390
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/240/
    Test PASSed.


---



[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20390#discussion_r163721761
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1881,6 +1881,15 @@ def toDF(self, *cols):
             jdf = self._jdf.toDF(self._jseq(cols))
             return DataFrame(jdf, self.sql_ctx)
     
    +    @since(2.3)
    --- End diff --
    
    I think this should be 2.4.


---



[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20390#discussion_r163925810
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -819,6 +819,29 @@ def columns(self):
             """
             return [f.name for f in self.schema.fields]
     
    +    @since(2.4)
    +    def colRegex(self, colName):
    +        """
    +        Selects column based on the column name specified as a regex and return it
    +        as :class:`Column`.
    +
    +        :param colName: string, column name specified as a regex.
    +
    +        >>> df = spark.createDataFrame([("a", 1), ("b", 2), ("c",  3)])
    +        >>> df.select(df.colRegex("`(_1)?+.+`")).show()
    --- End diff --
    
    @felixcheung Thanks for your comment! I will make changes. 


---



[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20390
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20390
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/242/
    Test PASSed.


---



[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20390#discussion_r163731132
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1881,6 +1881,15 @@ def toDF(self, *cols):
             jdf = self._jdf.toDF(self._jseq(cols))
             return DataFrame(jdf, self.sql_ctx)
     
    +    @since(2.3)
    +    def colRegex(self, colName):
    +        """
    +        Selects column based on the column name specified as a regex and return it
    +        as :class:`Column`.
    +        """
    +        jc = self._jdf.colRegex(colName)
    --- End diff --
    
    @HyukjinKwon Thank you very much for your comments. I will submit changes soon. 


---



[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20390
  
    Merged to master and branch-2.3.


---



[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20390#discussion_r163759921
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -2855,6 +2855,10 @@ def test_create_dataframe_from_old_pandas(self):
                 with self.assertRaisesRegexp(ImportError, 'Pandas >= .* must be installed'):
                     self.spark.createDataFrame(pdf)
     
    +    def test_colRegex(self):
    +        df = self.spark.createDataFrame([("a", 1), ("b", 2), ("c",  3)])
    +        self.assertEqual(df.select(df.colRegex("`(_1)?+.+`")).collect(), df.select("_2").collect())
    --- End diff --
    
    @HyukjinKwon Thanks! I will make the changes. 


---



[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20390#discussion_r163791518
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -819,6 +819,29 @@ def columns(self):
             """
             return [f.name for f in self.schema.fields]
     
    +    @since(2.4)
    +    def colRegex(self, colName):
    +        """
    +        Selects column based on the column name specified as a regex and return it
    +        as :class:`Column`.
    +
    +        :param colName: string, column name specified as a regex.
    +
    +        >>> df = spark.createDataFrame([("a", 1), ("b", 2), ("c",  3)])
    +        >>> df.select(df.colRegex("`(_1)?+.+`")).show()
    --- End diff --
    
    nit: picking the default column name `_1` is perhaps a bit obscure.
    How about we name the columns in the line above?
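
A minimal sketch of what the suggestion above amounts to, written in plain Python so it runs without a Spark session. It only emulates the selection step. The column names `Col1` and `Col2` and the helper `select_by_regex` are hypothetical, and the sketch assumes a column is selected only when the regex matches its whole name. Python's `re` module supports possessive quantifiers only from 3.11 onward, so a negative lookahead stands in for the possessive pattern used in the doctest.

```python
import re


def select_by_regex(columns, pattern):
    """Return the column names that the pattern matches in full.

    Assumption: a name is selected only when the entire name matches
    the regex, not when the regex merely matches a substring.
    """
    compiled = re.compile(pattern)
    return [name for name in columns if compiled.fullmatch(name)]


# Hypothetical explicit column names instead of the defaults `_1` and `_2`.
columns = ["Col1", "Col2"]

# Stand-in for the possessive pattern `(Col1)?+.+`: accept every
# non-empty name except the one that is exactly "Col1".
selected = select_by_regex(columns, r"(?!Col1$).+")
print(selected)
```

With named columns the intent of the pattern is readable from the doctest itself: the regex keeps everything except `Col1`, which is the same result as selecting `Col2` directly.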


[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20390#discussion_r163721837
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1881,6 +1881,15 @@ def toDF(self, *cols):
             jdf = self._jdf.toDF(self._jseq(cols))
             return DataFrame(jdf, self.sql_ctx)
     
    +    @since(2.3)
    +    def colRegex(self, colName):
    +        """
    +        Selects column based on the column name specified as a regex and return it
    +        as :class:`Column`.
    --- End diff --
    
    Shall we add a doctest and `:param` too while we are here?


[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20390
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on the issue:

    https://github.com/apache/spark/pull/20390
  
    Thank you all for your help! @HyukjinKwon @gatorsmile @felixcheung 


[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20390
  
    **[Test build #86611 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86611/testReport)** for PR 20390 at commit [`d08ed6b`](https://github.com/apache/spark/commit/d08ed6b48bb6d9bdb464f886fbce7936f7ecf7e7).

