Posted to reviews@spark.apache.org by huaxingao <gi...@git.apache.org> on 2018/01/25 00:30:52 UTC
[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
GitHub user huaxingao opened a pull request:
https://github.com/apache/spark/pull/20390
[SPARK-23081][PYTHON]Add colRegex API to PySpark
## What changes were proposed in this pull request?
Add colRegex API to PySpark
## How was this patch tested?
Added a test in sql/tests.py.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/huaxingao/spark spark-23081
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20390.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20390
----
commit d08ed6b48bb6d9bdb464f886fbce7936f7ecf7e7
Author: Huaxin Gao <hu...@...>
Date: 2018-01-25T00:26:49Z
[SPARK-23081][PYTHON]Add colRegex API to PySpark
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20390
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20390
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86617/
Test PASSed.
---
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20390
**[Test build #86617 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86617/testReport)** for PR 20390 at commit [`d1b4761`](https://github.com/apache/spark/commit/d1b476108ccdf1fae5ccf3f6868e0c0a6427ff17).
---
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20390
**[Test build #86611 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86611/testReport)** for PR 20390 at commit [`d08ed6b`](https://github.com/apache/spark/commit/d08ed6b48bb6d9bdb464f886fbce7936f7ecf7e7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/20390
Since our Spark 2.3 RC2 will fail, we can target this PR at 2.3.
---
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20390
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86653/
Test PASSed.
---
[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20390#discussion_r163755064
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1881,6 +1881,28 @@ def toDF(self, *cols):
         jdf = self._jdf.toDF(self._jseq(cols))
         return DataFrame(jdf, self.sql_ctx)
+    @since(2.4)
--- End diff --
Could we put this API between `def columns(self):` and `def alias(self, alias):`?
---
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:
https://github.com/apache/spark/pull/20390
Awesome
LGTM pending test passes
---
[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20390#discussion_r163721973
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1881,6 +1881,15 @@ def toDF(self, *cols):
         jdf = self._jdf.toDF(self._jseq(cols))
         return DataFrame(jdf, self.sql_ctx)
+    @since(2.3)
+    def colRegex(self, colName):
+        """
+        Selects column based on the column name specified as a regex and return it
+        as :class:`Column`.
+        """
+        jc = self._jdf.colRegex(colName)
--- End diff --
Could we add a type check here too?
---
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20390
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86626/
Test PASSed.
---
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20390
**[Test build #86653 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86653/testReport)** for PR 20390 at commit [`4a58e95`](https://github.com/apache/spark/commit/4a58e951876b06f36fd27c795984e50f2acc004b).
---
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20390
**[Test build #86617 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86617/testReport)** for PR 20390 at commit [`d1b4761`](https://github.com/apache/spark/commit/d1b476108ccdf1fae5ccf3f6868e0c0a6427ff17).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/20390
LGTM except the above two comments.
---
[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/20390
---
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20390
**[Test build #86653 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86653/testReport)** for PR 20390 at commit [`4a58e95`](https://github.com/apache/spark/commit/4a58e951876b06f36fd27c795984e50f2acc004b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20390
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20390
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/215/
Test PASSed.
---
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20390
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/20390#discussion_r163933858
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -819,6 +819,29 @@ def columns(self):
         """
         return [f.name for f in self.schema.fields]
+    @since(2.4)
+    def colRegex(self, colName):
+        """
+        Selects column based on the column name specified as a regex and return it
--- End diff --
Unfortunately, we have the same issue in `Dataset.colRegex`. Please correct that too.
---
[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/20390#discussion_r163933231
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -819,6 +819,29 @@ def columns(self):
         """
         return [f.name for f in self.schema.fields]
+    @since(2.4)
+    def colRegex(self, colName):
+        """
+        Selects column based on the column name specified as a regex and return it
--- End diff --
Nit: -> `returns`
---
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20390
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/221/
Test PASSed.
---
[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20390#discussion_r163756176
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1881,6 +1881,28 @@ def toDF(self, *cols):
         jdf = self._jdf.toDF(self._jseq(cols))
         return DataFrame(jdf, self.sql_ctx)
+    @since(2.4)
+    def colRegex(self, colName):
+        """
+        Selects column based on the column name specified as a regex and return it
+        as :class:`Column`.
+
+        :param colName: string, column name specified as a regex.
+
+        >>> df = spark.createDataFrame([("a", 1), ("b", 2), ("c", 3)])
+        >>> df.select(df.colRegex("`(_1)?+.+`")).show()
+        +---+
+        | _2|
+        +---+
+        |  1|
+        |  2|
+        |  3|
+        +---+
+        """
+        assert isinstance(colName, basestring), "colName should be a string"
--- End diff --
I think raising `TypeError` from an `if` check would be more correct.
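For illustration, a minimal sketch of the `if`/`TypeError` form suggested here. The helper name `check_col_name` and the error message are assumptions made for this example, and `str` stands in for the Python 2 `basestring` used in the diff. Unlike an `assert`, this check is not stripped when Python runs with `-O`.

```python
def check_col_name(col_name):
    """Validate that ``col_name`` is a string (hypothetical helper)."""
    # An explicit check survives `python -O`, which removes assert statements.
    if not isinstance(col_name, str):
        raise TypeError("colName should be a string, got %s" % type(col_name).__name__)
    return col_name


# A string passes through unchanged.
print(check_col_name("`Col.*`"))

# A non-string is rejected with TypeError rather than AssertionError.
try:
    check_col_name(42)
except TypeError as exc:
    print("rejected:", exc)
```

Callers can then catch `TypeError` specifically, which is the conventional signal for an argument of the wrong type.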
---
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20390
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20390
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20390
**[Test build #86649 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86649/testReport)** for PR 20390 at commit [`54a26ce`](https://github.com/apache/spark/commit/54a26cef68d1252003f19dc762c39f16048326e6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20390
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/20390#discussion_r163933411
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -819,6 +819,29 @@ def columns(self):
         """
         return [f.name for f in self.schema.fields]
+    @since(2.4)
--- End diff --
-> `2.3`
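As background for the version discussion, a rough sketch of how a `since`-style decorator could record the target version in a docstring. This is a simplified stand-in written for illustration, not PySpark's actual implementation, and `col_regex` is a placeholder function.

```python
def since(version):
    """Hypothetical stand-in for a version-tagging decorator."""
    def decorator(func):
        # Append a Sphinx "versionadded" note to whatever docstring exists.
        doc = (func.__doc__ or "").rstrip()
        func.__doc__ = doc + "\n\n.. versionadded:: %s" % version
        return func
    return decorator


@since(2.3)
def col_regex(col_name):
    """Select a column by a regex over its name (placeholder body)."""
    return col_name


print(col_regex.__doc__)
```

Under this sketch, changing the target release is a one-token edit to the decorator argument, which is all the review comment asks for.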
---
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20390
**[Test build #86649 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86649/testReport)** for PR 20390 at commit [`54a26ce`](https://github.com/apache/spark/commit/54a26cef68d1252003f19dc762c39f16048326e6).
---
[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:
https://github.com/apache/spark/pull/20390#discussion_r163940181
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -819,6 +819,29 @@ def columns(self):
         """
         return [f.name for f in self.schema.fields]
+    @since(2.4)
+    def colRegex(self, colName):
+        """
+        Selects column based on the column name specified as a regex and return it
--- End diff --
@gatorsmile Thanks for your comments. I will make the changes.
---
[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20390#discussion_r163755876
--- Diff: python/pyspark/sql/tests.py ---
@@ -2855,6 +2855,10 @@ def test_create_dataframe_from_old_pandas(self):
with self.assertRaisesRegexp(ImportError, 'Pandas >= .* must be installed'):
self.spark.createDataFrame(pdf)
+ def test_colRegex(self):
+ df = self.spark.createDataFrame([("a", 1), ("b", 2), ("c", 3)])
+ self.assertEqual(df.select(df.colRegex("`(_1)?+.+`")).collect(), df.select("_2").collect())
--- End diff --
I think this is already covered by the doctest, so it seems we can remove this test.
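To make the point concrete: an example embedded in a docstring can be executed by the standard `doctest` module, so a unit test asserting the same thing adds little. The sketch below uses a hypothetical pure-Python function, `select_matching`, and a small hypothetical runner, `run_examples`; neither is part of PySpark, and no Spark session is involved.

```python
import doctest


def select_matching(names, prefix):
    """Return the names that start with ``prefix``.

    >>> select_matching(["Col1", "Col2", "Other"], "Col")
    ['Col1', 'Col2']
    """
    return [name for name in names if name.startswith(prefix)]


def run_examples(func):
    """Run the docstring examples of ``func``; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # Give the examples a namespace that contains only the function itself.
    test = parser.get_doctest(func.__doc__, {func.__name__: func},
                              func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted


failed, attempted = run_examples(select_matching)
print("failed=%d attempted=%d" % (failed, attempted))
```

If the function's behavior drifts from the documented output, `failed` becomes non-zero, which is the same signal the separate unit test would have given.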
---
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20390
**[Test build #86626 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86626/testReport)** for PR 20390 at commit [`92ee53a`](https://github.com/apache/spark/commit/92ee53a8af720cb107dd0da7e1ea6eaaf32f0c06).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20390
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/210/
Test PASSed.
---
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20390
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20390
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86611/
Test PASSed.
---
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20390
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86649/
Test PASSed.
---
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20390
**[Test build #86626 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86626/testReport)** for PR 20390 at commit [`92ee53a`](https://github.com/apache/spark/commit/92ee53a8af720cb107dd0da7e1ea6eaaf32f0c06).
---
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20390
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/240/
Test PASSed.
---
[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20390#discussion_r163721761
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1881,6 +1881,15 @@ def toDF(self, *cols):
         jdf = self._jdf.toDF(self._jseq(cols))
         return DataFrame(jdf, self.sql_ctx)
+    @since(2.3)
--- End diff --
I think this should be 2.4.
---
[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:
https://github.com/apache/spark/pull/20390#discussion_r163925810
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -819,6 +819,29 @@ def columns(self):
         """
         return [f.name for f in self.schema.fields]
+    @since(2.4)
+    def colRegex(self, colName):
+        """
+        Selects column based on the column name specified as a regex and return it
+        as :class:`Column`.
+
+        :param colName: string, column name specified as a regex.
+
+        >>> df = spark.createDataFrame([("a", 1), ("b", 2), ("c", 3)])
+        >>> df.select(df.colRegex("`(_1)?+.+`")).show()
--- End diff --
@felixcheung Thanks for your comment! I will make changes.
---
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20390
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20390
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/242/
Test PASSed.
---
[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:
https://github.com/apache/spark/pull/20390#discussion_r163731132
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1881,6 +1881,15 @@ def toDF(self, *cols):
         jdf = self._jdf.toDF(self._jseq(cols))
         return DataFrame(jdf, self.sql_ctx)
+    @since(2.3)
+    def colRegex(self, colName):
+        """
+        Selects column based on the column name specified as a regex and return it
+        as :class:`Column`.
+        """
+        jc = self._jdf.colRegex(colName)
--- End diff --
@HyukjinKwon Thank you very much for your comments. I will submit changes soon.
---
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20390
Merged to master and branch-2.3.
---
[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on a diff in the pull request:
https://github.com/apache/spark/pull/20390#discussion_r163759921
--- Diff: python/pyspark/sql/tests.py ---
@@ -2855,6 +2855,10 @@ def test_create_dataframe_from_old_pandas(self):
with self.assertRaisesRegexp(ImportError, 'Pandas >= .* must be installed'):
self.spark.createDataFrame(pdf)
+ def test_colRegex(self):
+ df = self.spark.createDataFrame([("a", 1), ("b", 2), ("c", 3)])
+ self.assertEqual(df.select(df.colRegex("`(_1)?+.+`")).collect(), df.select("_2").collect())
--- End diff --
@HyukjinKwon Thanks! I will make the changes.
---
[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/20390#discussion_r163791518
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -819,6 +819,29 @@ def columns(self):
         """
         return [f.name for f in self.schema.fields]
+    @since(2.4)
+    def colRegex(self, colName):
+        """
+        Selects column based on the column name specified as a regex and return it
+        as :class:`Column`.
+
+        :param colName: string, column name specified as a regex.
+
+        >>> df = spark.createDataFrame([("a", 1), ("b", 2), ("c", 3)])
+        >>> df.select(df.colRegex("`(_1)?+.+`")).show()
--- End diff --
nit: perhaps a bit obscure to pick the default column name of `_1`?
how about we name the columns in the line above?
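A rough model of what the pattern selects once the columns are named, written in plain Python so it runs without Spark. The doctest's `?+` is a possessive quantifier, which Python's `re` accepts only from 3.11 onward, so this sketch substitutes a negative lookahead that behaves the same way for these names, assuming the column name must match the whole pattern. The helper `columns_matching` and the names `Col1`/`Col2` are illustrative assumptions.

```python
import re


def columns_matching(names, pattern):
    """Return the names that match ``pattern`` in full (hypothetical helper)."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


# Named columns instead of the defaults `_1` and `_2`; the names are illustrative.
names = ["Col1", "Col2"]

# "(?!Col1$).+" rejects exactly "Col1" and accepts any other non-empty name,
# mirroring what "(Col1)?+.+" does under whole-name matching.
selected = columns_matching(names, r"(?!Col1$).+")
print(selected)
```

With explicit names, a reader can see at a glance that the pattern means "every column except `Col1`", which is harder to spot with the default `_1`.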
---
[GitHub] spark pull request #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20390#discussion_r163721837
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1881,6 +1881,15 @@ def toDF(self, *cols):
         jdf = self._jdf.toDF(self._jseq(cols))
         return DataFrame(jdf, self.sql_ctx)
+    @since(2.3)
+    def colRegex(self, colName):
+        """
+        Selects column based on the column name specified as a regex and return it
+        as :class:`Column`.
--- End diff --
Shall we add a doctest and `:param ` too while we are here?
---
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20390
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20390
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by huaxingao <gi...@git.apache.org>.
Github user huaxingao commented on the issue:
https://github.com/apache/spark/pull/20390
Thank you all for your help! @HyukjinKwon @gatorsmile @felixcheung
---
[GitHub] spark issue #20390: [SPARK-23081][PYTHON]Add colRegex API to PySpark
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20390
**[Test build #86611 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86611/testReport)** for PR 20390 at commit [`d08ed6b`](https://github.com/apache/spark/commit/d08ed6b48bb6d9bdb464f886fbce7936f7ecf7e7).
---