Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/12/27 23:12:51 UTC
[GitHub] [spark] techaddict opened a new pull request, #39249: [WIP] [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column
techaddict opened a new pull request, #39249:
URL: https://github.com/apache/spark/pull/39249
### What changes were proposed in this pull request?
This PR proposes to enable doctests in pyspark.sql.connect.column, which is virtually the same as pyspark.sql.column.
### Why are the changes needed?
To ensure PySpark compatibility and improve test coverage.
### Does this PR introduce any user-facing change?
No, doctests only.
### How was this patch tested?
New doctests added.
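For context, PySpark modules conventionally expose a `_test()` entry point that runs the module's doctests through the standard `doctest` machinery. A minimal, self-contained sketch of that pattern (the `add` function and the option flags here are illustrative, not actual Spark code):

```python
import doctest
import sys


def add(a, b):
    """Add two numbers.

    >>> add(2, 3)
    5
    """
    return a + b


def _test() -> None:
    # Collect module globals so doctest examples can reference them directly.
    globs = globals().copy()
    (failure_count, test_count) = doctest.testmod(
        globs=globs,
        optionflags=doctest.ELLIPSIS | doctest.NORMALIZE_WHITESPACE,
    )
    if failure_count:
        sys.exit(-1)


if __name__ == "__main__":
    _test()
```

The real `_test()` in this PR additionally builds a regular and a remote Spark session and places them in `globs` so the examples can use `spark` directly.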
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon closed pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column
Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column
URL: https://github.com/apache/spark/pull/39249
[GitHub] [spark] techaddict commented on a diff in pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column
techaddict commented on code in PR #39249:
URL: https://github.com/apache/spark/pull/39249#discussion_r1058687601
##########
python/pyspark/sql/connect/column.py:
##########
@@ -390,3 +391,61 @@ def __nonzero__(self) -> None:
Column.__doc__ = PySparkColumn.__doc__
+
+
+def _test() -> None:
+ import os
+ import sys
+ import doctest
+ from pyspark.sql import SparkSession as PySparkSession
+ from pyspark.testing.connectutils import should_test_connect, connect_requirement_message
+
+ os.chdir(os.environ["SPARK_HOME"])
+
+ if should_test_connect:
+ import pyspark.sql.connect.column
+
+ globs = pyspark.sql.connect.column.__dict__.copy()
+ # Works around to create a regular Spark session
+ sc = SparkContext("local[4]", "sql.connect.column tests", conf=SparkConf())
+ globs["_spark"] = PySparkSession(sc, options={"spark.app.name": "sql.connect.column tests"})
+
+ # Creates a remote Spark session.
+ os.environ["SPARK_REMOTE"] = "sc://localhost"
+ globs["spark"] = PySparkSession.builder.remote("sc://localhost").getOrCreate()
+
+ # TODO(SPARK-41751): Support bitwiseAND
+ del pyspark.sql.connect.column.Column.bitwiseAND.__doc__
+ del pyspark.sql.connect.column.Column.bitwiseOR.__doc__
+ del pyspark.sql.connect.column.Column.bitwiseXOR.__doc__
Review Comment:
Nope, the error was similar in all of these:
```[UNRESOLVED_ROUTINE] Cannot resolve function `bitwiseAND` on search path [`system`.`builtin`, `system`.`session`, `spark_catalog`.`default`].```
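For background on the skip mechanism used in the diff: `doctest` collects examples from docstrings, so `del <method>.__doc__` removes that method's examples from collection entirely. A small standalone illustration of the mechanism (the `Demo` class is hypothetical):

```python
import doctest


class Demo:
    def supported(self):
        """
        >>> Demo().supported()
        1
        """
        return 1

    def unsupported(self):
        """
        >>> Demo().unsupported()  # would fail against a backend missing this routine
        2
        """
        return 2


# Deleting the docstring before collection makes doctest skip the method.
del Demo.unsupported.__doc__

collected = [t.name for t in doctest.DocTestFinder().find(Demo) if t.examples]
print(collected)  # only the `supported` doctest remains
```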
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column
HyukjinKwon commented on code in PR #39249:
URL: https://github.com/apache/spark/pull/39249#discussion_r1058686880
##########
python/pyspark/sql/column.py:
##########
@@ -1258,8 +1258,7 @@ def over(self, window: "WindowSpec") -> "Column":
>>> from pyspark.sql import Window
>>> window = Window.partitionBy("name").orderBy("age") \
.rowsBetween(Window.unboundedPreceding, Window.currentRow)
- >>> from pyspark.sql.functions import rank, min
- >>> from pyspark.sql.functions import desc
+ >>> from pyspark.sql.functions import rank,min,desc
Review Comment:
```suggestion
>>> from pyspark.sql.functions import rank, min, desc
```
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column
HyukjinKwon commented on code in PR #39249:
URL: https://github.com/apache/spark/pull/39249#discussion_r1058690914
##########
python/pyspark/sql/connect/column.py:
##########
@@ -390,3 +391,61 @@ def __nonzero__(self) -> None:
Column.__doc__ = PySparkColumn.__doc__
+
+
+def _test() -> None:
+ import os
+ import sys
+ import doctest
+ from pyspark.sql import SparkSession as PySparkSession
+ from pyspark.testing.connectutils import should_test_connect, connect_requirement_message
+
+ os.chdir(os.environ["SPARK_HOME"])
+
+ if should_test_connect:
+ import pyspark.sql.connect.column
+
+ globs = pyspark.sql.connect.column.__dict__.copy()
+ # Works around to create a regular Spark session
+ sc = SparkContext("local[4]", "sql.connect.column tests", conf=SparkConf())
+ globs["_spark"] = PySparkSession(sc, options={"spark.app.name": "sql.connect.column tests"})
+
+ # Creates a remote Spark session.
+ os.environ["SPARK_REMOTE"] = "sc://localhost"
+ globs["spark"] = PySparkSession.builder.remote("sc://localhost").getOrCreate()
+
+ # TODO(SPARK-41751): Support bitwiseAND
+ del pyspark.sql.connect.column.Column.bitwiseAND.__doc__
+ del pyspark.sql.connect.column.Column.bitwiseOR.__doc__
+ del pyspark.sql.connect.column.Column.bitwiseXOR.__doc__
+ del pyspark.sql.connect.column.Column.eqNullSafe.__doc__
+ del pyspark.sql.connect.column.Column.isNotNull.__doc__
+ del pyspark.sql.connect.column.Column.isNull.__doc__
+ del pyspark.sql.connect.column.Column.isin.__doc__
+ # TODO(SPARK-41756): Fix createDataFrame
+ del pyspark.sql.connect.column.Column.getField.__doc__
+ del pyspark.sql.connect.column.Column.getItem.__doc__
+ # TODO(SPARK-41292): Support Window functions
Review Comment:
Hm, SPARK-41292 is resolved. Did this fail for a different reason?
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column
HyukjinKwon commented on code in PR #39249:
URL: https://github.com/apache/spark/pull/39249#discussion_r1058687254
##########
python/pyspark/sql/connect/column.py:
##########
@@ -390,3 +391,61 @@ def __nonzero__(self) -> None:
Column.__doc__ = PySparkColumn.__doc__
+
+
+def _test() -> None:
+ import os
+ import sys
+ import doctest
+ from pyspark.sql import SparkSession as PySparkSession
+ from pyspark.testing.connectutils import should_test_connect, connect_requirement_message
+
+ os.chdir(os.environ["SPARK_HOME"])
+
+ if should_test_connect:
+ import pyspark.sql.connect.column
+
+ globs = pyspark.sql.connect.column.__dict__.copy()
+ # Works around to create a regular Spark session
+ sc = SparkContext("local[4]", "sql.connect.column tests", conf=SparkConf())
+ globs["_spark"] = PySparkSession(sc, options={"spark.app.name": "sql.connect.column tests"})
+
+ # Creates a remote Spark session.
+ os.environ["SPARK_REMOTE"] = "sc://localhost"
+ globs["spark"] = PySparkSession.builder.remote("sc://localhost").getOrCreate()
+
+ # TODO(SPARK-41751): Support bitwiseAND
+ del pyspark.sql.connect.column.Column.bitwiseAND.__doc__
+ del pyspark.sql.connect.column.Column.bitwiseOR.__doc__
+ del pyspark.sql.connect.column.Column.bitwiseXOR.__doc__
Review Comment:
Quick question: did they fail because of `bitwiseAND` too?
[GitHub] [spark] techaddict commented on pull request #39249: [WIP] [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column
techaddict commented on PR #39249:
URL: https://github.com/apache/spark/pull/39249#issuecomment-1366657623
@HyukjinKwon there are a bunch of methods missing, and createDataFrame isn't working with arrays.
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column
HyukjinKwon commented on code in PR #39249:
URL: https://github.com/apache/spark/pull/39249#discussion_r1058686957
##########
python/pyspark/sql/connect/column.py:
##########
@@ -388,5 +389,62 @@ def __nonzero__(self) -> None:
__bool__ = __nonzero__
Review Comment:
```suggestion
```
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39249: [WIP] [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column
HyukjinKwon commented on code in PR #39249:
URL: https://github.com/apache/spark/pull/39249#discussion_r1058658520
##########
python/pyspark/sql/column.py:
##########
@@ -201,16 +201,16 @@ class Column:
Select a column out of a DataFrame
- >>> df.name
+ >>> df.name # doctest: +SKIP
Review Comment:
Let's file a JIRA and add it as a TODO, e.g.:
```python
# TODO(SPARK-XXXXX): Compatibility of string representation in Column
class Column:
...
```
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39249: [WIP] [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column
HyukjinKwon commented on code in PR #39249:
URL: https://github.com/apache/spark/pull/39249#discussion_r1058296098
##########
python/pyspark/sql/connect/column.py:
##########
@@ -389,3 +389,46 @@ def __nonzero__(self) -> None:
Column.__doc__ = PySparkColumn.__doc__
+
+
+def _test() -> None:
+ import os
+ import sys
+ import doctest
+ from pyspark.sql import SparkSession as PySparkSession
+ from pyspark.testing.connectutils import should_test_connect, connect_requirement_message
+
+ os.chdir(os.environ["SPARK_HOME"])
+
+ if should_test_connect:
+ import pyspark.sql.connect.column
+
+ globs = pyspark.sql.column.__dict__.copy()
+ # Works around to create a regular Spark session
+ sc = SparkContext("local[4]", "sql.connect.column tests", conf=SparkConf())
+ globs["_spark"] = PySparkSession(sc, options={"spark.app.name": "sql.connect.column tests"})
+
+ # Creates a remote Spark session.
Review Comment:
Oh, also this:
```suggestion
# Creates a remote Spark session.
os.environ["SPARK_REMOTE"] = "sc://localhost"
```
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39249: [WIP] [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column
HyukjinKwon commented on code in PR #39249:
URL: https://github.com/apache/spark/pull/39249#discussion_r1058682621
##########
python/pyspark/sql/connect/column.py:
##########
@@ -390,3 +391,61 @@ def __nonzero__(self) -> None:
Column.__doc__ = PySparkColumn.__doc__
+
+
+def _test() -> None:
+ import os
+ import sys
+ import doctest
+ from pyspark.sql import SparkSession as PySparkSession
+ from pyspark.testing.connectutils import should_test_connect, connect_requirement_message
+
+ os.chdir(os.environ["SPARK_HOME"])
+
+ if should_test_connect:
+ import pyspark.sql.connect.column
+
+ globs = pyspark.sql.connect.column.__dict__.copy()
+ # Works around to create a regular Spark session
+ sc = SparkContext("local[4]", "sql.connect.column tests", conf=SparkConf())
+ globs["_spark"] = PySparkSession(sc, options={"spark.app.name": "sql.connect.column tests"})
+
+ # Creates a remote Spark session.
+ os.environ["SPARK_REMOTE"] = "sc://localhost"
+ globs["spark"] = PySparkSession.builder.remote("sc://localhost").getOrCreate()
+
+ # TODO(SPARK-41751): Support bitwiseAND
+ del pyspark.sql.connect.column.Column.bitwiseAND.__doc__
+ del pyspark.sql.connect.column.Column.bitwiseOR.__doc__
+ del pyspark.sql.connect.column.Column.bitwiseXOR.__doc__
+ del pyspark.sql.connect.column.Column.eqNullSafe.__doc__
+ del pyspark.sql.connect.column.Column.isNotNull.__doc__
+ del pyspark.sql.connect.column.Column.isNull.__doc__
+ del pyspark.sql.connect.column.Column.isin.__doc__
+ # TODO: Fix createDataFrame
Review Comment:
Let's file a JIRA
[GitHub] [spark] HyukjinKwon commented on pull request #39249: [WIP] [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column
HyukjinKwon commented on PR #39249:
URL: https://github.com/apache/spark/pull/39249#issuecomment-1366991142
LGTM otherwise
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column
HyukjinKwon commented on code in PR #39249:
URL: https://github.com/apache/spark/pull/39249#discussion_r1058873474
##########
python/pyspark/sql/connect/column.py:
##########
@@ -390,3 +391,61 @@ def __nonzero__(self) -> None:
Column.__doc__ = PySparkColumn.__doc__
+
+
+def _test() -> None:
+ import os
+ import sys
+ import doctest
+ from pyspark.sql import SparkSession as PySparkSession
+ from pyspark.testing.connectutils import should_test_connect, connect_requirement_message
+
+ os.chdir(os.environ["SPARK_HOME"])
+
+ if should_test_connect:
+ import pyspark.sql.connect.column
+
+ globs = pyspark.sql.connect.column.__dict__.copy()
+ # Works around to create a regular Spark session
+ sc = SparkContext("local[4]", "sql.connect.column tests", conf=SparkConf())
+ globs["_spark"] = PySparkSession(sc, options={"spark.app.name": "sql.connect.column tests"})
+
+ # Creates a remote Spark session.
+ os.environ["SPARK_REMOTE"] = "sc://localhost"
+ globs["spark"] = PySparkSession.builder.remote("sc://localhost").getOrCreate()
+
+ # TODO(SPARK-41751): Support Column.bitwiseAND,bitwiseOR,bitwiseXOR,eqNullSafe,isNotNull,isNull,isin
Review Comment:
```suggestion
# TODO(SPARK-41751): Support Column.bitwiseAND,bitwiseOR,bitwiseXOR,eqNullSafe,isNotNull,
# isNull,isin
```
[GitHub] [spark] HyukjinKwon commented on pull request #39249: [WIP] [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column
HyukjinKwon commented on PR #39249:
URL: https://github.com/apache/spark/pull/39249#issuecomment-1367019627
Thanks for working on this @techaddict
[GitHub] [spark] HyukjinKwon commented on pull request #39249: [WIP] [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column
HyukjinKwon commented on PR #39249:
URL: https://github.com/apache/spark/pull/39249#issuecomment-1366597126
@techaddict if there are actual issues from the API, please skip the doctest, file a JIRA and add a comment. E.g., see https://github.com/apache/spark/commit/39d36c49fa5052c11705cc9448aaaa1bde3d3f1d#diff-21a5318a86bdd2014a89585e190ddf75cdbd9e3bdd1efe385620b62c7f35447eR336-R341 or https://github.com/apache/spark/commit/39d36c49fa5052c11705cc9448aaaa1bde3d3f1d#diff-3afb582e9035bec211f4e06fc7bf06371d92a745394536287537f0e889092085R390
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column
HyukjinKwon commented on code in PR #39249:
URL: https://github.com/apache/spark/pull/39249#discussion_r1058687939
##########
python/pyspark/sql/connect/column.py:
##########
@@ -390,3 +391,61 @@ def __nonzero__(self) -> None:
Column.__doc__ = PySparkColumn.__doc__
+
+
+def _test() -> None:
+ import os
+ import sys
+ import doctest
+ from pyspark.sql import SparkSession as PySparkSession
+ from pyspark.testing.connectutils import should_test_connect, connect_requirement_message
+
+ os.chdir(os.environ["SPARK_HOME"])
+
+ if should_test_connect:
+ import pyspark.sql.connect.column
+
+ globs = pyspark.sql.connect.column.__dict__.copy()
+ # Works around to create a regular Spark session
+ sc = SparkContext("local[4]", "sql.connect.column tests", conf=SparkConf())
+ globs["_spark"] = PySparkSession(sc, options={"spark.app.name": "sql.connect.column tests"})
+
+ # Creates a remote Spark session.
+ os.environ["SPARK_REMOTE"] = "sc://localhost"
+ globs["spark"] = PySparkSession.builder.remote("sc://localhost").getOrCreate()
+
+ # TODO(SPARK-41751): Support bitwiseAND
+ del pyspark.sql.connect.column.Column.bitwiseAND.__doc__
+ del pyspark.sql.connect.column.Column.bitwiseOR.__doc__
+ del pyspark.sql.connect.column.Column.bitwiseXOR.__doc__
Review Comment:
Mind creating a JIRA for all? Feel free to either fix the existing JIRA to list all functions, or create a JIRA for each.
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39249: [WIP] [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column
HyukjinKwon commented on code in PR #39249:
URL: https://github.com/apache/spark/pull/39249#discussion_r1058289316
##########
python/pyspark/sql/connect/column.py:
##########
@@ -389,3 +389,46 @@ def __nonzero__(self) -> None:
Column.__doc__ = PySparkColumn.__doc__
+
+
+def _test() -> None:
+ import os
+ import sys
+ import doctest
+ from pyspark.sql import SparkSession as PySparkSession
+ from pyspark.testing.connectutils import should_test_connect, connect_requirement_message
+
+ os.chdir(os.environ["SPARK_HOME"])
+
+ if should_test_connect:
+ import pyspark.sql.connect.column
+
+ globs = pyspark.sql.column.__dict__.copy()
Review Comment:
```suggestion
globs = pyspark.sql.connect.column.__dict__.copy()
```
Oops, it was my mistake from another PR.
[GitHub] [spark] techaddict commented on a diff in pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column
techaddict commented on code in PR #39249:
URL: https://github.com/apache/spark/pull/39249#discussion_r1058693270
##########
python/pyspark/sql/connect/column.py:
##########
@@ -390,3 +391,61 @@ def __nonzero__(self) -> None:
Column.__doc__ = PySparkColumn.__doc__
+
+
+def _test() -> None:
+ import os
+ import sys
+ import doctest
+ from pyspark.sql import SparkSession as PySparkSession
+ from pyspark.testing.connectutils import should_test_connect, connect_requirement_message
+
+ os.chdir(os.environ["SPARK_HOME"])
+
+ if should_test_connect:
+ import pyspark.sql.connect.column
+
+ globs = pyspark.sql.connect.column.__dict__.copy()
+ # Works around to create a regular Spark session
+ sc = SparkContext("local[4]", "sql.connect.column tests", conf=SparkConf())
+ globs["_spark"] = PySparkSession(sc, options={"spark.app.name": "sql.connect.column tests"})
+
+ # Creates a remote Spark session.
+ os.environ["SPARK_REMOTE"] = "sc://localhost"
+ globs["spark"] = PySparkSession.builder.remote("sc://localhost").getOrCreate()
+
+ # TODO(SPARK-41751): Support bitwiseAND
+ del pyspark.sql.connect.column.Column.bitwiseAND.__doc__
+ del pyspark.sql.connect.column.Column.bitwiseOR.__doc__
+ del pyspark.sql.connect.column.Column.bitwiseXOR.__doc__
Review Comment:
Done, updated the JIRA.
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column
HyukjinKwon commented on code in PR #39249:
URL: https://github.com/apache/spark/pull/39249#discussion_r1058687670
##########
python/pyspark/sql/column.py:
##########
@@ -200,17 +200,17 @@ class Column:
... [(2, "Alice"), (5, "Bob")], ["age", "name"])
Select a column out of a DataFrame
-
- >>> df.name
+ # TODO(SPARK-41757): Compatibility of string representation
Review Comment:
Let's put this comment right before `class Column`. Otherwise, it will show up in the PySpark API reference documentation :-).
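The point about comment placement can be checked directly: a `#` comment above a class never becomes part of `__doc__`, while any text inside the docstring (including a TODO) surfaces in `help()` and the generated API reference. A quick standalone illustration (the class names and JIRA numbers are made up):

```python
# TODO(EXAMPLE-0001): comments placed here stay out of the API docs.
class Good:
    """User-facing documentation only."""


class Bad:
    """User-facing documentation.

    # TODO(EXAMPLE-0002): this leaks into help(Bad) and the rendered docs.
    """


print("TODO" in Good.__doc__)  # the class-level comment is invisible here
print("TODO" in Bad.__doc__)   # the docstring TODO is part of __doc__
```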
[GitHub] [spark] HyukjinKwon commented on pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column
HyukjinKwon commented on PR #39249:
URL: https://github.com/apache/spark/pull/39249#issuecomment-1367266442
Merged to master.
[GitHub] [spark] techaddict commented on a diff in pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column
techaddict commented on code in PR #39249:
URL: https://github.com/apache/spark/pull/39249#discussion_r1058694818
##########
python/pyspark/sql/connect/column.py:
##########
@@ -390,3 +391,61 @@ def __nonzero__(self) -> None:
Column.__doc__ = PySparkColumn.__doc__
+
+
+def _test() -> None:
+ import os
+ import sys
+ import doctest
+ from pyspark.sql import SparkSession as PySparkSession
+ from pyspark.testing.connectutils import should_test_connect, connect_requirement_message
+
+ os.chdir(os.environ["SPARK_HOME"])
+
+ if should_test_connect:
+ import pyspark.sql.connect.column
+
+ globs = pyspark.sql.connect.column.__dict__.copy()
+ # Works around to create a regular Spark session
+ sc = SparkContext("local[4]", "sql.connect.column tests", conf=SparkConf())
+ globs["_spark"] = PySparkSession(sc, options={"spark.app.name": "sql.connect.column tests"})
+
+ # Creates a remote Spark session.
+ os.environ["SPARK_REMOTE"] = "sc://localhost"
+ globs["spark"] = PySparkSession.builder.remote("sc://localhost").getOrCreate()
+
+ # TODO(SPARK-41751): Support bitwiseAND
+ del pyspark.sql.connect.column.Column.bitwiseAND.__doc__
+ del pyspark.sql.connect.column.Column.bitwiseOR.__doc__
+ del pyspark.sql.connect.column.Column.bitwiseXOR.__doc__
+ del pyspark.sql.connect.column.Column.eqNullSafe.__doc__
+ del pyspark.sql.connect.column.Column.isNotNull.__doc__
+ del pyspark.sql.connect.column.Column.isNull.__doc__
+ del pyspark.sql.connect.column.Column.isin.__doc__
+ # TODO(SPARK-41756): Fix createDataFrame
+ del pyspark.sql.connect.column.Column.getField.__doc__
+ del pyspark.sql.connect.column.Column.getItem.__doc__
+ # TODO(SPARK-41292): Support Window functions
Review Comment:
Right, it failed. Let me file a new JIRA:
```
Failed example:
window = Window.partitionBy("name").orderBy("age") .rowsBetween(Window.unboundedPreceding, Window.currentRow)
Exception raised:
Traceback (most recent call last):
File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run
exec(compile(example.source, filename, "single",
File "<doctest pyspark.sql.connect.column.Column.over[1]>", line 1, in <module>
window = Window.partitionBy("name").orderBy("age") .rowsBetween(Window.unboundedPreceding, Window.currentRow)
File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/utils.py", line 346, in wrapped
raise NotImplementedError()
NotImplementedError
```
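The `NotImplementedError` in that traceback comes from a guard wrapper in `pyspark/sql/utils.py` that blocks APIs not yet ported to Spark Connect. A rough, hypothetical sketch of that kind of guard (the decorator and the stand-in `rowsBetween` function are invented for illustration, not the actual Spark code):

```python
import functools


def not_supported_in_connect(func):
    """Hypothetical guard: raise instead of calling an unported API."""
    @functools.wraps(func)
    def wrapped(*args, **kwargs):
        # Fail fast with a clear signal rather than silently misbehaving.
        raise NotImplementedError(
            f"{func.__name__} is not supported in Spark Connect yet"
        )
    return wrapped


@not_supported_in_connect
def rowsBetween(start, end):
    """Stand-in for the real Window.rowsBetween."""


try:
    rowsBetween(0, 0)
except NotImplementedError as exc:
    print(exc)
```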