Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/12/27 23:12:51 UTC

[GitHub] [spark] techaddict opened a new pull request, #39249: [WIP] [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column

techaddict opened a new pull request, #39249:
URL: https://github.com/apache/spark/pull/39249

   ### What changes were proposed in this pull request?
   This PR proposes to enable doctests in pyspark.sql.connect.column, which is virtually the same as pyspark.sql.column.
   
   ### Why are the changes needed?
   To ensure PySpark compatibility and improve test coverage.
   
   ### Does this PR introduce any user-facing change?
   No, doctests only.
   
   ### How was this patch tested?
   New doctests added.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon closed pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column
URL: https://github.com/apache/spark/pull/39249




[GitHub] [spark] techaddict commented on a diff in pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column

Posted by GitBox <gi...@apache.org>.
techaddict commented on code in PR #39249:
URL: https://github.com/apache/spark/pull/39249#discussion_r1058687601


##########
python/pyspark/sql/connect/column.py:
##########
@@ -390,3 +391,61 @@ def __nonzero__(self) -> None:
 
 
 Column.__doc__ = PySparkColumn.__doc__
+
+
+def _test() -> None:
+    import os
+    import sys
+    import doctest
+    from pyspark.sql import SparkSession as PySparkSession
+    from pyspark.testing.connectutils import should_test_connect, connect_requirement_message
+
+    os.chdir(os.environ["SPARK_HOME"])
+
+    if should_test_connect:
+        import pyspark.sql.connect.column
+
+        globs = pyspark.sql.connect.column.__dict__.copy()
+        # Works around to create a regular Spark session
+        sc = SparkContext("local[4]", "sql.connect.column tests", conf=SparkConf())
+        globs["_spark"] = PySparkSession(sc, options={"spark.app.name": "sql.connect.column tests"})
+
+        # Creates a remote Spark session.
+        os.environ["SPARK_REMOTE"] = "sc://localhost"
+        globs["spark"] = PySparkSession.builder.remote("sc://localhost").getOrCreate()
+
+        # TODO(SPARK-41751): Support bitwiseAND
+        del pyspark.sql.connect.column.Column.bitwiseAND.__doc__
+        del pyspark.sql.connect.column.Column.bitwiseOR.__doc__
+        del pyspark.sql.connect.column.Column.bitwiseXOR.__doc__

Review Comment:
   Nope, the error was similar in all of these:
   ```
   [UNRESOLVED_ROUTINE] Cannot resolve function `bitwiseAND` on search path [`system`.`builtin`, `system`.`session`, `spark_catalog`.`default`].
   ```





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #39249:
URL: https://github.com/apache/spark/pull/39249#discussion_r1058686880


##########
python/pyspark/sql/column.py:
##########
@@ -1258,8 +1258,7 @@ def over(self, window: "WindowSpec") -> "Column":
         >>> from pyspark.sql import Window
         >>> window = Window.partitionBy("name").orderBy("age") \
                 .rowsBetween(Window.unboundedPreceding, Window.currentRow)
-        >>> from pyspark.sql.functions import rank, min
-        >>> from pyspark.sql.functions import desc
+        >>> from pyspark.sql.functions import rank,min,desc

Review Comment:
   ```suggestion
           >>> from pyspark.sql.functions import rank, min, desc
   ```





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #39249:
URL: https://github.com/apache/spark/pull/39249#discussion_r1058690914


##########
python/pyspark/sql/connect/column.py:
##########
@@ -390,3 +391,61 @@ def __nonzero__(self) -> None:
 
 
 Column.__doc__ = PySparkColumn.__doc__
+
+
+def _test() -> None:
+    import os
+    import sys
+    import doctest
+    from pyspark.sql import SparkSession as PySparkSession
+    from pyspark.testing.connectutils import should_test_connect, connect_requirement_message
+
+    os.chdir(os.environ["SPARK_HOME"])
+
+    if should_test_connect:
+        import pyspark.sql.connect.column
+
+        globs = pyspark.sql.connect.column.__dict__.copy()
+        # Works around to create a regular Spark session
+        sc = SparkContext("local[4]", "sql.connect.column tests", conf=SparkConf())
+        globs["_spark"] = PySparkSession(sc, options={"spark.app.name": "sql.connect.column tests"})
+
+        # Creates a remote Spark session.
+        os.environ["SPARK_REMOTE"] = "sc://localhost"
+        globs["spark"] = PySparkSession.builder.remote("sc://localhost").getOrCreate()
+
+        # TODO(SPARK-41751): Support bitwiseAND
+        del pyspark.sql.connect.column.Column.bitwiseAND.__doc__
+        del pyspark.sql.connect.column.Column.bitwiseOR.__doc__
+        del pyspark.sql.connect.column.Column.bitwiseXOR.__doc__
+        del pyspark.sql.connect.column.Column.eqNullSafe.__doc__
+        del pyspark.sql.connect.column.Column.isNotNull.__doc__
+        del pyspark.sql.connect.column.Column.isNull.__doc__
+        del pyspark.sql.connect.column.Column.isin.__doc__
+        # TODO(SPARK-41756): Fix createDataFrame
+        del pyspark.sql.connect.column.Column.getField.__doc__
+        del pyspark.sql.connect.column.Column.getItem.__doc__
+        # TODO(SPARK-41292): Support Window functions

Review Comment:
   Hm, SPARK-41292 is resolved. Did this fail for a different reason?





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #39249:
URL: https://github.com/apache/spark/pull/39249#discussion_r1058687254


##########
python/pyspark/sql/connect/column.py:
##########
@@ -390,3 +391,61 @@ def __nonzero__(self) -> None:
 
 
 Column.__doc__ = PySparkColumn.__doc__
+
+
+def _test() -> None:
+    import os
+    import sys
+    import doctest
+    from pyspark.sql import SparkSession as PySparkSession
+    from pyspark.testing.connectutils import should_test_connect, connect_requirement_message
+
+    os.chdir(os.environ["SPARK_HOME"])
+
+    if should_test_connect:
+        import pyspark.sql.connect.column
+
+        globs = pyspark.sql.connect.column.__dict__.copy()
+        # Works around to create a regular Spark session
+        sc = SparkContext("local[4]", "sql.connect.column tests", conf=SparkConf())
+        globs["_spark"] = PySparkSession(sc, options={"spark.app.name": "sql.connect.column tests"})
+
+        # Creates a remote Spark session.
+        os.environ["SPARK_REMOTE"] = "sc://localhost"
+        globs["spark"] = PySparkSession.builder.remote("sc://localhost").getOrCreate()
+
+        # TODO(SPARK-41751): Support bitwiseAND
+        del pyspark.sql.connect.column.Column.bitwiseAND.__doc__
+        del pyspark.sql.connect.column.Column.bitwiseOR.__doc__
+        del pyspark.sql.connect.column.Column.bitwiseXOR.__doc__

Review Comment:
   Quick question: did they fail because of `bitwiseAND` too?





[GitHub] [spark] techaddict commented on pull request #39249: [WIP] [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column

Posted by GitBox <gi...@apache.org>.
techaddict commented on PR #39249:
URL: https://github.com/apache/spark/pull/39249#issuecomment-1366657623

   @HyukjinKwon there are a bunch of methods missing, and `createDataFrame` isn't working with arrays.




[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #39249:
URL: https://github.com/apache/spark/pull/39249#discussion_r1058686957


##########
python/pyspark/sql/connect/column.py:
##########
@@ -388,5 +389,62 @@ def __nonzero__(self) -> None:
 
     __bool__ = __nonzero__
 

Review Comment:
   ```suggestion
   
   
   ```





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39249: [WIP] [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #39249:
URL: https://github.com/apache/spark/pull/39249#discussion_r1058658520


##########
python/pyspark/sql/column.py:
##########
@@ -201,16 +201,16 @@ class Column:
 
     Select a column out of a DataFrame
 
-    >>> df.name
+    >>> df.name   # doctest: +SKIP

Review Comment:
   Let's file a JIRA and add it as a TODO, e.g.:
   
   ```python
   # TODO(SPARK-XXXXX): Compatibility of string representation in Column
   class Column:
       ...  
   ```
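
As a minimal sketch of what the `# doctest: +SKIP` directive in the diff above does: the standard-library runner parses the example but never executes it. The docstring, the `skip_demo` name, and the undefined `df` below are made up for illustration and are not taken from PySpark.

```python
import doctest

# Hypothetical docstring: `df` is never defined, so the second example
# would raise NameError if doctest actually executed it.
DOC = """
>>> 1 + 1
2
>>> df.name   # doctest: +SKIP
Column<'name'>
"""

# Parse the text into a DocTest with an empty globals dict.
parser = doctest.DocTestParser()
test = parser.get_doctest(DOC, {}, "skip_demo", None, 0)

# Run it; the skipped example is parsed but not evaluated.
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
print(results.failed)
```

No failure is reported even though `df` does not exist, which is why a skipped example should carry a TODO pointing at the JIRA that will re-enable it.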





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39249: [WIP] [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #39249:
URL: https://github.com/apache/spark/pull/39249#discussion_r1058296098


##########
python/pyspark/sql/connect/column.py:
##########
@@ -389,3 +389,46 @@ def __nonzero__(self) -> None:
 
 
 Column.__doc__ = PySparkColumn.__doc__
+
+
+def _test() -> None:
+    import os
+    import sys
+    import doctest
+    from pyspark.sql import SparkSession as PySparkSession
+    from pyspark.testing.connectutils import should_test_connect, connect_requirement_message
+
+    os.chdir(os.environ["SPARK_HOME"])
+
+    if should_test_connect:
+        import pyspark.sql.connect.column
+
+        globs = pyspark.sql.column.__dict__.copy()
+        # Works around to create a regular Spark session
+        sc = SparkContext("local[4]", "sql.connect.column tests", conf=SparkConf())
+        globs["_spark"] = PySparkSession(sc, options={"spark.app.name": "sql.connect.column tests"})
+
+        # Creates a remote Spark session.

Review Comment:
   Oh, also this:
   ```suggestion
           # Creates a remote Spark session.
           os.environ["SPARK_REMOTE"] = "sc://localhost"
   ```





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39249: [WIP] [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #39249:
URL: https://github.com/apache/spark/pull/39249#discussion_r1058682621


##########
python/pyspark/sql/connect/column.py:
##########
@@ -390,3 +391,61 @@ def __nonzero__(self) -> None:
 
 
 Column.__doc__ = PySparkColumn.__doc__
+
+
+def _test() -> None:
+    import os
+    import sys
+    import doctest
+    from pyspark.sql import SparkSession as PySparkSession
+    from pyspark.testing.connectutils import should_test_connect, connect_requirement_message
+
+    os.chdir(os.environ["SPARK_HOME"])
+
+    if should_test_connect:
+        import pyspark.sql.connect.column
+
+        globs = pyspark.sql.connect.column.__dict__.copy()
+        # Works around to create a regular Spark session
+        sc = SparkContext("local[4]", "sql.connect.column tests", conf=SparkConf())
+        globs["_spark"] = PySparkSession(sc, options={"spark.app.name": "sql.connect.column tests"})
+
+        # Creates a remote Spark session.
+        os.environ["SPARK_REMOTE"] = "sc://localhost"
+        globs["spark"] = PySparkSession.builder.remote("sc://localhost").getOrCreate()
+
+        # TODO(SPARK-41751): Support bitwiseAND
+        del pyspark.sql.connect.column.Column.bitwiseAND.__doc__
+        del pyspark.sql.connect.column.Column.bitwiseOR.__doc__
+        del pyspark.sql.connect.column.Column.bitwiseXOR.__doc__
+        del pyspark.sql.connect.column.Column.eqNullSafe.__doc__
+        del pyspark.sql.connect.column.Column.isNotNull.__doc__
+        del pyspark.sql.connect.column.Column.isNull.__doc__
+        del pyspark.sql.connect.column.Column.isin.__doc__
+        # TODO: Fix createDataFrame

Review Comment:
   Let's file a JIRA





[GitHub] [spark] HyukjinKwon commented on pull request #39249: [WIP] [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #39249:
URL: https://github.com/apache/spark/pull/39249#issuecomment-1366991142

   LGTM otherwise




[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #39249:
URL: https://github.com/apache/spark/pull/39249#discussion_r1058873474


##########
python/pyspark/sql/connect/column.py:
##########
@@ -390,3 +391,61 @@ def __nonzero__(self) -> None:
 
 
 Column.__doc__ = PySparkColumn.__doc__
+
+
+def _test() -> None:
+    import os
+    import sys
+    import doctest
+    from pyspark.sql import SparkSession as PySparkSession
+    from pyspark.testing.connectutils import should_test_connect, connect_requirement_message
+
+    os.chdir(os.environ["SPARK_HOME"])
+
+    if should_test_connect:
+        import pyspark.sql.connect.column
+
+        globs = pyspark.sql.connect.column.__dict__.copy()
+        # Works around to create a regular Spark session
+        sc = SparkContext("local[4]", "sql.connect.column tests", conf=SparkConf())
+        globs["_spark"] = PySparkSession(sc, options={"spark.app.name": "sql.connect.column tests"})
+
+        # Creates a remote Spark session.
+        os.environ["SPARK_REMOTE"] = "sc://localhost"
+        globs["spark"] = PySparkSession.builder.remote("sc://localhost").getOrCreate()
+
+        # TODO(SPARK-41751): Support Column.bitwiseAND,bitwiseOR,bitwiseXOR,eqNullSafe,isNotNull,isNull,isin

Review Comment:
   ```suggestion
           # TODO(SPARK-41751): Support Column.bitwiseAND,bitwiseOR,bitwiseXOR,eqNullSafe,isNotNull,
           # isNull,isin
   ```





[GitHub] [spark] HyukjinKwon commented on pull request #39249: [WIP] [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #39249:
URL: https://github.com/apache/spark/pull/39249#issuecomment-1367019627

   Thanks for working on this @techaddict 




[GitHub] [spark] HyukjinKwon commented on pull request #39249: [WIP] [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #39249:
URL: https://github.com/apache/spark/pull/39249#issuecomment-1366597126

   @techaddict if there are actual issues from the API, please skip the doctest, file a JIRA and add a comment. E.g., see https://github.com/apache/spark/commit/39d36c49fa5052c11705cc9448aaaa1bde3d3f1d#diff-21a5318a86bdd2014a89585e190ddf75cdbd9e3bdd1efe385620b62c7f35447eR336-R341 or https://github.com/apache/spark/commit/39d36c49fa5052c11705cc9448aaaa1bde3d3f1d#diff-3afb582e9035bec211f4e06fc7bf06371d92a745394536287537f0e889092085R390
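
A rough sketch of the skip-by-deleting-the-docstring pattern, using only the standard library. `Column` here is a toy class, `SPARK-XXXXX` is a placeholder ticket id, and the loop is a simplified stand-in for doctest's own finder, which by default yields no test for an object whose docstring is empty.

```python
import doctest
import inspect


class Column:
    """Toy stand-in for a column class; this is not the PySpark implementation."""

    def supported(self):
        """
        >>> Column().supported()
        'ok'
        """
        return "ok"

    def unsupported(self):
        """
        >>> Column().unsupported()
        'ok'
        """
        raise NotImplementedError()


# TODO(SPARK-XXXXX): placeholder id; restore once `unsupported` is implemented.
# Deleting a function's __doc__ resets it to None, so no examples remain.
del Column.unsupported.__doc__

parser = doctest.DocTestParser()
runner = doctest.DocTestRunner(verbose=False)
ran = []
for name, attr in vars(Column).items():
    # Only methods that still have a docstring contribute a test.
    if inspect.isfunction(attr) and attr.__doc__:
        test = parser.get_doctest(attr.__doc__, {"Column": Column}, name, None, 0)
        runner.run(test)
        ran.append(name)

results = runner.summarize(verbose=False)
print(ran, results.failed)
```

Only `supported` is exercised, so the known-broken method no longer fails the suite, and the TODO comment next to the `del` records which ticket should remove it.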




[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #39249:
URL: https://github.com/apache/spark/pull/39249#discussion_r1058687939


##########
python/pyspark/sql/connect/column.py:
##########
@@ -390,3 +391,61 @@ def __nonzero__(self) -> None:
 
 
 Column.__doc__ = PySparkColumn.__doc__
+
+
+def _test() -> None:
+    import os
+    import sys
+    import doctest
+    from pyspark.sql import SparkSession as PySparkSession
+    from pyspark.testing.connectutils import should_test_connect, connect_requirement_message
+
+    os.chdir(os.environ["SPARK_HOME"])
+
+    if should_test_connect:
+        import pyspark.sql.connect.column
+
+        globs = pyspark.sql.connect.column.__dict__.copy()
+        # Works around to create a regular Spark session
+        sc = SparkContext("local[4]", "sql.connect.column tests", conf=SparkConf())
+        globs["_spark"] = PySparkSession(sc, options={"spark.app.name": "sql.connect.column tests"})
+
+        # Creates a remote Spark session.
+        os.environ["SPARK_REMOTE"] = "sc://localhost"
+        globs["spark"] = PySparkSession.builder.remote("sc://localhost").getOrCreate()
+
+        # TODO(SPARK-41751): Support bitwiseAND
+        del pyspark.sql.connect.column.Column.bitwiseAND.__doc__
+        del pyspark.sql.connect.column.Column.bitwiseOR.__doc__
+        del pyspark.sql.connect.column.Column.bitwiseXOR.__doc__

Review Comment:
   Mind creating a JIRA for all? Feel free to either fix the existing JIRA to list all functions, or create a JIRA for each.





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39249: [WIP] [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #39249:
URL: https://github.com/apache/spark/pull/39249#discussion_r1058289316


##########
python/pyspark/sql/connect/column.py:
##########
@@ -389,3 +389,46 @@ def __nonzero__(self) -> None:
 
 
 Column.__doc__ = PySparkColumn.__doc__
+
+
+def _test() -> None:
+    import os
+    import sys
+    import doctest
+    from pyspark.sql import SparkSession as PySparkSession
+    from pyspark.testing.connectutils import should_test_connect, connect_requirement_message
+
+    os.chdir(os.environ["SPARK_HOME"])
+
+    if should_test_connect:
+        import pyspark.sql.connect.column
+
+        globs = pyspark.sql.column.__dict__.copy()

Review Comment:
   ```suggestion
           globs = pyspark.sql.connect.column.__dict__.copy()
   ```
   
   Oops, it was my mistake from another PR.
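
To illustrate why the source of `globs` matters, here is a small sketch under made-up names: two throwaway modules each expose a `Column` class, and the same doctest text is run against a copy of each module's `__dict__`. The modules `classic_column` and `connect_column` and the `backend` attribute are hypothetical stand-ins, not PySpark code.

```python
import doctest
import types


class _ClassicColumn:
    backend = "classic"


class _ConnectColumn:
    backend = "connect"


# Two hypothetical modules that both export a name called `Column`.
classic = types.ModuleType("classic_column")
connect = types.ModuleType("connect_column")
classic.Column = _ClassicColumn
connect.Column = _ConnectColumn

DOC = """
>>> Column.backend
'connect'
"""


def run_with(module):
    # Same pattern as the diff: the module namespace becomes the globals
    # in which every example is evaluated.
    globs = module.__dict__.copy()
    test = doctest.DocTestParser().get_doctest(DOC, globs, module.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report; only the counts are needed here.
    return runner.run(test, out=lambda text: None)


wrong = run_with(classic)
right = run_with(connect)
print(wrong.failed, right.failed)
```

With the wrong namespace the example resolves `Column` to the other class, so the doctests either fail or, worse, pass while exercising code that was never meant to be under test.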





[GitHub] [spark] techaddict commented on a diff in pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column

Posted by GitBox <gi...@apache.org>.
techaddict commented on code in PR #39249:
URL: https://github.com/apache/spark/pull/39249#discussion_r1058693270


##########
python/pyspark/sql/connect/column.py:
##########
@@ -390,3 +391,61 @@ def __nonzero__(self) -> None:
 
 
 Column.__doc__ = PySparkColumn.__doc__
+
+
+def _test() -> None:
+    import os
+    import sys
+    import doctest
+    from pyspark.sql import SparkSession as PySparkSession
+    from pyspark.testing.connectutils import should_test_connect, connect_requirement_message
+
+    os.chdir(os.environ["SPARK_HOME"])
+
+    if should_test_connect:
+        import pyspark.sql.connect.column
+
+        globs = pyspark.sql.connect.column.__dict__.copy()
+        # Works around to create a regular Spark session
+        sc = SparkContext("local[4]", "sql.connect.column tests", conf=SparkConf())
+        globs["_spark"] = PySparkSession(sc, options={"spark.app.name": "sql.connect.column tests"})
+
+        # Creates a remote Spark session.
+        os.environ["SPARK_REMOTE"] = "sc://localhost"
+        globs["spark"] = PySparkSession.builder.remote("sc://localhost").getOrCreate()
+
+        # TODO(SPARK-41751): Support bitwiseAND
+        del pyspark.sql.connect.column.Column.bitwiseAND.__doc__
+        del pyspark.sql.connect.column.Column.bitwiseOR.__doc__
+        del pyspark.sql.connect.column.Column.bitwiseXOR.__doc__

Review Comment:
   Done, updated the JIRA.





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #39249:
URL: https://github.com/apache/spark/pull/39249#discussion_r1058687670


##########
python/pyspark/sql/column.py:
##########
@@ -200,17 +200,17 @@ class Column:
     ...      [(2, "Alice"), (5, "Bob")], ["age", "name"])
 
     Select a column out of a DataFrame
-
-    >>> df.name
+    # TODO(SPARK-41757): Compatibility of string representation

Review Comment:
   Let's put this comment right before `class Column`. Otherwise, it will show up in the PySpark API reference documentation :-).
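
A quick sketch of the difference, with two made-up classes and a placeholder ticket id: a `#` line above the class is an ordinary comment and never reaches `__doc__`, while the same text inside the triple-quoted string is part of the docstring that documentation tools read.

```python
# TODO(SPARK-XXXXX): placeholder id; this `#` comment is not part of __doc__.
class Annotated:
    """
    Select a column out of a DataFrame
    """


class Leaky:
    """
    Select a column out of a DataFrame
    # TODO(SPARK-XXXXX): this line is docstring text, so doc tools will show it.
    """


annotated_has_todo = "TODO" in Annotated.__doc__
leaky_has_todo = "TODO" in Leaky.__doc__
print(annotated_has_todo, leaky_has_todo)
```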





[GitHub] [spark] HyukjinKwon commented on pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #39249:
URL: https://github.com/apache/spark/pull/39249#issuecomment-1367266442

   Merged to master.




[GitHub] [spark] techaddict commented on a diff in pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column

Posted by GitBox <gi...@apache.org>.
techaddict commented on code in PR #39249:
URL: https://github.com/apache/spark/pull/39249#discussion_r1058694818


##########
python/pyspark/sql/connect/column.py:
##########
@@ -390,3 +391,61 @@ def __nonzero__(self) -> None:
 
 
 Column.__doc__ = PySparkColumn.__doc__
+
+
+def _test() -> None:
+    import os
+    import sys
+    import doctest
+    from pyspark.sql import SparkSession as PySparkSession
+    from pyspark.testing.connectutils import should_test_connect, connect_requirement_message
+
+    os.chdir(os.environ["SPARK_HOME"])
+
+    if should_test_connect:
+        import pyspark.sql.connect.column
+
+        globs = pyspark.sql.connect.column.__dict__.copy()
+        # Works around to create a regular Spark session
+        sc = SparkContext("local[4]", "sql.connect.column tests", conf=SparkConf())
+        globs["_spark"] = PySparkSession(sc, options={"spark.app.name": "sql.connect.column tests"})
+
+        # Creates a remote Spark session.
+        os.environ["SPARK_REMOTE"] = "sc://localhost"
+        globs["spark"] = PySparkSession.builder.remote("sc://localhost").getOrCreate()
+
+        # TODO(SPARK-41751): Support bitwiseAND
+        del pyspark.sql.connect.column.Column.bitwiseAND.__doc__
+        del pyspark.sql.connect.column.Column.bitwiseOR.__doc__
+        del pyspark.sql.connect.column.Column.bitwiseXOR.__doc__
+        del pyspark.sql.connect.column.Column.eqNullSafe.__doc__
+        del pyspark.sql.connect.column.Column.isNotNull.__doc__
+        del pyspark.sql.connect.column.Column.isNull.__doc__
+        del pyspark.sql.connect.column.Column.isin.__doc__
+        # TODO(SPARK-41756): Fix createDataFrame
+        del pyspark.sql.connect.column.Column.getField.__doc__
+        del pyspark.sql.connect.column.Column.getItem.__doc__
+        # TODO(SPARK-41292): Support Window functions

Review Comment:
   Right, it failed. Let me file a new JIRA.
   ```
   Failed example:
       window = Window.partitionBy("name").orderBy("age")                 .rowsBetween(Window.unboundedPreceding, Window.currentRow)
   Exception raised:
       Traceback (most recent call last):
         File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", line 1350, in __run
           exec(compile(example.source, filename, "single",
         File "<doctest pyspark.sql.connect.column.Column.over[1]>", line 1, in <module>
           window = Window.partitionBy("name").orderBy("age")                 .rowsBetween(Window.unboundedPreceding, Window.currentRow)
         File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/utils.py", line 346, in wrapped
           raise NotImplementedError()
       NotImplementedError
       
   ```


