Posted to reviews@spark.apache.org by "itholic (via GitHub)" <gi...@apache.org> on 2023/05/25 03:44:46 UTC

[GitHub] [spark] itholic opened a new pull request, #41305: [SPARK-43666][SPARK-43667][SPARK-43668][SPARK-43669][PS] Fix `BinaryOps` for Spark Connect

itholic opened a new pull request, #41305:
URL: https://github.com/apache/spark/pull/41305

   ### What changes were proposed in this pull request?
   
   This PR proposes to fix the `BinaryOps` tests for the pandas API on Spark with Spark Connect.
   
   It covers SPARK-43666, SPARK-43667, SPARK-43668, and SPARK-43669 at once, because they all require similar modifications in a single file.
   
   
   ### Why are the changes needed?
   
   To support all features for pandas API on Spark with Spark Connect.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, `BinaryOps.lt`, `BinaryOps.le`, `BinaryOps.ge`, and `BinaryOps.gt` now work as expected on Spark Connect.
   
   
   ### How was this patch tested?
   
   Uncommented the UTs and tested manually.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41305: [SPARK-43666][SPARK-43667][SPARK-43668][SPARK-43669][PS] Fix `BinaryOps` for Spark Connect

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #41305:
URL: https://github.com/apache/spark/pull/41305#discussion_r1205116949


##########
python/pyspark/pandas/data_type_ops/binary_ops.py:
##########
@@ -70,27 +71,49 @@ def lt(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
         from pyspark.pandas.base import column_op
 
         _sanitize_list_like(right)
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
 
-        return column_op(Column.__lt__)(left, right)
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]

Review Comment:
   Can we have one utility to get the column class at least? e.g., `pyspark.sql.utils.pyspark_column_op("__lt__")`
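
The suggestion relies on looking up a comparison method by its name, so a single helper can serve `__lt__`, `__le__`, `__gt__`, and `__ge__`. Below is a minimal sketch of that lookup only. It assumes a toy class (`Version`) and a hypothetical helper (`comparison_by_name`), neither of which is part of PySpark.

```python
class Version:
    """Toy stand-in for a column-like class that defines rich comparisons."""

    def __init__(self, number: int) -> None:
        self.number = number

    def __lt__(self, other: "Version") -> bool:
        return self.number < other.number

    def __ge__(self, other: "Version") -> bool:
        return self.number >= other.number


def comparison_by_name(cls: type, func_name: str):
    """Hypothetical helper: fetch a method such as "__lt__" from a class by name."""
    return getattr(cls, func_name)


# Looking the method up on the class yields a plain function taking (left, right).
lt = comparison_by_name(Version, "__lt__")
ge = comparison_by_name(Version, "__ge__")

older, newer = Version(1), Version(2)
lt_result = lt(older, newer)  # equivalent to older < newer
ge_result = ge(older, newer)  # equivalent to older >= newer
```

Because the operator is passed as a string, the caller never has to name the column class itself, which is what lets the class choice move into one place.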



[GitHub] [spark] itholic commented on a diff in pull request #41305: [SPARK-43666][SPARK-43667][SPARK-43668][SPARK-43669][PS] Fix `BinaryOps` for Spark Connect

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #41305:
URL: https://github.com/apache/spark/pull/41305#discussion_r1205195934


##########
python/pyspark/sql/utils.py:
##########
@@ -234,3 +236,19 @@ def wrapped(*args: Any, **kwargs: Any) -> Any:
         return f(*args, **kwargs)
 
     return cast(FuncT, wrapped)
+
+
+def pyspark_column_op(func_name: str) -> Callable[..., "SeriesOrIndex"]:
+    """
+    Wrapper function for column_op to get proper Column class.
+    """
+    from pyspark.pandas.base import column_op
+    from pyspark.sql.column import Column as PySparkColumn
+
+    if is_remote():
+        from pyspark.sql.connect.column import Column as ConnectColumn
+
+        Column = ConnectColumn
+    else:
+        Column = PySparkColumn  # type: ignore[assignment]
+    return column_op(getattr(Column, func_name))

Review Comment:
   > Can we get a one utility to get the column class at least? e.g., utils.pyspark_column_op("__lt__")
   
   @HyukjinKwon just added util function here. If it looks good, will apply this change to other PRs as well.



[GitHub] [spark] HyukjinKwon commented on pull request #41305: [SPARK-43666][SPARK-43667][SPARK-43668][SPARK-43669][PS] Fix `BinaryOps` for Spark Connect

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #41305:
URL: https://github.com/apache/spark/pull/41305#issuecomment-1566324730

   Merged to master.


[GitHub] [spark] HyukjinKwon closed pull request #41305: [SPARK-43666][SPARK-43667][SPARK-43668][SPARK-43669][PS] Fix `BinaryOps` for Spark Connect

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #41305: [SPARK-43666][SPARK-43667][SPARK-43668][SPARK-43669][PS] Fix `BinaryOps` for Spark Connect
URL: https://github.com/apache/spark/pull/41305

