You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by "itholic (via GitHub)" <gi...@apache.org> on 2023/04/10 06:21:59 UTC

[GitHub] [spark] itholic opened a new pull request, #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

itholic opened a new pull request, #40722:
URL: https://github.com/apache/spark/pull/40722

   ### What changes were proposed in this pull request?
   
   This PR proposes to remove the dependency on `grpcio` when remote session is not used for pandas API on Spark.
   
   ### Why are the changes needed?
   
   Pandas API on spark should be able to run without `grpcio` when using regular Spark session.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No API change, but now users can use pandas API on Spark without installing any additional package when using regular Spark session.
   
   
   ### How was this patch tested?
   
   Manually test without installing `grpcio`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1170974127


##########
python/pyspark/pandas/indexing.py:
##########
@@ -1807,7 +1837,13 @@ def _select_cols_else(
             )
 
     def __setitem__(self, key: Any, value: Any) -> None:
-        if not isinstance(value, (Column, ConnectColumn)) and is_list_like(value):
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+        if not isinstance(value, Column) and is_list_like(value):

Review Comment:
   I think you can do the same trick with try-except



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] itholic commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "itholic (via GitHub)" <gi...@apache.org>.

itholic commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1171092506


##########
python/pyspark/pandas/frame.py:
##########
@@ -4871,7 +4868,14 @@ def round(self, decimals: Union[int, Dict[Name, int], "Series"] = 0) -> "DataFra
         else:
             raise TypeError("decimals must be an integer, a dict-like or a Series")
 
-        def op(psser: ps.Series) -> Union[ps.Series, GenericColumn]:
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]

Review Comment:
   It complains `UnboundLocalError: local variable 'Column' referenced before assignment` if we don't explicitly assign the `PySparkColumn` to the variable here, so I fixed it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] itholic commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "itholic (via GitHub)" <gi...@apache.org>.

itholic commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1171970741


##########
python/pyspark/pandas/frame.py:
##########
@@ -4871,7 +4868,14 @@ def round(self, decimals: Union[int, Dict[Name, int], "Series"] = 0) -> "DataFra
         else:
             raise TypeError("decimals must be an integer, a dict-like or a Series")
 
-        def op(psser: ps.Series) -> Union[ps.Series, GenericColumn]:
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]

Review Comment:
   Oh, I missed that it's used only for type hint



##########
python/pyspark/pandas/frame.py:
##########
@@ -4871,7 +4868,14 @@ def round(self, decimals: Union[int, Dict[Name, int], "Series"] = 0) -> "DataFra
         else:
             raise TypeError("decimals must be an integer, a dict-like or a Series")
 
-        def op(psser: ps.Series) -> Union[ps.Series, GenericColumn]:
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]

Review Comment:
   Just updated, thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] itholic commented on pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "itholic (via GitHub)" <gi...@apache.org>.

itholic commented on PR #40722:
URL: https://github.com/apache/spark/pull/40722#issuecomment-1514231816

   CI passed. @HyukjinKwon Could you take a look when you find some time??


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] itholic commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "itholic (via GitHub)" <gi...@apache.org>.

itholic commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1168327449


##########
python/pyspark/pandas/internal.py:
##########
@@ -987,34 +986,52 @@ def attach_distributed_sequence_column(
                     [], schema=StructType().add(column_name, data_type=LongType(), nullable=False)
                 )
 
-    def spark_column_for(self, label: Label) -> Column:
+    def spark_column_for(self, label: Label) -> PySparkColumn:
         """Return Spark Column for the given column label."""
         column_labels_to_scol = dict(zip(self.column_labels, self.data_spark_columns))
         if label in column_labels_to_scol:
             return column_labels_to_scol[label]
         else:
             raise KeyError(name_like_string(label))
 
-    def spark_column_name_for(self, label_or_scol: Union[Label, Column]) -> str:
+    def spark_column_name_for(self, label_or_scol: Union[Label, PySparkColumn]) -> str:
         """Return the actual Spark column name for the given column label."""
-        if isinstance(label_or_scol, (Column, ConnectColumn)):
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+        if isinstance(label_or_scol, Column):

Review Comment:
   I tried to switch the condition to `Label`, but it complains `TypeError: Subscripted generics cannot be used with class and instance checks` since `Label` is not a comparable class within `isinstance`. Let me leave it as it is for now



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1161531345


##########
python/pyspark/pandas/data_type_ops/boolean_ops.py:
##########
@@ -383,13 +444,24 @@ def xor(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
 
         if _is_boolean_type(right):
 
-            def xor_func(left: GenericColumn, right: Any) -> GenericColumn:
-                if not isinstance(right, (Column, ConnectColumn)):
+            if is_remote():
+                from pyspark.sql.connect.column import Column as ConnectColumn
+
+                Column = ConnectColumn
+            else:
+                Column = PySparkColumn  # type: ignore[assignment]
+
+            def xor_func(left: Column, right: Any) -> Column:  # type: ignore[valid-type]
+                if not isinstance(right, Column):

Review Comment:
   remove



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1161534944


##########
python/pyspark/pandas/frame.py:
##########
@@ -8536,10 +8550,17 @@ def rename(col: str) -> str:
         data_columns = []
         column_labels = []
 
-        def left_scol_for(label: Label) -> GenericColumn:
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+
+        def left_scol_for(label: Label) -> Column:  # type: ignore[valid-type]

Review Comment:
   just use PySparkColumn



##########
python/pyspark/pandas/frame.py:
##########
@@ -10221,7 +10242,13 @@ def _reindex_columns(
                     "shape (1,{}) doesn't match the shape (1,{})".format(len(col), level)
                 )
         fill_value = np.nan if fill_value is None else fill_value
-        scols_or_pssers: List[Union[GenericColumn, "Series"]] = []
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+        scols_or_pssers: List[Union[Column, "Series"]] = []  # type: ignore[valid-type]

Review Comment:
   just use PySparkColumn



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] itholic commented on pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "itholic (via GitHub)" <gi...@apache.org>.

itholic commented on PR #40722:
URL: https://github.com/apache/spark/pull/40722#issuecomment-1521051461

   @HyukjinKwon Test passed. There is anything else we should address here?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] itholic commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "itholic (via GitHub)" <gi...@apache.org>.

itholic commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1178559820


##########
python/pyspark/pandas/internal.py:
##########
@@ -628,7 +624,13 @@ def __init__(
         >>> internal.column_label_names
         [('column_labels_a',), ('column_labels_b',)]
         """
-        assert isinstance(spark_frame, GenericDataFrame.__args__)  # type: ignore[attr-defined]

Review Comment:
   Yes, I believe they works as the same, because
   
   - `GenericDataFrame.__args__` was returning `(<class 'pyspark.sql.dataframe.DataFrame'>, <class 'pyspark.sql.connect.dataframe.DataFrame'>)`
   - `SparkDataFrame` now indicates `pyspark.sql.dataframe.DataFrame` or `pyspark.sql.connect.dataframe.DataFrame`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1161540196


##########
python/pyspark/pandas/utils.py:
##########
@@ -946,25 +950,20 @@ def spark_column_equals(left: Column, right: Column) -> bool:
     >>> spark_column_equals(sdf1["x"] + 1, sdf2["x"] + 1)
     False
     """
-    if isinstance(left, Column):
-        return left._jc.equals(right._jc)
-    elif isinstance(left, ConnectColumn):
+    if is_remote():

Review Comment:
   I guess we need to check the class here ? and throw a proper error message?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] itholic commented on pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "itholic (via GitHub)" <gi...@apache.org>.

itholic commented on PR #40722:
URL: https://github.com/apache/spark/pull/40722#issuecomment-1501449301

   cc @HyukjinKwon @bjornjorgensen FYI


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1170976266


##########
python/pyspark/pandas/internal.py:
##########
@@ -628,7 +624,13 @@ def __init__(
         >>> internal.column_label_names
         [('column_labels_a',), ('column_labels_b',)]
         """
-        assert isinstance(spark_frame, GenericDataFrame.__args__)  # type: ignore[attr-defined]

Review Comment:
   Are you saying that
   
   `assert isinstance(spark_frame, SparkDataFrame)` 
   
   and 
   
   `assert isinstance(spark_frame, GenericDataFrame.__args__)`
   
   are same? `__args__` is the instance from typing.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] itholic commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "itholic (via GitHub)" <gi...@apache.org>.

itholic commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1168280018


##########
python/pyspark/pandas/internal.py:
##########
@@ -987,34 +986,52 @@ def attach_distributed_sequence_column(
                     [], schema=StructType().add(column_name, data_type=LongType(), nullable=False)
                 )
 
-    def spark_column_for(self, label: Label) -> Column:
+    def spark_column_for(self, label: Label) -> PySparkColumn:
         """Return Spark Column for the given column label."""
         column_labels_to_scol = dict(zip(self.column_labels, self.data_spark_columns))
         if label in column_labels_to_scol:
             return column_labels_to_scol[label]
         else:
             raise KeyError(name_like_string(label))
 
-    def spark_column_name_for(self, label_or_scol: Union[Label, Column]) -> str:
+    def spark_column_name_for(self, label_or_scol: Union[Label, PySparkColumn]) -> str:
         """Return the actual Spark column name for the given column label."""
-        if isinstance(label_or_scol, (Column, ConnectColumn)):
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+        if isinstance(label_or_scol, Column):

Review Comment:
   Nice!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] itholic commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "itholic (via GitHub)" <gi...@apache.org>.

itholic commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1178559820


##########
python/pyspark/pandas/internal.py:
##########
@@ -628,7 +624,13 @@ def __init__(
         >>> internal.column_label_names
         [('column_labels_a',), ('column_labels_b',)]
         """
-        assert isinstance(spark_frame, GenericDataFrame.__args__)  # type: ignore[attr-defined]

Review Comment:
   Yes, I believe they works as the same, because `GenericDataFrame.__args__` was returning `(<class 'pyspark.sql.dataframe.DataFrame'>, <class 'pyspark.sql.connect.dataframe.DataFrame'>)`, and `SparkDataFrame` now indicates `pyspark.sql.dataframe.DataFrame` or `pyspark.sql.connect.dataframe.DataFrame`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1161537598


##########
python/pyspark/pandas/internal.py:
##########
@@ -987,34 +986,52 @@ def attach_distributed_sequence_column(
                     [], schema=StructType().add(column_name, data_type=LongType(), nullable=False)
                 )
 
-    def spark_column_for(self, label: Label) -> Column:
+    def spark_column_for(self, label: Label) -> PySparkColumn:
         """Return Spark Column for the given column label."""
         column_labels_to_scol = dict(zip(self.column_labels, self.data_spark_columns))
         if label in column_labels_to_scol:
             return column_labels_to_scol[label]
         else:
             raise KeyError(name_like_string(label))
 
-    def spark_column_name_for(self, label_or_scol: Union[Label, Column]) -> str:
+    def spark_column_name_for(self, label_or_scol: Union[Label, PySparkColumn]) -> str:
         """Return the actual Spark column name for the given column label."""
-        if isinstance(label_or_scol, (Column, ConnectColumn)):
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+        if isinstance(label_or_scol, Column):

Review Comment:
   Can you switch the condition to `Label`? then you can remove if-else



##########
python/pyspark/pandas/internal.py:
##########
@@ -987,34 +986,52 @@ def attach_distributed_sequence_column(
                     [], schema=StructType().add(column_name, data_type=LongType(), nullable=False)
                 )
 
-    def spark_column_for(self, label: Label) -> Column:
+    def spark_column_for(self, label: Label) -> PySparkColumn:
         """Return Spark Column for the given column label."""
         column_labels_to_scol = dict(zip(self.column_labels, self.data_spark_columns))
         if label in column_labels_to_scol:
             return column_labels_to_scol[label]
         else:
             raise KeyError(name_like_string(label))
 
-    def spark_column_name_for(self, label_or_scol: Union[Label, Column]) -> str:
+    def spark_column_name_for(self, label_or_scol: Union[Label, PySparkColumn]) -> str:
         """Return the actual Spark column name for the given column label."""
-        if isinstance(label_or_scol, (Column, ConnectColumn)):
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+        if isinstance(label_or_scol, Column):
             return self.spark_frame.select(label_or_scol).columns[0]
         else:
-            return self.field_for(label_or_scol).name
+            return self.field_for(label_or_scol).name  # type: ignore[arg-type]
 
-    def spark_type_for(self, label_or_scol: Union[Label, Column]) -> DataType:
+    def spark_type_for(self, label_or_scol: Union[Label, PySparkColumn]) -> DataType:
         """Return DataType for the given column label."""
-        if isinstance(label_or_scol, (Column, ConnectColumn)):
+        if is_remote():

Review Comment:
   ditto



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1161532792


##########
python/pyspark/pandas/data_type_ops/num_ops.py:
##########
@@ -300,9 +396,16 @@ def rfloordiv(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
         if not isinstance(right, numbers.Number):
             raise TypeError("Floor division can not be applied to given types.")
 
-        def rfloordiv(left: GenericColumn, right: Any) -> GenericColumn:
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+
+        def rfloordiv(left: Column, right: Any) -> Column:  # type: ignore[valid-type]

Review Comment:
   ditto remove



##########
python/pyspark/pandas/data_type_ops/num_ops.py:
##########
@@ -334,18 +437,33 @@ def mul(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
             raise TypeError("Multiplication can not be applied to given types.")
 
         right = transform_boolean_operand_to_numeric(right, spark_type=left.spark.data_type)
-        Column: Type[GenericColumn] = ConnectColumn if is_remote() else PySparkColumn
-        return column_op(cast(Callable[..., GenericColumn], Column.__mul__))(left, right)
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+        return column_op(Column.__mul__)(left, right)  # type: ignore[arg-type]
 
     def truediv(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
         _sanitize_list_like(right)
         if not is_valid_operand_for_numeric_arithmetic(right):
             raise TypeError("True division can not be applied to given types.")
 
-        def truediv(left: GenericColumn, right: Any) -> GenericColumn:
-            return F.when(F.lit(right != 0) | F.lit(right).isNull(), left.__div__(right)).otherwise(
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+
+        def truediv(left: Column, right: Any) -> Column:  # type: ignore[valid-type]

Review Comment:
   ditto remove



##########
python/pyspark/pandas/data_type_ops/num_ops.py:
##########
@@ -357,14 +475,21 @@ def floordiv(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
         if not is_valid_operand_for_numeric_arithmetic(right):
             raise TypeError("Floor division can not be applied to given types.")
 
-        def floordiv(left: GenericColumn, right: Any) -> GenericColumn:
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+
+        def floordiv(left: Column, right: Any) -> Column:  # type: ignore[valid-type]

Review Comment:
   ditto remove



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1161533016


##########
python/pyspark/pandas/data_type_ops/num_ops.py:
##########
@@ -377,12 +502,17 @@ def rtruediv(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
         if not isinstance(right, numbers.Number):
             raise TypeError("True division can not be applied to given types.")
 
-        def rtruediv(left: GenericColumn, right: Any) -> GenericColumn:
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+
+        def rtruediv(left: Column, right: Any) -> Column:  # type: ignore[valid-type]

Review Comment:
   ditto remove



##########
python/pyspark/pandas/data_type_ops/num_ops.py:
##########
@@ -392,11 +522,16 @@ def rfloordiv(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
         if not isinstance(right, numbers.Number):
             raise TypeError("Floor division can not be applied to given types.")
 
-        def rfloordiv(left: GenericColumn, right: Any) -> GenericColumn:
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+
+        def rfloordiv(left: Column, right: Any) -> Column:  # type: ignore[valid-type]

Review Comment:
   ditto remove



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] bjornjorgensen commented on pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "bjornjorgensen (via GitHub)" <gi...@apache.org>.

bjornjorgensen commented on PR #40722:
URL: https://github.com/apache/spark/pull/40722#issuecomment-1523008330

   @itholic Thank you. Great work. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] itholic commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "itholic (via GitHub)" <gi...@apache.org>.

itholic commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1168242912


##########
python/pyspark/pandas/data_type_ops/date_ops.py:
##########
@@ -80,35 +76,33 @@ def rsub(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
         )
         if isinstance(right, datetime.date) and not isinstance(right, datetime.datetime):
             warnings.warn(msg, UserWarning)
-            return -column_op(cast(Callable[..., GenericColumn], F.datediff))(
-                left, F.lit(right)
-            ).astype("long")
+            return -column_op(F.datediff)(left, F.lit(right)).astype("long")
         else:
             raise TypeError("Date subtraction can only be applied to date series.")
 
     def lt(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
         from pyspark.pandas.base import column_op
 
         _sanitize_list_like(right)
-        return column_op(cast(Callable[..., GenericColumn], Column.__lt__))(left, right)
+        return column_op(Column.__lt__)(left, right)

Review Comment:
   Seems like we should branch `ConnectColumn` and `PySparkColumn`. Let me address it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] itholic commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "itholic (via GitHub)" <gi...@apache.org>.

itholic commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1168285648


##########
python/pyspark/pandas/utils.py:
##########
@@ -919,7 +917,13 @@ def verify_temp_column_name(
         )
         column_name = column_name_or_label
 
-    assert isinstance(df, (PySparkDataFrame, ConnectDataFrame)), type(df)
+    if is_remote():
+        from pyspark.sql.connect.dataframe import DataFrame as ConnectDataFrame
+
+        SparkDataFrame = ConnectDataFrame
+    else:
+        SparkDataFrame = PySparkDataFrame  # type: ignore[assignment]

Review Comment:
   I think we need this because `assert isinstance(df, SparkDataFrame)` would fail when `is_remote` returns `False`? Because `SparkDataFrame` will not be defined in that case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon closed pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon closed pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.
URL: https://github.com/apache/spark/pull/40722


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on PR #40722:
URL: https://github.com/apache/spark/pull/40722#issuecomment-1523798861

   @itholic can you double check this https://github.com/apache/spark/pull/40722#discussion_r1170976266?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] itholic commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "itholic (via GitHub)" <gi...@apache.org>.

itholic commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1178559820


##########
python/pyspark/pandas/internal.py:
##########
@@ -628,7 +624,13 @@ def __init__(
         >>> internal.column_label_names
         [('column_labels_a',), ('column_labels_b',)]
         """
-        assert isinstance(spark_frame, GenericDataFrame.__args__)  # type: ignore[attr-defined]

Review Comment:
   Yes, I believe they works as the same, because
   
   - `GenericDataFrame.__args__` was returning `(<class 'pyspark.sql.dataframe.DataFrame'>, <class 'pyspark.sql.connect.dataframe.DataFrame'>)`
   - `SparkDataFrame` now indicates `pyspark.sql.dataframe.DataFrame` or `pyspark.sql.connect.dataframe.DataFrame`
   
   Please let me know if I missed something



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] itholic commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "itholic (via GitHub)" <gi...@apache.org>.

itholic commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1168276449


##########
python/pyspark/pandas/internal.py:
##########
@@ -628,7 +624,13 @@ def __init__(
         >>> internal.column_label_names
         [('column_labels_a',), ('column_labels_b',)]
         """
-        assert isinstance(spark_frame, GenericDataFrame.__args__)  # type: ignore[attr-defined]

Review Comment:
   I think the code is the same with previous assert? Because `assert isinstance(spark_frame, SparkDataFrame)` below check the if `spark_frame` is a `ConnectDataFrame` or a `PySparkDataFrame` which is the same as `assert isinstance(spark_frame, GenericDataFrame.__args__)`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] itholic commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "itholic (via GitHub)" <gi...@apache.org>.

itholic commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1168282551


##########
python/pyspark/pandas/numpy_compat.py:
##########
@@ -223,11 +223,16 @@ def maybe_dispatch_ufunc_to_spark_func(
             op_name
         )
 
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+
         @no_type_check
         def convert_arguments(*args):
-            args = [
-                F.lit(inp) if not isinstance(inp, (Column, ConnectColumn)) else inp for inp in args
-            ]
+            args = [F.lit(inp) if not isinstance(inp, Column) else inp for inp in args]

Review Comment:
   Yes, we can



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1161530190


##########
python/pyspark/pandas/data_type_ops/boolean_ops.py:
##########
@@ -258,14 +263,25 @@ def xor(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
             return right ^ left
         elif _is_valid_for_logical_operator(right):
 
-            def xor_func(left: GenericColumn, right: Any) -> GenericColumn:
-                if not isinstance(right, (Column, ConnectColumn)):
+            if is_remote():
+                from pyspark.sql.connect.column import Column as ConnectColumn
+
+                Column = ConnectColumn
+            else:
+                Column = PySparkColumn  # type: ignore[assignment]
+
+            def xor_func(left: Column, right: Any) -> Column:  # type: ignore[valid-type]
+                if not isinstance(right, Column):

Review Comment:
   ditto



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1161537761


##########
python/pyspark/pandas/internal.py:
##########
@@ -987,34 +986,52 @@ def attach_distributed_sequence_column(
                     [], schema=StructType().add(column_name, data_type=LongType(), nullable=False)
                 )
 
-    def spark_column_for(self, label: Label) -> Column:
+    def spark_column_for(self, label: Label) -> PySparkColumn:
         """Return Spark Column for the given column label."""
         column_labels_to_scol = dict(zip(self.column_labels, self.data_spark_columns))
         if label in column_labels_to_scol:
             return column_labels_to_scol[label]
         else:
             raise KeyError(name_like_string(label))
 
-    def spark_column_name_for(self, label_or_scol: Union[Label, Column]) -> str:
+    def spark_column_name_for(self, label_or_scol: Union[Label, PySparkColumn]) -> str:
         """Return the actual Spark column name for the given column label."""
-        if isinstance(label_or_scol, (Column, ConnectColumn)):
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+        if isinstance(label_or_scol, Column):
             return self.spark_frame.select(label_or_scol).columns[0]
         else:
-            return self.field_for(label_or_scol).name
+            return self.field_for(label_or_scol).name  # type: ignore[arg-type]
 
-    def spark_type_for(self, label_or_scol: Union[Label, Column]) -> DataType:
+    def spark_type_for(self, label_or_scol: Union[Label, PySparkColumn]) -> DataType:
         """Return DataType for the given column label."""
-        if isinstance(label_or_scol, (Column, ConnectColumn)):
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+        if isinstance(label_or_scol, Column):
             return self.spark_frame.select(label_or_scol).schema[0].dataType
         else:
-            return self.field_for(label_or_scol).spark_type
+            return self.field_for(label_or_scol).spark_type  # type: ignore[arg-type]
 
-    def spark_column_nullable_for(self, label_or_scol: Union[Label, Column]) -> bool:
+    def spark_column_nullable_for(self, label_or_scol: Union[Label, PySparkColumn]) -> bool:
         """Return nullability for the given column label."""
-        if isinstance(label_or_scol, (Column, ConnectColumn)):
+        if is_remote():

Review Comment:
   ditto



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1161532667


##########
python/pyspark/pandas/data_type_ops/num_ops.py:
##########
@@ -240,24 +313,37 @@ def pretty_name(self) -> str:
     def mul(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
         _sanitize_list_like(right)
         if isinstance(right, IndexOpsMixin) and isinstance(right.spark.data_type, StringType):
-            return column_op(cast(Callable[..., GenericColumn], SF.repeat))(right, left)
+            return column_op(SF.repeat)(right, left)
 
         if not is_valid_operand_for_numeric_arithmetic(right):
             raise TypeError("Multiplication can not be applied to given types.")
 
         right = transform_boolean_operand_to_numeric(right, spark_type=left.spark.data_type)
-        Column: Type[GenericColumn] = ConnectColumn if is_remote() else PySparkColumn
-        return column_op(Column.__mul__)(left, right)
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+        return column_op(Column.__mul__)(left, right)  # type: ignore[arg-type]
 
     def truediv(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
         _sanitize_list_like(right)
         if not is_valid_operand_for_numeric_arithmetic(right):
             raise TypeError("True division can not be applied to given types.")
 
-        def truediv(left: GenericColumn, right: Any) -> GenericColumn:
-            return F.when(F.lit(right != 0) | F.lit(right).isNull(), left.__div__(right)).otherwise(
-                F.lit(np.inf).__div__(left)  # type: ignore[arg-type]
-            )
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+
+        def truediv(left: Column, right: Any) -> Column:  # type: ignore[valid-type]

Review Comment:
   ditto remove



##########
python/pyspark/pandas/data_type_ops/num_ops.py:
##########
@@ -267,14 +353,19 @@ def floordiv(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
         if not is_valid_operand_for_numeric_arithmetic(right):
             raise TypeError("Floor division can not be applied to given types.")
 
-        def floordiv(left: GenericColumn, right: Any) -> GenericColumn:
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+
+        def floordiv(left: Column, right: Any) -> Column:  # type: ignore[valid-type]

Review Comment:
   ditto remove



##########
python/pyspark/pandas/data_type_ops/num_ops.py:
##########
@@ -285,12 +376,17 @@ def rtruediv(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
         if not isinstance(right, numbers.Number):
             raise TypeError("True division can not be applied to given types.")
 
-        def rtruediv(left: GenericColumn, right: Any) -> GenericColumn:
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+
+        def rtruediv(left: Column, right: Any) -> Column:  # type: ignore[valid-type]

Review Comment:
   ditto remove



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1161533297


##########
python/pyspark/pandas/data_type_ops/num_ops.py:
##########
@@ -392,11 +522,16 @@ def rfloordiv(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
         if not isinstance(right, numbers.Number):
             raise TypeError("Floor division can not be applied to given types.")
 
-        def rfloordiv(left: GenericColumn, right: Any) -> GenericColumn:
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+
+        def rfloordiv(left: Column, right: Any) -> Column:  # type: ignore[valid-type]

Review Comment:
   Just use `PySparkColumn` for type hints



##########
python/pyspark/pandas/data_type_ops/num_ops.py:
##########
@@ -494,13 +629,18 @@ def rpow(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
         if not isinstance(right, numbers.Number):
             raise TypeError("Exponentiation can not be applied to given types.")
 
-        Column = ConnectColumn if is_remote() else PySparkColumn
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
 
-        def rpow_func(left: GenericColumn, right: Any) -> GenericColumn:
+        def rpow_func(left: Column, right: Any) -> Column:  # type: ignore[valid-type]

Review Comment:
   Column



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1161530958


##########
python/pyspark/pandas/data_type_ops/boolean_ops.py:
##########
@@ -277,11 +293,18 @@ def __or__(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
             return right.__or__(left)
         else:
 
-            def or_func(left: GenericColumn, right: Any) -> GenericColumn:
-                if not isinstance(right, (Column, ConnectColumn)) and pd.isna(right):
+            if is_remote():
+                from pyspark.sql.connect.column import Column as ConnectColumn
+
+                Column = ConnectColumn
+            else:
+                Column = PySparkColumn  # type: ignore[assignment]
+
+            def or_func(left: Column, right: Any) -> Column:  # type: ignore[valid-type]
+                if not isinstance(right, Column) and pd.isna(right):

Review Comment:
   ditto, I believe you can remove `not isinstance(right, Column) and`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1161531771


##########
python/pyspark/pandas/data_type_ops/date_ops.py:
##########
@@ -80,35 +76,33 @@ def rsub(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
         )
         if isinstance(right, datetime.date) and not isinstance(right, datetime.datetime):
             warnings.warn(msg, UserWarning)
-            return -column_op(cast(Callable[..., GenericColumn], F.datediff))(
-                left, F.lit(right)
-            ).astype("long")
+            return -column_op(F.datediff)(left, F.lit(right)).astype("long")
         else:
             raise TypeError("Date subtraction can only be applied to date series.")
 
     def lt(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
         from pyspark.pandas.base import column_op
 
         _sanitize_list_like(right)
-        return column_op(cast(Callable[..., GenericColumn], Column.__lt__))(left, right)
+        return column_op(Column.__lt__)(left, right)

Review Comment:
   This seems like always using non Spark Column. Is this correct?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1161533356


##########
python/pyspark/pandas/data_type_ops/num_ops.py:
##########
@@ -494,13 +629,18 @@ def rpow(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
         if not isinstance(right, numbers.Number):
             raise TypeError("Exponentiation can not be applied to given types.")
 
-        Column = ConnectColumn if is_remote() else PySparkColumn
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
 
-        def rpow_func(left: GenericColumn, right: Any) -> GenericColumn:
+        def rpow_func(left: Column, right: Any) -> Column:  # type: ignore[valid-type]

Review Comment:
   ditto remove



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1161536342


##########
python/pyspark/pandas/indexing.py:
##########
@@ -1807,7 +1837,13 @@ def _select_cols_else(
             )
 
     def __setitem__(self, key: Any, value: Any) -> None:
-        if not isinstance(value, (Column, ConnectColumn)) and is_list_like(value):
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+        if not isinstance(value, Column) and is_list_like(value):

Review Comment:
   I think you can remove
   
   ```suggestion
           if not is_list_like(value):
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1161525789


##########
python/pyspark/pandas/data_type_ops/boolean_ops.py:
##########
@@ -241,8 +239,15 @@ def __and__(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
             return right.__and__(left)
         else:
 
-            def and_func(left: GenericColumn, right: Any) -> GenericColumn:
-                if not isinstance(right, GenericColumn.__args__):  # type: ignore[attr-defined]
+            if is_remote():
+                from pyspark.sql.connect.column import Column as ConnectColumn
+
+                Column = ConnectColumn
+            else:
+                Column = PySparkColumn  # type: ignore[assignment]
+
+            def and_func(left: Column, right: Any) -> Column:  # type: ignore[valid-type]

Review Comment:
   And cast to `Column` in mypy syntax.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1161539261


##########
python/pyspark/pandas/utils.py:
##########
@@ -919,7 +917,13 @@ def verify_temp_column_name(
         )
         column_name = column_name_or_label
 
-    assert isinstance(df, (PySparkDataFrame, ConnectDataFrame)), type(df)
+    if is_remote():
+        from pyspark.sql.connect.dataframe import DataFrame as ConnectDataFrame
+
+        SparkDataFrame = ConnectDataFrame
+    else:
+        SparkDataFrame = PySparkDataFrame  # type: ignore[assignment]

Review Comment:
   ```suggestion
   ```
   
   I guess you don't need this `else` branch? here and everywhere else.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] itholic commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "itholic (via GitHub)" <gi...@apache.org>.

itholic commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1168259355


##########
python/pyspark/pandas/data_type_ops/num_ops.py:
##########
@@ -110,13 +126,18 @@ def pow(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
         if not is_valid_operand_for_numeric_arithmetic(right):
             raise TypeError("Exponentiation can not be applied to given types.")
 
-        Column = ConnectColumn if is_remote() else PySparkColumn
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
 
-        def pow_func(left: GenericColumn, right: Any) -> GenericColumn:
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+
+        def pow_func(left: Column, right: Any) -> Column:  # type: ignore[valid-type]

Review Comment:
   This is exception since `Column` is used in function.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1171106583


##########
python/pyspark/pandas/frame.py:
##########
@@ -4871,7 +4868,14 @@ def round(self, decimals: Union[int, Dict[Name, int], "Series"] = 0) -> "DataFra
         else:
             raise TypeError("decimals must be an integer, a dict-like or a Series")
 
-        def op(psser: ps.Series) -> Union[ps.Series, GenericColumn]:
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]

Review Comment:
   Can you use `PySparkColumn` for type hint instead



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1170970883


##########
python/pyspark/pandas/frame.py:
##########
@@ -4871,7 +4868,14 @@ def round(self, decimals: Union[int, Dict[Name, int], "Series"] = 0) -> "DataFra
         else:
             raise TypeError("decimals must be an integer, a dict-like or a Series")
 
-        def op(psser: ps.Series) -> Union[ps.Series, GenericColumn]:
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]

Review Comment:
   Can we remove this, and just use `PySparkColumn`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1161530080


##########
python/pyspark/pandas/data_type_ops/boolean_ops.py:
##########
@@ -241,8 +239,15 @@ def __and__(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
             return right.__and__(left)
         else:
 
-            def and_func(left: GenericColumn, right: Any) -> GenericColumn:
-                if not isinstance(right, GenericColumn.__args__):  # type: ignore[attr-defined]
+            if is_remote():
+                from pyspark.sql.connect.column import Column as ConnectColumn
+
+                Column = ConnectColumn
+            else:
+                Column = PySparkColumn  # type: ignore[assignment]
+
+            def and_func(left: Column, right: Any) -> Column:  # type: ignore[valid-type]
+                if not isinstance(right, Column):

Review Comment:
   `F.lit(right)` actually returns the column as is. So I believe we can remove this condition, and if-else above.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1161535283


##########
python/pyspark/pandas/frame.py:
##########
@@ -12250,7 +12304,14 @@ def quantile(
             if v < 0.0 or v > 1.0:
                 raise ValueError("percentiles should all be in the interval [0, 1].")
 
-        def quantile(psser: "Series") -> GenericColumn:
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+
+        def quantile(psser: "Series") -> Column:  # type: ignore[valid-type]

Review Comment:
   just use PySparkColumn



##########
python/pyspark/pandas/frame.py:
##########
@@ -12759,7 +12822,16 @@ def mad(self, axis: Axis = 0) -> "Series":
 
         if axis == 0:
 
-            def get_spark_column(psdf: DataFrame, label: Label) -> GenericColumn:
+            if is_remote():
+                from pyspark.sql.connect.column import Column as ConnectColumn
+
+                Column = ConnectColumn
+            else:
+                Column = PySparkColumn  # type: ignore[assignment]
+
+            def get_spark_column(
+                psdf: DataFrame, label: Label
+            ) -> Column:  # type: ignore[valid-type]

Review Comment:
   just use PySparkColumn



##########
python/pyspark/pandas/frame.py:
##########
@@ -12896,7 +12966,13 @@ def mode(self, axis: Axis = 0, numeric_only: bool = False, dropna: bool = True)
         if numeric_only is None and axis == 0:
             numeric_only = True
 
-        mode_scols: List[GenericColumn] = []
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+        mode_scols: List[Column] = []  # type: ignore[valid-type]

Review Comment:
   just use PySparkColumn



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1161532395


##########
python/pyspark/pandas/data_type_ops/num_ops.py:
##########
@@ -166,7 +205,14 @@ def rmod(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
         if not isinstance(right, numbers.Number):
             raise TypeError("Modulo can not be applied to given types.")
 
-        def rmod(left: GenericColumn, right: Any) -> GenericColumn:
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+
+        def rmod(left: Column, right: Any) -> Column:  # type: ignore[valid-type]

Review Comment:
   ditto remove



##########
python/pyspark/pandas/data_type_ops/num_ops.py:
##########
@@ -215,18 +281,25 @@ def xor(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
         elif _is_valid_for_logical_operator(right):
             right_is_boolean = _is_boolean_type(right)
 
-            Column = ConnectColumn if is_remote() else PySparkColumn
+            if is_remote():
+                from pyspark.sql.connect.column import Column as ConnectColumn
+
+                Column = ConnectColumn
+            else:
+                Column = PySparkColumn  # type: ignore[assignment]
 
-            def xor_func(left: GenericColumn, right: Any) -> GenericColumn:
+            def xor_func(left: Column, right: Any) -> Column:  # type: ignore[valid-type]

Review Comment:
   ditto remove



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1161536685


##########
python/pyspark/pandas/internal.py:
##########
@@ -628,7 +624,13 @@ def __init__(
         >>> internal.column_label_names
         [('column_labels_a',), ('column_labels_b',)]
         """
-        assert isinstance(spark_frame, GenericDataFrame.__args__)  # type: ignore[attr-defined]

Review Comment:
   Is this mistake? new code doesn't look matching.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] itholic commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "itholic (via GitHub)" <gi...@apache.org>.

itholic commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1168230741


##########
python/pyspark/pandas/data_type_ops/base.py:
##########
@@ -474,16 +473,26 @@ def eq(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
         else:
             from pyspark.pandas.base import column_op
 
-            Column = ConnectColumn if is_remote() else PySparkColumn
-            return column_op(cast(Callable[..., GenericColumn], Column.__eq__))(left, right)
+            if is_remote():
+                from pyspark.sql.connect.column import Column as ConnectColumn
+
+                Column = ConnectColumn
+            else:
+                Column = PySparkColumn  # type: ignore[assignment]
+            return column_op(Column.__eq__)(left, right)  # type: ignore[arg-type]

Review Comment:
   Sure. Just created ticket for address this one: SPARK-43159



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] itholic commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "itholic (via GitHub)" <gi...@apache.org>.

itholic commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1168287486


##########
python/pyspark/pandas/utils.py:
##########
@@ -946,25 +950,20 @@ def spark_column_equals(left: Column, right: Column) -> bool:
     >>> spark_column_equals(sdf1["x"] + 1, sdf2["x"] + 1)
     False
     """
-    if isinstance(left, Column):
-        return left._jc.equals(right._jc)
-    elif isinstance(left, ConnectColumn):
+    if is_remote():

Review Comment:
   Yes, we should. Let me add the condition



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] itholic commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "itholic (via GitHub)" <gi...@apache.org>.

itholic commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1171092506


##########
python/pyspark/pandas/frame.py:
##########
@@ -4871,7 +4868,14 @@ def round(self, decimals: Union[int, Dict[Name, int], "Series"] = 0) -> "DataFra
         else:
             raise TypeError("decimals must be an integer, a dict-like or a Series")
 
-        def op(psser: ps.Series) -> Union[ps.Series, GenericColumn]:
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]

Review Comment:
   It complains `UnboundLocalError: local variable 'Column' referenced before assignment` if we don't explicitly assign the `PySparkColumn` to the `Column` here, so I fixed it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] itholic commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "itholic (via GitHub)" <gi...@apache.org>.

itholic commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1178559820


##########
python/pyspark/pandas/internal.py:
##########
@@ -628,7 +624,13 @@ def __init__(
         >>> internal.column_label_names
         [('column_labels_a',), ('column_labels_b',)]
         """
-        assert isinstance(spark_frame, GenericDataFrame.__args__)  # type: ignore[attr-defined]

Review Comment:
   Yes, I believe they works as the same, because
   
   - `GenericDataFrame.__args__` was returning `(<class 'pyspark.sql.dataframe.DataFrame'>, <class 'pyspark.sql.connect.dataframe.DataFrame'>)`
   - `SparkDataFrame` now indicates `pyspark.sql.dataframe.DataFrame` or `pyspark.sql.connect.dataframe.DataFrame`
   
   so both can check the same condition. Please let me know if I missed something



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on PR #40722:
URL: https://github.com/apache/spark/pull/40722#issuecomment-1522850607

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1161535121


##########
python/pyspark/pandas/frame.py:
##########
@@ -10650,7 +10677,15 @@ def stack(self) -> DataFrameOrSeries:
                 ).with_filter(F.lit(False))
             )
 
-        column_labels: Dict[Label, Dict[Any, GenericColumn]] = defaultdict(dict)
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+        column_labels: Dict[Label, Dict[Any, Column]] = defaultdict(  # type: ignore[valid-type]

Review Comment:
   just use PySparkColumn



##########
python/pyspark/pandas/frame.py:
##########
@@ -10956,7 +10991,13 @@ def all(
         if len(column_labels) == 0:
             return ps.Series([], dtype=bool)
 
-        applied: List[GenericColumn] = []
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+        applied: List[Column] = []  # type: ignore[valid-type]

Review Comment:
   just use PySparkColumn



##########
python/pyspark/pandas/frame.py:
##########
@@ -11039,7 +11080,13 @@ def any(self, axis: Axis = 0, bool_only: Optional[bool] = None) -> "Series":
         if len(column_labels) == 0:
             return ps.Series([], dtype=bool)
 
-        applied: List[GenericColumn] = []
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+        applied: List[Column] = []  # type: ignore[valid-type]

Review Comment:
   just use PySparkColumn



##########
python/pyspark/pandas/frame.py:
##########
@@ -11890,7 +11937,14 @@ def pct_change(self, periods: int = 1) -> "DataFrame":
         """
         window = Window.orderBy(NATURAL_ORDER_COLUMN_NAME).rowsBetween(-periods, -periods)
 
-        def op(psser: ps.Series) -> GenericColumn:
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+
+        def op(psser: ps.Series) -> Column:  # type: ignore[valid-type]

Review Comment:
   just use PySparkColumn



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1161538588


##########
python/pyspark/pandas/numpy_compat.py:
##########
@@ -223,11 +223,16 @@ def maybe_dispatch_ufunc_to_spark_func(
             op_name
         )
 
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+
         @no_type_check
         def convert_arguments(*args):
-            args = [
-                F.lit(inp) if not isinstance(inp, (Column, ConnectColumn)) else inp for inp in args
-            ]
+            args = [F.lit(inp) if not isinstance(inp, Column) else inp for inp in args]

Review Comment:
   This one, I actually don't think we should check Column condition because `lit` returns `Column` as is.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1161534725


##########
python/pyspark/pandas/frame.py:
##########
@@ -5561,8 +5567,14 @@ def _assign(self, kwargs: Any) -> "DataFrame":
             tuple(list(label) + ([""] * (level - len(label)))) for label in column_labels
         ]
 
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
         internal = self._internal.with_new_columns(
-            cast(Sequence[Union[GenericColumn, "Series"]], scols),
+            cast(Sequence[Union[Column, "Series"]], scols),  # type: ignore[valid-type]

Review Comment:
   ditto. just use `PySparkColumn`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] itholic commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "itholic (via GitHub)" <gi...@apache.org>.

itholic commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1168273396


##########
python/pyspark/pandas/indexing.py:
##########
@@ -1807,7 +1837,13 @@ def _select_cols_else(
             )
 
     def __setitem__(self, key: Any, value: Any) -> None:
-        if not isinstance(value, (Column, ConnectColumn)) and is_list_like(value):
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+        if not isinstance(value, Column) and is_list_like(value):

Review Comment:
   If we remove `isinstance(value, Column)` then `is_list_like(value)` raise `PySparkValueError` when the value is `Column` as below ??
   
   e.g.
   ```python
   >>> is_list_like(scol)
   Traceback (most recent call last):
   ...
   pyspark.errors.exceptions.base.PySparkValueError: [CANNOT_CONVERT_COLUMN_INTO_BOOL] Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1161531972


##########
python/pyspark/pandas/data_type_ops/num_ops.py:
##########
@@ -82,24 +81,41 @@ def add(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
             raise TypeError("Addition can not be applied to given types.")
 
         right = transform_boolean_operand_to_numeric(right, spark_type=left.spark.data_type)
-        Column: Type[GenericColumn] = ConnectColumn if is_remote() else PySparkColumn
-        return column_op(Column.__add__)(left, right)
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+        return column_op(Column.__add__)(left, right)  # type: ignore[arg-type]
 
     def sub(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
         _sanitize_list_like(right)
         if not is_valid_operand_for_numeric_arithmetic(right):
             raise TypeError("Subtraction can not be applied to given types.")
 
         right = transform_boolean_operand_to_numeric(right, spark_type=left.spark.data_type)
-        Column: Type[GenericColumn] = ConnectColumn if is_remote() else PySparkColumn
-        return column_op(Column.__sub__)(left, right)
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+        return column_op(Column.__sub__)(left, right)  # type: ignore[arg-type]
 
     def mod(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
         _sanitize_list_like(right)
         if not is_valid_operand_for_numeric_arithmetic(right):
             raise TypeError("Modulo can not be applied to given types.")
 
-        def mod(left: GenericColumn, right: Any) -> GenericColumn:
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+
+        def mod(left: Column, right: Any) -> Column:  # type: ignore[valid-type]

Review Comment:
   If this is only for type hint, you can simply cast to `Column` without if-else.



##########
python/pyspark/pandas/data_type_ops/num_ops.py:
##########
@@ -110,13 +126,18 @@ def pow(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
         if not is_valid_operand_for_numeric_arithmetic(right):
             raise TypeError("Exponentiation can not be applied to given types.")
 
-        Column = ConnectColumn if is_remote() else PySparkColumn
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
 
-        def pow_func(left: GenericColumn, right: Any) -> GenericColumn:
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+
+        def pow_func(left: Column, right: Any) -> Column:  # type: ignore[valid-type]

Review Comment:
   ditto



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1161531157


##########
python/pyspark/pandas/data_type_ops/boolean_ops.py:
##########
@@ -355,8 +402,15 @@ def pretty_name(self) -> str:
     def __and__(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
         _sanitize_list_like(right)
 
-        def and_func(left: GenericColumn, right: Any) -> GenericColumn:
-            if not isinstance(right, (Column, ConnectColumn)):
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+
+        def and_func(left: Column, right: Any) -> Column:  # type: ignore[valid-type]
+            if not isinstance(right, Column):

Review Comment:
   ditto remove



##########
python/pyspark/pandas/data_type_ops/boolean_ops.py:
##########
@@ -368,8 +422,15 @@ def and_func(left: GenericColumn, right: Any) -> GenericColumn:
     def __or__(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
         _sanitize_list_like(right)
 
-        def or_func(left: GenericColumn, right: Any) -> GenericColumn:
-            if not isinstance(right, (Column, ConnectColumn)):
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]
+
+        def or_func(left: Column, right: Any) -> Column:  # type: ignore[valid-type]
+            if not isinstance(right, Column):

Review Comment:
   ditto remove.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1161534277


##########
python/pyspark/pandas/frame.py:
##########
@@ -4871,7 +4868,14 @@ def round(self, decimals: Union[int, Dict[Name, int], "Series"] = 0) -> "DataFra
         else:
             raise TypeError("decimals must be an integer, a dict-like or a Series")
 
-        def op(psser: ps.Series) -> Union[ps.Series, GenericColumn]:
+            if is_remote():
+                from pyspark.sql.connect.column import Column as ConnectColumn
+
+                Column = ConnectColumn
+            else:
+                Column = PySparkColumn
+
+        def op(psser: ps.Series) -> Union[ps.Series, Column]:  # type: ignore[valid-type]

Review Comment:
   ditto just yse pyspark column for type hints



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1161525636


##########
python/pyspark/pandas/data_type_ops/boolean_ops.py:
##########
@@ -241,8 +239,15 @@ def __and__(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
             return right.__and__(left)
         else:
 
-            def and_func(left: GenericColumn, right: Any) -> GenericColumn:
-                if not isinstance(right, GenericColumn.__args__):  # type: ignore[attr-defined]
+            if is_remote():
+                from pyspark.sql.connect.column import Column as ConnectColumn
+
+                Column = ConnectColumn
+            else:
+                Column = PySparkColumn  # type: ignore[assignment]
+
+            def and_func(left: Column, right: Any) -> Column:  # type: ignore[valid-type]
+                if not isinstance(right, Column):

Review Comment:
   
   ```suggestion
                   if not isinstance(right, (ConnectColumn, PySparkColumn)):
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1161523256


##########
python/pyspark/pandas/data_type_ops/base.py:
##########
@@ -474,16 +473,26 @@ def eq(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
         else:
             from pyspark.pandas.base import column_op
 
-            Column = ConnectColumn if is_remote() else PySparkColumn
-            return column_op(cast(Callable[..., GenericColumn], Column.__eq__))(left, right)
+            if is_remote():
+                from pyspark.sql.connect.column import Column as ConnectColumn
+
+                Column = ConnectColumn
+            else:
+                Column = PySparkColumn  # type: ignore[assignment]
+            return column_op(Column.__eq__)(left, right)  # type: ignore[arg-type]

Review Comment:
   We should remove this away also by using `lambda` instead. Let's fix them all in another PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] itholic commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

Posted by "itholic (via GitHub)" <gi...@apache.org>.

itholic commented on code in PR #40722:
URL: https://github.com/apache/spark/pull/40722#discussion_r1171092506


##########
python/pyspark/pandas/frame.py:
##########
@@ -4871,7 +4868,14 @@ def round(self, decimals: Union[int, Dict[Name, int], "Series"] = 0) -> "DataFra
         else:
             raise TypeError("decimals must be an integer, a dict-like or a Series")
 
-        def op(psser: ps.Series) -> Union[ps.Series, GenericColumn]:
+        if is_remote():
+            from pyspark.sql.connect.column import Column as ConnectColumn
+
+            Column = ConnectColumn
+        else:
+            Column = PySparkColumn  # type: ignore[assignment]

Review Comment:
   It complains `UnboundLocalError: local variable 'Column' referenced before assignment` if we don't explicitly assign the `PySparkColumn` to the `Column` here, so I fixed it. (e.g. https://github.com/itholic/spark/runs/12851793505)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org