You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/10/01 05:16:26 UTC

[GitHub] [spark] itholic commented on a change in pull request #34114: [SPARK-36438][PYTHON] Support list-like Python objects for Series comparison

itholic commented on a change in pull request #34114:
URL: https://github.com/apache/spark/pull/34114#discussion_r719942176



##########
File path: python/pyspark/pandas/data_type_ops/base.py
##########
@@ -349,11 +349,86 @@ def ge(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
         raise TypeError(">= can not be applied to %s." % self.pretty_name)
 
     def eq(self, left: IndexOpsLike, right: Any) -> SeriesOrIndex:
-        from pyspark.pandas.base import column_op
-
-        _sanitize_list_like(right)
+        if isinstance(right, (list, tuple)):
+            from pyspark.pandas.series import first_series, scol_for
+            from pyspark.pandas.frame import DataFrame
+            from pyspark.pandas.internal import NATURAL_ORDER_COLUMN_NAME, InternalField
+
+            sdf = left._internal.spark_frame
+            structed_scol = F.struct(
+                sdf[NATURAL_ORDER_COLUMN_NAME],
+                *left._internal.index_spark_columns,
+                left.spark.column
+            )
+            # The size of the list is expected to be small.
+            collected_structed_scol = F.collect_list(structed_scol)
+            # Sort the array by NATURAL_ORDER_COLUMN so that we can guarantee the order.
+            collected_structed_scol = F.array_sort(collected_structed_scol)
+            right_values_scol = F.array([F.lit(x) for x in right])  # type: ignore
+            index_scol_names = left._internal.index_spark_column_names
+            scol_name = left._internal.spark_column_name_for(left._internal.column_labels[0])
+            # Compare the values of left and right by using zip_with function.
+            cond = F.zip_with(
+                collected_structed_scol,
+                right_values_scol,
+                lambda x, y: F.struct(
+                    *[
+                        x[index_scol_name].alias(index_scol_name)
+                        for index_scol_name in index_scol_names
+                    ],
+                    F.when(
+                        F.assert_true(
+                            # If the comparing result is null,
+                            # that means the length of `left` and `right` is not the same.

Review comment:
       Oh, then they should return `False`. Let me address this case.
   
   ```python
   >>> pd.Series([1, 2, None]) == [1, 2, None]
   0     True
   1     True
   2    False
   dtype: bool
   
   >>> pd.Series([1, 2, None]) == [1, 2, 3]
   0     True
   1     True
   2    False
   dtype: bool
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org