Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/09/30 00:49:37 UTC

[GitHub] [spark] BryanCutler commented on a change in pull request #33980: [SPARK-32285][PYTHON] Add PySpark support for nested timestamps with arrow

BryanCutler commented on a change in pull request #33980:
URL: https://github.com/apache/spark/pull/33980#discussion_r718964803



##########
File path: python/pyspark/sql/pandas/types.py
##########
@@ -296,7 +337,34 @@ def _check_series_convert_timestamps_localize(s, from_timezone, to_timezone):
         return s
 
 
-def _check_series_convert_timestamps_local_tz(s, timezone):
+def __handle_array_of_timestamps(series, to_tz,  from_tz=None):
+    """
+
+    :param series: Pandas series
+    :param to_tz: to timezone
+    :param from_tz: from time zone
+    :return: return series respecting timezone
+    """
+    from pandas.api.types import is_datetime64tz_dtype, is_datetime64_dtype
+    import pandas as pd
+    from pandas import Series
+    data_after_conversion = []
+    for data in series:

Review comment:
       So this is iterating over each timestamp array in the series, applying conversions each time, and then building a new series back from a list? That seems pretty inefficient, and there looks to be a lot of specialized conversion going on here just for arrays of timestamps.
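
A vectorized alternative the reviewer may be hinting at (this is a hedged sketch, not the patch's implementation; `convert_array_timestamps` is a hypothetical helper name): flatten the arrays with `Series.explode()`, localize/convert once over the flat series, then regroup by the original index instead of looping per array.

```python
import pandas as pd

def convert_array_timestamps(series, from_tz, to_tz):
    """Sketch: convert a Series holding one array of naive timestamps
    per row, using one vectorized tz conversion instead of a Python loop.

    Hypothetical helper for illustration; not the PR's actual code.
    """
    # explode() gives each timestamp its own row, repeating the original
    # index so we can regroup afterwards
    flat = series.explode()
    # one vectorized localize + convert over the whole flattened series
    # (ambiguous=False mirrors the existing code's handling of DST overlaps)
    converted = (pd.to_datetime(flat)
                 .dt.tz_localize(from_tz, ambiguous=False)
                 .dt.tz_convert(to_tz))
    # regroup back into one array (list) per original row
    return converted.groupby(level=0).agg(list)

s = pd.Series([[pd.Timestamp("2015-06-01 01:30:00")],
               [pd.Timestamp("2015-06-01 02:30:00")]])
out = convert_array_timestamps(s, "US/Eastern", "UTC")
```

This keeps the heavy lifting inside pandas' datetime machinery rather than rebuilding the series element by element.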

##########
File path: python/pyspark/sql/pandas/types.py
##########
@@ -253,14 +278,29 @@ def _check_series_convert_timestamps_internal(s, timezone):
         # >>> str(tz.localize(t))
         # '2015-11-01 01:30:00-05:00'
         tz = timezone or _get_local_timezone()
-        return s.dt.tz_localize(tz, ambiguous=False).dt.tz_convert('UTC')
+        data = s.dt.tz_localize(tz, ambiguous=False).dt.tz_convert('UTC')
+        return __modified_series(data, is_array)
     elif is_datetime64tz_dtype(s.dtype):
-        return s.dt.tz_convert('UTC')
+        data = s.dt.tz_convert('UTC')
+        return __modified_series(data, is_array)
     else:
-        return s
+        return __modified_series(s, is_array)
+
+
+def __modified_series(data, is_array):
+    """
+    :param data: Converted data
+    :param is_array: If the input data type is type of array
+    :return: If input type is array ,then return series with array of data
+    else return series as it is.
+    """
+    from pandas import Series
+    if is_array:
+        return Series([data])

Review comment:
       Sorry, I don't quite understand what this function is doing; it looks like it's making a Series containing a single array?
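
For context on the reviewer's question, this sketch shows what `Series([data])` does when `data` is itself array-like: pandas treats the whole array as one element, so the result is a length-1 Series regardless of how many timestamps `data` held (illustrative only, not the patch's code path).

```python
import pandas as pd

# A converted batch of two timestamps, as a plain array
data = pd.to_datetime(["2015-01-01", "2015-01-02"]).to_numpy()

# Wrapping in a list makes the entire array the single element of
# a new Series, rather than one row per timestamp
wrapped = pd.Series([data])

# len(wrapped) is 1; the two timestamps live inside wrapped.iloc[0]
```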




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


