Posted to reviews@spark.apache.org by "itholic (via GitHub)" <gi...@apache.org> on 2023/09/08 01:38:01 UTC

[GitHub] [spark] itholic commented on a diff in pull request #40420: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

itholic commented on code in PR #40420:
URL: https://github.com/apache/spark/pull/40420#discussion_r1319259812


##########
python/pyspark/pandas/indexes/datetimes.py:
##########
@@ -214,28 +215,8 @@ def microsecond(self) -> Index:
         )
         return Index(self.to_series().dt.microsecond)
 
-    @property
-    def week(self) -> Index:
-        """
-        The week ordinal of the year.
-
-        .. deprecated:: 3.5.0
-        """
-        warnings.warn(
-            "`week` is deprecated in 3.5.0 and will be removed in 4.0.0.",
-            FutureWarning,
-        )
-        return Index(self.to_series().dt.week)
-
-    @property
-    def weekofyear(self) -> Index:
-        warnings.warn(
-            "`weekofyear` is deprecated in 3.5.0 and will be removed in 4.0.0.",
-            FutureWarning,
-        )
-        return Index(self.to_series().dt.weekofyear)
-
-    weekofyear.__doc__ = week.__doc__
+    def isocalendar(self) -> DataFrame:

Review Comment:
   Do we need a docstring here?



##########
python/pyspark/pandas/datetimes.py:
##########
@@ -116,26 +117,59 @@ def pandas_microsecond(s) -> ps.Series[np.int32]:  # type: ignore[no-untyped-def
     def nanosecond(self) -> "ps.Series":
         raise NotImplementedError()
 
-    # TODO(SPARK-42617): Support isocalendar.week and replace it.
-    # See also https://github.com/pandas-dev/pandas/pull/33595.
-    @property
-    def week(self) -> "ps.Series":
+    def isocalendar(self) -> "ps.DataFrame":
         """
-        The week ordinal of the year.
+        Calculate year, week, and day according to the ISO 8601 standard.
 
-        .. deprecated:: 3.4.0
-        """
-        warnings.warn(
-            "weekofyear and week have been deprecated.",
-            FutureWarning,
-        )
-        return self._data.spark.transform(lambda c: F.weekofyear(c).cast(LongType()))
+            .. versionadded:: 4.0.0
 
-    @property
-    def weekofyear(self) -> "ps.Series":
-        return self.week
+        Returns
+        -------
+        DataFrame
+            With columns year, week and day.
 
-    weekofyear.__doc__ = week.__doc__
+        .. note:: The returned columns have int64 type instead of pandas' UInt32,
+            because UInt32 is not supported by Spark.
+
+        Examples
+        --------

Review Comment:
   Could we consider adding an example that includes `pd.NaT`? This case does not seem to work currently:
   
   **Pandas**
   ```python
   >>> ser = pd.to_datetime(pd.Series(["2010-01-01", pd.NaT]))
   >>> ser.dt.isocalendar()
      year  week  day
   0  2009    53     5
   1  <NA>  <NA>  <NA>
   ```
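   
   For reference, here is a self-contained plain-pandas snippet (not pandas-on-Spark; the name `cal` is introduced just for illustration) reproducing the expected behavior above:
   
   ```python
   import pandas as pd
   
   # Plain pandas: isocalendar() returns nullable UInt32 columns,
   # so the NaT row becomes <NA> instead of raising an error.
   ser = pd.to_datetime(pd.Series(["2010-01-01", pd.NaT]))
   cal = ser.dt.isocalendar()
   print(cal)
   ```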
   
   **Current implementation**
   ```python
   >>> ser = pd.to_datetime(pd.Series(["2010-01-01", pd.NaT]))
   >>> psser = ps.from_pandas(ser)
   # ValueError: cannot convert NA to integer
   ```
   
   In Spark, we can't mix types within a single column, so we should convert `NA` to a proper uniform type (e.g. use `NaN`, making the column float, in this case) as below:
   
   ```python
   >>> psser.dt.week
   0    53.0
   1     NaN
   dtype: float64
   ```
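   
   As a sketch of that suggestion in plain pandas (not the PR's actual Spark-side implementation; `cal_float` is a hypothetical name), casting the nullable `UInt32` columns to `float64` turns `<NA>` into `NaN`:
   
   ```python
   import pandas as pd
   
   ser = pd.to_datetime(pd.Series(["2010-01-01", pd.NaT]))
   cal = ser.dt.isocalendar()
   # Cast the nullable UInt32 columns to float64 so <NA> becomes NaN,
   # giving a single-dtype representation Spark can hold in one column.
   cal_float = cal.astype("float64")
   print(cal_float["week"])
   ```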



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: reviews-help@spark.apache.org