You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/09/15 09:04:02 UTC

[GitHub] [spark] zhengruifeng opened a new pull request, #37897: [SPARK-40445][PS] Refactor `Resampler` to make it consistent with `GroupBy`

zhengruifeng opened a new pull request, #37897:
URL: https://github.com/apache/spark/pull/37897

   ### What changes were proposed in this pull request?
   Refactor `Resampler` to make it consistent with `GroupBy`
   
   
   ### Why are the changes needed?
   to simplify `Resampler`
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   existing UT


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #37897: [SPARK-40445][PS] Refactor `Resampler` for consistency and simplicity

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on PR #37897:
URL: https://github.com/apache/spark/pull/37897#issuecomment-1249934673

   Merged into master, thank you @itholic @HyukjinKwon 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] itholic commented on a diff in pull request #37897: [SPARK-40445][PS] Refactor `Resampler` for consistency and simplicity

Posted by GitBox <gi...@apache.org>.
itholic commented on code in PR #37897:
URL: https://github.com/apache/spark/pull/37897#discussion_r972518208


##########
python/pyspark/pandas/resample.py:
##########
@@ -481,20 +489,5 @@ def __getattr__(self, item: str) -> Any:
             else:
                 return partial(property_or_func, self)
 
-    def min(self) -> Series:
-        return first_series(self._downsample("min"))
-
-    def max(self) -> Series:
-        return first_series(self._downsample("max"))
-
-    def sum(self) -> Series:
-        return first_series(self._downsample("sum").fillna(0.0))
-
-    def mean(self) -> Series:
-        return first_series(self._downsample("mean"))
-
-    def std(self) -> Series:
-        return first_series(self._downsample("std"))
-
-    def var(self) -> Series:
-        return first_series(self._downsample("var"))
+    def _cleanup_and_return(self, psdf: DataFrame) -> Series:
+        return first_series(psdf).rename().rename(self._psser.name)

Review Comment:
   qq: Why we do rename twice here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #37897: [SPARK-40445][PS] Refactor `Resampler` for consistency and simplicity

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #37897:
URL: https://github.com/apache/spark/pull/37897#discussion_r972601113


##########
python/pyspark/pandas/groupby.py:
##########
@@ -3762,7 +3762,7 @@ def _apply_series_op(
             return psser.copy()
 
     def _cleanup_and_return(self, psdf: DataFrame) -> Series:

Review Comment:
   nice, let me update it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #37897: [SPARK-40445][PS] Refactor `Resampler` for consistency and simplicity

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #37897:
URL: https://github.com/apache/spark/pull/37897#discussion_r972524711


##########
python/pyspark/pandas/resample.py:
##########
@@ -481,20 +489,5 @@ def __getattr__(self, item: str) -> Any:
             else:
                 return partial(property_or_func, self)
 
-    def min(self) -> Series:
-        return first_series(self._downsample("min"))
-
-    def max(self) -> Series:
-        return first_series(self._downsample("max"))
-
-    def sum(self) -> Series:
-        return first_series(self._downsample("sum").fillna(0.0))
-
-    def mean(self) -> Series:
-        return first_series(self._downsample("mean"))
-
-    def std(self) -> Series:
-        return first_series(self._downsample("std"))
-
-    def var(self) -> Series:
-        return first_series(self._downsample("var"))
+    def _cleanup_and_return(self, psdf: DataFrame) -> Series:
+        return first_series(psdf).rename().rename(self._psser.name)

Review Comment:
   good question: I just copy it from `groupby`, let me check whether the first `rename` can be removed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng closed pull request #37897: [SPARK-40445][PS] Refactor `Resampler` for consistency and simplicity

Posted by GitBox <gi...@apache.org>.
zhengruifeng closed pull request #37897: [SPARK-40445][PS] Refactor `Resampler` for consistency and simplicity
URL: https://github.com/apache/spark/pull/37897


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37897: [SPARK-40445][PS] Refactor `Resampler` for consistency and simplicity

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #37897:
URL: https://github.com/apache/spark/pull/37897#discussion_r972576467


##########
python/pyspark/pandas/groupby.py:
##########
@@ -3762,7 +3762,7 @@ def _apply_series_op(
             return psser.copy()
 
     def _cleanup_and_return(self, psdf: DataFrame) -> Series:

Review Comment:
   Should probably rename this to something else ... e.g., `_handle_output`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] itholic commented on a diff in pull request #37897: [SPARK-40445][PS] Refactor `Resampler` for consistency and simplicity

Posted by GitBox <gi...@apache.org>.
itholic commented on code in PR #37897:
URL: https://github.com/apache/spark/pull/37897#discussion_r972518208


##########
python/pyspark/pandas/resample.py:
##########
@@ -481,20 +489,5 @@ def __getattr__(self, item: str) -> Any:
             else:
                 return partial(property_or_func, self)
 
-    def min(self) -> Series:
-        return first_series(self._downsample("min"))
-
-    def max(self) -> Series:
-        return first_series(self._downsample("max"))
-
-    def sum(self) -> Series:
-        return first_series(self._downsample("sum").fillna(0.0))
-
-    def mean(self) -> Series:
-        return first_series(self._downsample("mean"))
-
-    def std(self) -> Series:
-        return first_series(self._downsample("std"))
-
-    def var(self) -> Series:
-        return first_series(self._downsample("var"))
+    def _cleanup_and_return(self, psdf: DataFrame) -> Series:
+        return first_series(psdf).rename().rename(self._psser.name)

Review Comment:
   qq: Why do we rename twice here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #37897: [SPARK-40445][PS] Refactor `Resampler` for consistency and simplicity

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on PR #37897:
URL: https://github.com/apache/spark/pull/37897#issuecomment-1248782550

   cc @HyukjinKwon 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org