Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/08/02 01:51:34 UTC

[GitHub] [spark] bzhaoopenstack opened a new pull request, #37366: [SPARK-39939][PYTHON][PS] return self.copy during calling shift with period == 0

bzhaoopenstack opened a new pull request, #37366:
URL: https://github.com/apache/spark/pull/37366

   pandas-on-Spark raises an `AnalysisException` when `shift` is called with `periods=0`.
   
   pandas returns a copy of the object in that case.
   
   ### What changes were proposed in this pull request?
   Return `self.copy()` when `periods == 0`.
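
A minimal sketch of the idea, not the actual patch: the helper name `shift_with_zero_guard` is hypothetical, and plain pandas stands in for pandas-on-Spark so the snippet runs without a Spark session. It only illustrates short-circuiting a zero shift before any window/lag expression is built.

```python
import pandas as pd


def shift_with_zero_guard(obj, periods=1):
    """Hypothetical helper: return a copy for a zero shift, else shift as usual."""
    if not isinstance(periods, int):
        raise TypeError("periods should be an int, got %s" % type(periods).__name__)
    if periods == 0:
        # Nothing moves, so hand back a copy and keep the original dtype.
        return obj.copy()
    # Non-zero shifts take the regular path (a lag window in pandas-on-Spark).
    return obj.shift(periods)


col1 = pd.Series([10, 20, 15, 30, 45], name="Col1")
unchanged = shift_with_zero_guard(col1, 0)   # same values, separate object
shifted = shift_with_zero_guard(col1, 3)     # first three rows become NaN
```

Returning a copy rather than `obj` itself means the caller can modify the result without touching the original, which is what pandas does for this case.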
   
   
   ### Why are the changes needed?
   PySpark and pandas behave differently when `shift` is called with `periods=0`.
   
   PySpark:
   ```
   >>> df = ps.DataFrame({'Col1': [10, 20, 15, 30, 45], 'Col2': [13, 23, 18, 33, 48],'Col3': [17, 27, 22, 37, 52]},columns=['Col1', 'Col2', 'Col3'])
   >>> df.Col1.shift(periods=3)
   0     NaN
   1     NaN
   2     NaN
   3    10.0
   4    20.0
   Name: Col1, dtype: float64
   >>> df.Col1.shift(periods=0)
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/home/spark/spark/python/pyspark/pandas/base.py", line 1170, in shift
       return self._shift(periods, fill_value).spark.analyzed
     File "/home/spark/spark/python/pyspark/pandas/spark/accessors.py", line 256, in analyzed
       return first_series(DataFrame(self._data._internal.resolved_copy))
     File "/home/spark/spark/python/pyspark/pandas/utils.py", line 589, in wrapped_lazy_property
       setattr(self, attr_name, fn(self))
     File "/home/spark/spark/python/pyspark/pandas/internal.py", line 1173, in resolved_copy
       sdf = self.spark_frame.select(self.spark_columns + list(HIDDEN_COLUMNS))
     File "/home/spark/spark/python/pyspark/sql/dataframe.py", line 2073, in select
       jdf = self._jdf.select(self._jcols(*cols))
     File "/home/spark/.pyenv/versions/3.8.13/lib/python3.8/site-packages/py4j/java_gateway.py", line 1321, in __call__
       return_value = get_return_value(
     File "/home/spark/spark/python/pyspark/sql/utils.py", line 196, in deco
       raise converted from None
   pyspark.sql.utils.AnalysisException: Cannot specify window frame for lag function
   ```
   
   pandas:
   ```
   >>> pdf = pd.DataFrame({'Col1': [10, 20, 15, 30, 45], 'Col2': [13, 23, 18, 33, 48],'Col3': [17, 27, 22, 37, 52]},columns=['Col1', 'Col2', 'Col3'])
   >>> pdf.Col1.shift(periods=3)
   0     NaN
   1     NaN
   2     NaN
   3    10.0
   4    20.0
   Name: Col1, dtype: float64
   >>> pdf.Col1.shift(periods=0)
   0    10
   1    20
   2    15
   3    30
   4    45
   Name: Col1, dtype: int64
   ```
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Called `shift` with `periods=0`.
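
The check described above could look roughly like the following. This is a sketch under assumptions: `check_zero_shift` is a hypothetical helper, and it runs against plain pandas as the reference behavior. A similar comparison against a pandas-on-Spark Series (for example one built with `ps.from_pandas(pdf)`) would need a running Spark session and is not shown.

```python
import pandas as pd

# Same data as in the PR description.
pdf = pd.DataFrame(
    {
        "Col1": [10, 20, 15, 30, 45],
        "Col2": [13, 23, 18, 33, 48],
        "Col3": [17, 27, 22, 37, 52],
    },
    columns=["Col1", "Col2", "Col3"],
)


def check_zero_shift(series):
    """Hypothetical check: a zero shift keeps values and dtype unchanged."""
    shifted = series.shift(periods=0)
    assert shifted.tolist() == series.tolist()
    assert shifted.dtype == series.dtype
    return shifted


result = check_zero_shift(pdf.Col1)
```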
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #37366: [SPARK-39939][PYTHON][PS] return self.copy during calling shift with period == 0

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on PR #37366:
URL: https://github.com/apache/spark/pull/37366#issuecomment-1203779827

   Merged to master




[GitHub] [spark] AmplabJenkins commented on pull request #37366: [SPARK-39939][PYTHON][PS] return self.copy during calling shift with period == 0

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #37366:
URL: https://github.com/apache/spark/pull/37366#issuecomment-1203205074

   Can one of the admins verify this patch?




[GitHub] [spark] zhengruifeng closed pull request #37366: [SPARK-39939][PYTHON][PS] return self.copy during calling shift with period == 0

Posted by GitBox <gi...@apache.org>.
zhengruifeng closed pull request #37366: [SPARK-39939][PYTHON][PS] return self.copy during calling shift with period == 0
URL: https://github.com/apache/spark/pull/37366




[GitHub] [spark] HyukjinKwon commented on pull request #37366: [SPARK-39939][PYTHON][PS] return self.copy during calling shift with period == 0

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #37366:
URL: https://github.com/apache/spark/pull/37366#issuecomment-1201972652

   cc @itholic @xinrong-meng @zhengruifeng FYI

