Posted to reviews@spark.apache.org by "itholic (via GitHub)" <gi...@apache.org> on 2024/01/03 06:10:41 UTC

[PR] [SPARK-37039][PS] Fix `Series.astype` to work properly with missing value [spark]

itholic opened a new pull request, #44570:
URL: https://github.com/apache/spark/pull/44570

   ### What changes were proposed in this pull request?
   
   This PR proposes to fix `Series.astype` so that it handles missing values properly.
   
   
   ### Why are the changes needed?
   
   To match the behavior of the latest pandas.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, the bug is fixed to follow the behavior of Pandas:
   
   **Before**
   ```python
   >>> import decimal
   >>> import numpy as np
   >>> import pyspark.pandas as ps
   >>> psser = ps.Series([decimal.Decimal(1), decimal.Decimal(2), decimal.Decimal(np.nan)])
   >>> psser.astype(bool)
   0    True
   1    True
   2    False
   dtype: bool
   ```
   
   **After**
   ```python
   >>> psser = ps.Series([decimal.Decimal(1), decimal.Decimal(2), decimal.Decimal(np.nan)])
   >>> psser.astype(bool)
   0    True
   1    True
   2    True
   dtype: bool
   ```
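
The "After" result is consistent with ordinary Python truthiness, where NaN counts as non-zero and is therefore truthy. The following is a minimal standard-library sketch of that rule only; it does not use pandas or PySpark, and it assumes (not confirmed by this thread) that the pandas cast on this object-typed series reduces to per-element truthiness.

```python
import decimal
import math

# Same values as in the example above, built without NumPy (assumption:
# decimal.Decimal(np.nan) is equivalent to decimal.Decimal(float("nan"))).
values = [decimal.Decimal(1), decimal.Decimal(2), decimal.Decimal(float("nan"))]

# A Decimal is falsy only when it is zero; NaN is not zero, so it is truthy.
as_bool = [bool(v) for v in values]
print(as_bool)

# The same holds for a plain float NaN.
print(bool(math.nan))
```

Under this rule the third element maps to `True`, which is the value shown in the "After" output.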
   
   ### How was this patch tested?
   
   Enabled the existing unit tests.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-37039][PS] Fix `Series.astype` to work properly with missing value [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #44570: [SPARK-37039][PS] Fix `Series.astype` to work properly with missing value
URL: https://github.com/apache/spark/pull/44570




Re: [PR] [SPARK-37039][PS] Fix `Series.astype` to work properly with missing value [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #44570:
URL: https://github.com/apache/spark/pull/44570#issuecomment-1884024061

   Merged to master.




Re: [PR] [SPARK-37039][PS] Fix `Series.astype` to work properly with missing value [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #44570:
URL: https://github.com/apache/spark/pull/44570#issuecomment-1876143838

   @itholic can you retrigger and/or fix the tests?




Re: [PR] [WIP][SPARK-37039][PS] Fix `Series.astype` to work properly with missing value [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on PR #44570:
URL: https://github.com/apache/spark/pull/44570#issuecomment-1882608032

   CI passed. @dongjoon-hyun @HyukjinKwon FYI




Re: [PR] [SPARK-37039][PS] Fix `Series.astype` to work properly with missing value [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #44570:
URL: https://github.com/apache/spark/pull/44570#issuecomment-1875041034

   Oh, it seems the failures are relevant, @itholic.
   ```
   pyspark.errors.exceptions.base.PySparkAssertionError: [DIFFERENT_PANDAS_SERIES] Series are not almost equal:
   Left:
   bool
   Right:
   bool
   ```




Re: [PR] [SPARK-37039][PS] Fix `Series.astype` to work properly with missing value [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on PR #44570:
URL: https://github.com/apache/spark/pull/44570#issuecomment-1876460729

   Oh, it seems we should handle the boolean data type separately. I have some personal errands right now, so let me take a look tomorrow. Thanks!




Re: [PR] [SPARK-37039][PS] Fix `Series.astype` to work properly with missing value [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44570:
URL: https://github.com/apache/spark/pull/44570#discussion_r1440155270


##########
python/pyspark/pandas/tests/data_type_ops/test_as_type.py:
##########
@@ -54,10 +54,7 @@ def test_astype(self):
                         lambda: psser.astype(int_type),
                     )
 
-            # TODO(SPARK-37039): the np.nan series.astype(bool) should be True

Review Comment:
   We should probably add this to the migration guide, though.


