
Posted to reviews@spark.apache.org by "itholic (via GitHub)" <gi...@apache.org> on 2023/09/12 03:35:54 UTC

[GitHub] [spark] itholic commented on pull request #42788: [SPARK-43291][PS] Generate proper warning on different behavior with `numeric_only`

itholic commented on PR #42788:
URL: https://github.com/apache/spark/pull/42788#issuecomment-1714912254

   So far, we haven't followed the Pandas behavior because, in some cases, we couldn't support object-dtype columns in stat functions, as shown below:
   ```python
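   # DataFrame
   >>> import pandas as pd
   >>> import pyspark.pandas as ps
   >>> pdf = pd.DataFrame({"A": [1, 2, 3], "B": ["a", "b", "c"]})
   >>> pdf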
      A  B
   0  1  a
   1  2  b
   2  3  c
   
   # Pandas works
   >>> pdf.min(numeric_only=False)
   A    1
   B    a
   dtype: object
   
   # Pandas API on Spark doesn't work
   >>> ps.from_pandas(pdf).min(numeric_only=False)
   ...
   pyarrow.lib.ArrowInvalid: Could not convert 'a' with type str: tried to convert to int64
   ```
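To make the expected semantics concrete, here is a minimal pure-Python sketch of what `min(numeric_only=...)` should produce for the DataFrame above. It is not the Pandas API on Spark implementation: the `column_min` helper and the dict-of-lists table are hypothetical stand-ins. The point it illustrates is that each column is reduced independently and the combined result keeps mixed types (pandas reports `dtype: object`) rather than being forced into a single type such as `int64`, which is what the `ArrowInvalid` error above suggests is happening.

```python
from numbers import Number
from typing import Any, Dict, List

# Hypothetical column-oriented table: column name -> list of values.
Table = Dict[str, List[Any]]


def column_min(table: Table, numeric_only: bool = False) -> Dict[str, Any]:
    """Return the per-column minimum, mimicking pandas' ``numeric_only`` switch.

    With numeric_only=True, non-numeric columns are dropped. With
    numeric_only=False, every column is reduced on its own, so the result
    may mix types (like the ``dtype: object`` Series pandas returns)
    instead of being coerced to one type such as int64.
    """
    result: Dict[str, Any] = {}
    for name, values in table.items():
        is_numeric = all(isinstance(v, Number) for v in values)
        if numeric_only and not is_numeric:
            continue  # skip object/string columns entirely
        result[name] = min(values)  # each column keeps its own value type
    return result


pdf = {"A": [1, 2, 3], "B": ["a", "b", "c"]}
print(column_min(pdf, numeric_only=False))  # {'A': 1, 'B': 'a'}
print(column_min(pdf, numeric_only=True))   # {'A': 1}
```

The sketch assumes every column is non-empty; an empty column would make the built-in `min` raise `ValueError`.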
   
   On second thought, though, this is a bug in our own code in the Pandas API on Spark, so we can support `numeric_only=False` as the default by fixing that existing bug.
   
   Let me just close this ticket and change the default value instead.
   
   Thanks for pointing this out, @zhengruifeng!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


For additional commands, e-mail: reviews-help@spark.apache.org