Posted to issues@arrow.apache.org by "rohanjain101 (via GitHub)" <gi...@apache.org> on 2023/11/21 17:35:13 UTC

[I] hash_mean overflows if numeric sum is larger than int64 max [arrow]

rohanjain101 opened a new issue, #38833:
URL: https://github.com/apache/arrow/issues/38833

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   ```
   >>> import pandas as pd
   >>> import pyarrow as pa
   >>> df = pd.DataFrame({"A": pd.Series([True, True, True], dtype="bool[pyarrow]"), "B": pd.Series([9223372036854775805, 9223372036854775806, 9223372036854775807], dtype="int64[pyarrow]")})
   >>> pa_table = pa.Table.from_pandas(df)
   >>> pa.TableGroupBy(pa_table, ["A"]).aggregate([("B", "mean")])
   pyarrow.Table
   A: bool
   B_mean: double
   ----
   A: [[true]]
   B_mean: [[3.0744573456182584e+18]]
   >>>
   ```
   
   I would expect B_mean to be 9.223372036854776e+18. Looks similar to https://github.com/apache/arrow/issues/34909
   
   The scalar aggregate works as expected:
   
   ```
   >>> from pyarrow import compute
   >>> compute.mean(pa_table["B"])
   <pyarrow.DoubleScalar: 9.223372036854776e+18>
   ```
   
   So I would expect the vector aggregate with a single group to produce the same result.
   
   ```
   >>> pa.__version__
   '14.0.0'
   >>>
   ```
   
   
   ### Component(s)
   
   C++, Python
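
The two numbers in the report are consistent with the grouped mean accumulating its sum in a 64-bit signed integer that wraps around, while the scalar mean does not. That is only an assumption about the cause and has not been checked against the Arrow source. A minimal pure-Python sketch of the arithmetic, where `wrap_int64` is a hypothetical helper that emulates two's-complement wraparound:

```python
# Values taken from the report. The wrapping int64 accumulator is an
# assumption used for illustration, not a confirmed description of hash_mean.
values = [9223372036854775805, 9223372036854775806, 9223372036854775807]


def wrap_int64(n: int) -> int:
    """Reduce an arbitrary Python int to the two's-complement int64 range."""
    return (n + 2**63) % 2**64 - 2**63


# Mean computed from the exact (arbitrary-precision) sum.
exact_mean = sum(values) / len(values)

# Mean computed from a sum that wraps like an int64 after every addition.
wrapped_sum = 0
for v in values:
    wrapped_sum = wrap_int64(wrapped_sum + v)
wrapped_mean = wrapped_sum / len(values)

print(exact_mean)    # 9.223372036854776e+18, the expected B_mean
print(wrapped_mean)  # 3.0744573456182584e+18, the reported B_mean
```

Under that assumption, the wrapped sum is 9223372036854775802, and dividing it by the group size of 3 reproduces the reported value.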


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] hash_mean overflows if numeric sum is larger than int64 max [arrow]

Posted by "js8544 (via GitHub)" <gi...@apache.org>.
js8544 commented on issue #38833:
URL: https://github.com/apache/arrow/issues/38833#issuecomment-1822944427

   Good catch. I'm adding this to my calendar.




Re: [I] hash_mean overflows if numeric sum is larger than int64 max [arrow]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou closed issue #38833: hash_mean overflows if numeric sum is larger than int64 max
URL: https://github.com/apache/arrow/issues/38833

