Posted to issues@arrow.apache.org by "rohanjain101 (via GitHub)" <gi...@apache.org> on 2023/11/21 17:35:13 UTC
[I] hash_mean overflows if numeric sum is larger than int64 max [arrow]
rohanjain101 opened a new issue, #38833:
URL: https://github.com/apache/arrow/issues/38833
### Describe the bug, including details regarding any error messages, version, and platform.
```
>>> import pandas as pd
>>> import pyarrow as pa
>>> from pyarrow import compute
>>> df = pd.DataFrame({"A": pd.Series([True, True, True], dtype="bool[pyarrow]"), "B": pd.Series([9223372036854775805, 9223372036854775806, 9223372036854775807], dtype="int64[pyarrow]")})
>>> pa_table = pa.Table.from_pandas(df)
>>> pa.TableGroupBy(pa_table, ["A"]).aggregate([("B", "mean")])
pyarrow.Table
A: bool
B_mean: double
----
A: [[true]]
B_mean: [[3.0744573456182584e+18]]
>>>
```
I would expect B_mean to be 9.223372036854776e+18. Looks similar to https://github.com/apache/arrow/issues/34909
The scalar aggregate works as expected:
```
>>> compute.mean(pa_table["B"])
<pyarrow.DoubleScalar: 9.223372036854776e+18>
```
So I would expect the vector aggregate with a single group to produce the same result.
```
>>> pa.__version__
'14.0.0'
>>>
```
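For reference, the incorrect value above matches what one would get if the grouped sum were accumulated in a wrapping signed 64-bit integer before dividing by the count. The sketch below reproduces that arithmetic in plain Python; `wrap_int64` is a hypothetical helper written for this illustration, and the assumption that `hash_mean` accumulates in int64 is inferred from the output, not confirmed from the Arrow source.

```python
def wrap_int64(x):
    """Hypothetical helper: reduce a Python int to a wrapped signed 64-bit value."""
    x &= (1 << 64) - 1                      # keep the low 64 bits
    return x - (1 << 64) if x >= (1 << 63) else x


values = [9223372036854775805, 9223372036854775806, 9223372036854775807]

# Assumed behaviour of the grouped kernel: accumulate with int64 wraparound.
wrapped_sum = 0
for v in values:
    wrapped_sum = wrap_int64(wrapped_sum + v)

wrapped_mean = wrapped_sum / len(values)    # mean computed from the wrapped sum
exact_mean = sum(values) / len(values)      # mean computed from the exact sum

print(wrapped_sum)    # the sum after wraparound
print(wrapped_mean)   # compare with the B_mean reported by the group-by
print(exact_mean)     # compare with the result of compute.mean
```

If the assumption holds, `wrapped_mean` equals the reported `B_mean` and `exact_mean` equals the scalar aggregate result.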
### Component(s)
C++, Python
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
Re: [I] hash_mean overflows if numeric sum is larger than int64 max [arrow]
Posted by "js8544 (via GitHub)" <gi...@apache.org>.
js8544 commented on issue #38833:
URL: https://github.com/apache/arrow/issues/38833#issuecomment-1822944427
Good catch. I'm adding this to my calendar.
Re: [I] hash_mean overflows if numeric sum is larger than int64 max [arrow]
Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou closed issue #38833: hash_mean overflows if numeric sum is larger than int64 max
URL: https://github.com/apache/arrow/issues/38833