Posted to jira@arrow.apache.org by "Alenka Frim (Jira)" <ji...@apache.org> on 2022/10/26 17:55:00 UTC
[jira] [Commented] (ARROW-17893) [Python] Bug: Wrong reading of timedelta
[ https://issues.apache.org/jira/browse/ARROW-17893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17624646#comment-17624646 ]
Alenka Frim commented on ARROW-17893:
-------------------------------------
Thank you for reporting!
I can reproduce this behaviour with pyarrow 8.0.0.
{code:python}
>>> import pyarrow as pa
>>> pa.__version__
'8.0.0'
# Reading timedelta_1: 153 days 01:03:20
# Reading timedelta_2: 0 days 03:40:23
# Reading timedelta_1: 153 days 01:03:20
# Reading timedelta_2: 0 days 03:40:23
# Reading timedelta_1: 0 days 03:40:23
# Reading timedelta_2: 153 days 01:03:20
# Reading timedelta_1: 0 days 03:40:23
# Reading timedelta_2: 0 days 03:40:23
# Reading timedelta_1: 153 days 01:03:20
# Reading timedelta_2: 0 days 03:40:23
# Reading timedelta_1: 153 days 01:03:20
# Reading timedelta_2: 0 days 03:40:23
# Reading timedelta_1: 153 days 01:03:20
# Reading timedelta_2: 0 days 03:40:23
# Reading timedelta_1: 0 days 03:40:23
# Reading timedelta_2: 153 days 01:03:20
# Reading timedelta_1: 153 days 01:03:20
# Reading timedelta_2: 0 days 03:40:23
# Reading timedelta_1: 0 days 03:40:23
# Reading timedelta_2: 153 days 01:03:20
{code}
However, it appears to have been fixed in 9.0.0:
{code:python}
>>> import pyarrow as pa
>>> pa.__version__
'9.0.0'
# Reading timedelta_1: 0 days 03:40:23
# Reading timedelta_2: 0 days 03:40:23
# Reading timedelta_1: 0 days 03:40:23
# Reading timedelta_2: 0 days 03:40:23
# Reading timedelta_1: 0 days 03:40:23
# Reading timedelta_2: 0 days 03:40:23
# Reading timedelta_1: 0 days 03:40:23
# Reading timedelta_2: 0 days 03:40:23
# Reading timedelta_1: 0 days 03:40:23
# Reading timedelta_2: 0 days 03:40:23
# Reading timedelta_1: 0 days 03:40:23
# Reading timedelta_2: 0 days 03:40:23
# Reading timedelta_1: 0 days 03:40:23
# Reading timedelta_2: 0 days 03:40:23
# Reading timedelta_1: 0 days 03:40:23
# Reading timedelta_2: 0 days 03:40:23
# Reading timedelta_1: 0 days 03:40:23
# Reading timedelta_2: 0 days 03:40:23
# Reading timedelta_1: 0 days 03:40:23
# Reading timedelta_2: 0 days 03:40:23
{code}
I will take some time tomorrow to try and find the PR that made the change. In any case I will add a test to the codebase to have it covered.
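One detail in the output above may help narrow the search: the wrong value is exactly 1000 times the correct one. That would be consistent with a unit mix-up somewhere in the conversion (for example between microsecond and nanosecond resolution), though this is only a guess and not a confirmed cause. A quick check with the standard library only, using the two values printed in this issue:

```python
from datetime import timedelta

# Values copied from the output reported in this issue.
correct = timedelta(hours=3, minutes=40, seconds=23)
wrong = timedelta(days=153, hours=1, minutes=3, seconds=20)

# Dividing one timedelta by another yields a plain float ratio.
ratio = wrong / correct

print(correct.total_seconds())  # 13223.0
print(wrong.total_seconds())    # 13223000.0
print(ratio)                    # 1000.0
```

The factor of 1000 is what one step between adjacent time units (s, ms, us, ns) looks like, so a test for this issue could check the round-tripped value against the original rather than only checking that the read succeeds.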
> [Python] Bug: Wrong reading of timedelta
> ----------------------------------------
>
> Key: ARROW-17893
> URL: https://issues.apache.org/jira/browse/ARROW-17893
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 8.0.0
> Environment: macOS 12.6 on an Apple M1 Ultra
> Reporter: Yaser Alraddadi
> Priority: Critical
> Attachments: check_timedelta.py
>
>
> When a DataFrame has a top-level timedelta column and also a column holding a list of dictionaries that contain timedeltas, reading the top-level timedelta back from Feather format sometimes gives a wrong value.
> Below is an example. In the printed results, the top-level timedelta is sometimes read as {color:#00875a}0 days 03:40:23 (correct){color} and sometimes as {color:#de350b}153 days 01:03:20 (wrong){color}.
> Here is the code; it is also attached as check_timedelta.py.
>
> {code:python}
> from datetime import datetime, timedelta
> import pandas as pd
> import pyarrow.feather as feather
> time_1 = datetime.fromisoformat("2022-04-21T10:18:12+03:00")
> time_2 = datetime.fromisoformat("2022-04-21T13:58:35+03:00")
> data = [
>     {
>         "waiting_time": timedelta(seconds=12, microseconds=1),
>     },
>     {
>         "waiting_time": timedelta(seconds=1020),
>     },
>     {
>         "waiting_time": timedelta(seconds=960),
>     },
>     {
>         "waiting_time": timedelta(seconds=960),
>     },
>     {
>         "waiting_time": timedelta(seconds=960),
>     },
>     {
>         "waiting_time": timedelta(seconds=815, microseconds=1),
>     },
> ]
> df = pd.DataFrame(
>     [
>         {
>             "time_1": time_1,
>             "time_2": time_2,
>             "data": data,
>             "timedelta_1": time_2 - time_1,
>             "timedelta_2": timedelta(hours=3, minutes=40, seconds=23),
>         },
>     ]
> )
> print("Correct timedelta_1: ", df["timedelta_1"].item())
> print("Correct timedelta_2: ", df["timedelta_2"].item())
> with open(f"records.feather.lz4", "wb") as f:
>     feather.write_feather(df, f, compression="lz4")
> for _ in range(10):
>     with open(f"records.feather.lz4", "rb") as f:
>         print("Reading timedelta_1: ", feather.read_feather(f)["timedelta_1"].item())
>         print("Reading timedelta_2: ", feather.read_feather(f)["timedelta_2"].item())
> {code}
>
>
> Printed Results
>
> {code:java}
> Correct timedelta_1: 0 days 03:40:23
> Correct timedelta_2: 0 days 03:40:23
> Reading timedelta_1: 0 days 03:40:23
> Reading timedelta_2: 0 days 03:40:23
> Reading timedelta_1: 0 days 03:40:23
> Reading timedelta_2: 0 days 03:40:23
> Reading timedelta_1: 153 days 01:03:20
> Reading timedelta_2: 153 days 01:03:20
> Reading timedelta_1: 0 days 03:40:23
> Reading timedelta_2: 0 days 03:40:23
> Reading timedelta_1: 0 days 03:40:23
> Reading timedelta_2: 0 days 03:40:23
> Reading timedelta_1: 0 days 03:40:23
> Reading timedelta_2: 153 days 01:03:20
> Reading timedelta_1: 153 days 01:03:20
> Reading timedelta_2: 0 days 03:40:23
> Reading timedelta_1: 0 days 03:40:23
> Reading timedelta_2: 153 days 01:03:20
> Reading timedelta_1: 153 days 01:03:20
> Reading timedelta_2: 153 days 01:03:20
> Reading timedelta_1: 153 days 01:03:20
> Reading timedelta_2: 153 days 01:03:20
> {code}
>
>