Posted to jira@arrow.apache.org by "Alenka Frim (Jira)" <ji...@apache.org> on 2022/10/26 17:55:00 UTC

[jira] [Assigned] (ARROW-17893) [Python] Bug: Wrong reading of timedelta

     [ https://issues.apache.org/jira/browse/ARROW-17893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alenka Frim reassigned ARROW-17893:
-----------------------------------

    Assignee: Alenka Frim

> [Python] Bug: Wrong reading of timedelta
> ----------------------------------------
>
>                 Key: ARROW-17893
>                 URL: https://issues.apache.org/jira/browse/ARROW-17893
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 8.0.0
>         Environment: macOS 12.6 on an Apple M1 Ultra
>            Reporter: Yaser Alraddadi
>            Assignee: Alenka Frim
>            Priority: Critical
>         Attachments: check_timedelta.py
>
>
> When a DataFrame contains a top-level timedelta column together with a column holding a list of dictionaries that also contain timedelta values, reading the top-level timedelta back from a Feather file sometimes returns a wrong value.
> In the example below, the printed results show that the top-level timedelta is sometimes read as {color:#00875a}0 days 03:40:23 (correct){color} and sometimes as {color:#de350b}153 days 01:03:20 (wrong){color}.
> Here is the code; it is also attached as check_timedelta.py.
>  
> {code:python}
> from datetime import datetime, timedelta
> import pandas as pd
> import pyarrow.feather as feather
> time_1 = datetime.fromisoformat("2022-04-21T10:18:12+03:00")
> time_2 = datetime.fromisoformat("2022-04-21T13:58:35+03:00")
> data = [
>     {
>         "waiting_time": timedelta(seconds=12, microseconds=1),
>     },
>     {
>         "waiting_time": timedelta(seconds=1020),
>     },
>     {
>         "waiting_time": timedelta(seconds=960),
>     },
>     {
>         "waiting_time": timedelta(seconds=960),
>     },
>     {
>         "waiting_time": timedelta(seconds=960),
>     },
>     {
>         "waiting_time": timedelta(seconds=815, microseconds=1),
>     },
> ]
> df = pd.DataFrame(
>     [
>         {
>             "time_1": time_1,
>             "time_2": time_2,
>             "data": data,
>             "timedelta_1": time_2 - time_1,
>             "timedelta_2": timedelta(hours=3, minutes=40, seconds=23),
>         },
>     ]
> )
> print("Correct timedelta_1: ", df["timedelta_1"].item())
> print("Correct timedelta_2: ", df["timedelta_2"].item())
> # Write the frame once as an LZ4-compressed Feather file.
> with open("records.feather.lz4", "wb") as f:
>     feather.write_feather(df, f, compression="lz4")
> # Read the same file back repeatedly; every read should match the values printed above.
> for _ in range(10):
>     with open("records.feather.lz4", "rb") as f:
>         print("Reading timedelta_1: ", feather.read_feather(f)["timedelta_1"].item())
>         print("Reading timedelta_2: ", feather.read_feather(f)["timedelta_2"].item())
> {code}
>  
>  
> Printed Results
>  
> {code:java}
> Correct timedelta_1:  0 days 03:40:23
> Correct timedelta_2:  0 days 03:40:23
> Reading timedelta_1:  0 days 03:40:23
> Reading timedelta_2:  0 days 03:40:23
> Reading timedelta_1:  0 days 03:40:23
> Reading timedelta_2:  0 days 03:40:23
> Reading timedelta_1:  153 days 01:03:20
> Reading timedelta_2:  153 days 01:03:20
> Reading timedelta_1:  0 days 03:40:23
> Reading timedelta_2:  0 days 03:40:23
> Reading timedelta_1:  0 days 03:40:23
> Reading timedelta_2:  0 days 03:40:23
> Reading timedelta_1:  0 days 03:40:23
> Reading timedelta_2:  153 days 01:03:20
> Reading timedelta_1:  153 days 01:03:20
> Reading timedelta_2:  0 days 03:40:23
> Reading timedelta_1:  0 days 03:40:23
> Reading timedelta_2:  153 days 01:03:20
> Reading timedelta_1:  153 days 01:03:20
> Reading timedelta_2:  153 days 01:03:20
> Reading timedelta_1:  153 days 01:03:20
> Reading timedelta_2:  153 days 01:03:20
> {code}
>  
>  
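A small arithmetic check on the two values in the printed results may help narrow things down. It shows that the wrong reading is exactly 1000 times the correct one, which would be consistent with a time-unit mix-up somewhere in the read path. That is only an observation, not a diagnosis. The final step, which reinterprets a microsecond count as milliseconds, is a hypothetical illustration of how such a factor could arise and is not taken from the report or from the Arrow code.

```python
from datetime import timedelta

# Values copied from the printed results in the report.
correct = timedelta(hours=3, minutes=40, seconds=23)
wrong = timedelta(days=153, hours=1, minutes=3, seconds=20)

# Dividing one timedelta by another yields a plain float ratio.
ratio = wrong / correct
print(correct.total_seconds())  # 13223.0
print(wrong.total_seconds())    # 13223000.0
print(ratio)                    # 1000.0

# Hypothetical illustration only: reading a microsecond count as if it
# were milliseconds inflates the duration by the same factor of 1000.
micros = correct // timedelta(microseconds=1)
misread = timedelta(milliseconds=micros)
print(misread == wrong)         # True
```

The same factor would appear for any pair of adjacent units (seconds and milliseconds, microseconds and nanoseconds), so the check alone does not say which pair, if any, is involved.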


