Posted to issues@arrow.apache.org by "Yaser Alraddadi (Jira)" <ji...@apache.org> on 2022/09/29 13:05:00 UTC

[jira] [Created] (ARROW-17893) Wrong reading of timedelta

Yaser Alraddadi created ARROW-17893:
---------------------------------------

             Summary: Wrong reading of timedelta
                 Key: ARROW-17893
                 URL: https://issues.apache.org/jira/browse/ARROW-17893
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 8.0.0
            Reporter: Yaser Alraddadi
         Attachments: check_timedelta.py

When a DataFrame has a top-level timedelta column and also a column holding a list of dictionaries that themselves contain timedelta values, reading the top-level timedelta back from a Feather file sometimes gives a wrong value.

Below is an example. In the printed results, the top-level timedelta is sometimes read as {color:#00875a}0 days 03:40:23 (correct){color} and sometimes as {color:#de350b}153 days 01:03:20 (wrong){color}.

The code is below; it is also attached as check_timedelta.py.

 
{code:python}
from datetime import datetime, timedelta
import pandas as pd
import pyarrow.feather as feather
time_1 = datetime.fromisoformat("2022-04-21T10:18:12+03:00")
time_2 = datetime.fromisoformat("2022-04-21T13:58:35+03:00")
data = [
    {
        "waiting_time": timedelta(seconds=12, microseconds=1),
    },
    {
        "waiting_time": timedelta(seconds=1020),
    },
    {
        "waiting_time": timedelta(seconds=960),
    },
    {
        "waiting_time": timedelta(seconds=960),
    },
    {
        "waiting_time": timedelta(seconds=960),
    },
    {
        "waiting_time": timedelta(seconds=815, microseconds=1),
    },
]
df = pd.DataFrame(
    [
        {
            "time_1": time_1,
            "time_2": time_2,
            "data": data,
            "timedelta_1": time_2 - time_1,
            "timedelta_2": timedelta(hours=3, minutes=40, seconds=23),
        },
    ]
)

# Values before the round trip, for reference.
print("Correct timedelta_1: ", df["timedelta_1"].item())
print("Correct timedelta_2: ", df["timedelta_2"].item())

# Write the frame once as an LZ4-compressed Feather file.
with open("records.feather.lz4", "wb") as f:
    feather.write_feather(df, f, compression="lz4")

# Read the same file back repeatedly; the printed values are not stable.
for _ in range(10):
    with open("records.feather.lz4", "rb") as f:
        print("Reading timedelta_1: ", feather.read_feather(f)["timedelta_1"].item())
        print("Reading timedelta_2: ", feather.read_feather(f)["timedelta_2"].item())
{code}
 

 

Printed Results

 
{code:java}
Correct timedelta_1:  0 days 03:40:23
Correct timedelta_2:  0 days 03:40:23
Reading timedelta_1:  0 days 03:40:23
Reading timedelta_2:  0 days 03:40:23
Reading timedelta_1:  0 days 03:40:23
Reading timedelta_2:  0 days 03:40:23
Reading timedelta_1:  153 days 01:03:20
Reading timedelta_2:  153 days 01:03:20
Reading timedelta_1:  0 days 03:40:23
Reading timedelta_2:  0 days 03:40:23
Reading timedelta_1:  0 days 03:40:23
Reading timedelta_2:  0 days 03:40:23
Reading timedelta_1:  0 days 03:40:23
Reading timedelta_2:  153 days 01:03:20
Reading timedelta_1:  153 days 01:03:20
Reading timedelta_2:  0 days 03:40:23
Reading timedelta_1:  0 days 03:40:23
Reading timedelta_2:  153 days 01:03:20
Reading timedelta_1:  153 days 01:03:20
Reading timedelta_2:  153 days 01:03:20
Reading timedelta_1:  153 days 01:03:20
Reading timedelta_2:  153 days 01:03:20
{code}
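
One detail in the printed results may help with triage: the wrong value is exactly 1000 times the correct one. The sketch below uses only the standard library to check that arithmetic. A factor of 1000 could point to a time-unit mismatch somewhere in the read path (for example, a count in one unit being interpreted in a unit 1000 times coarser), but that is an assumption drawn from the numbers, not a confirmed diagnosis.

```python
from datetime import timedelta

# The two values that appear in the printed results.
correct = timedelta(hours=3, minutes=40, seconds=23)         # 0 days 03:40:23
wrong = timedelta(days=153, hours=1, minutes=3, seconds=20)  # 153 days 01:03:20

# Dividing one timedelta by another yields a plain float ratio.
ratio = wrong / correct

print(correct.total_seconds())  # 13223.0
print(wrong.total_seconds())    # 13223000.0
print(ratio)                    # 1000.0
```

Since the ratio is a clean power of ten rather than an arbitrary number, checking the Arrow type reported for the timedelta columns on a good read versus a bad read might be a reasonable next step.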
 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)