Posted to jira@arrow.apache.org by "Yaser Alraddadi (Jira)" <ji...@apache.org> on 2022/09/29 13:07:00 UTC
[jira] [Updated] (ARROW-17893) Wrong reading of timedelta

     [ https://issues.apache.org/jira/browse/ARROW-17893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yaser Alraddadi updated ARROW-17893:
------------------------------------
    Description: 
When a DataFrame has a timedelta column alongside a column holding a list of dictionaries that also contain timedelta values, reading the top-level timedelta column back from a Feather file sometimes returns a wrong value.

In the example below, the printed results show that the top-level timedelta is sometimes read as {color:#00875a}0 days 03:40:23 (correct){color} and sometimes as {color:#de350b}153 days 01:03:20 (wrong){color}.

Here is the code; it is also attached as check_timedelta.py.

{code:python}
from datetime import datetime, timedelta
import pandas as pd
import pyarrow.feather as feather
time_1 = datetime.fromisoformat("2022-04-21T10:18:12+03:00")
time_2 = datetime.fromisoformat("2022-04-21T13:58:35+03:00")
data = [
    {
        "waiting_time": timedelta(seconds=12, microseconds=1),
    },
    {
        "waiting_time": timedelta(seconds=1020),
    },
    {
        "waiting_time": timedelta(seconds=960),
    },
    {
        "waiting_time": timedelta(seconds=960),
    },
    {
        "waiting_time": timedelta(seconds=960),
    },
    {
        "waiting_time": timedelta(seconds=815, microseconds=1),
    },
]
df = pd.DataFrame(
    [
        {
            "time_1": time_1,
            "time_2": time_2,
            "data": data,
            "timedelta_1": time_2 - time_1,
            "timedelta_2": timedelta(hours=3, minutes=40, seconds=23),
        },
    ]
)

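# Expected values, taken from the in-memory DataFrame before any I/O.
print("Correct timedelta_1: ", df["timedelta_1"].item())
print("Correct timedelta_2: ", df["timedelta_2"].item())

# Write the DataFrame once as an LZ4-compressed Feather file.
with open("records.feather.lz4", "wb") as f:
    feather.write_feather(df, f, compression="lz4")

# Read the same file back repeatedly; the printed values vary between reads.
# Note: both reads in each iteration share the same file handle.
for _ in range(10):
    with open("records.feather.lz4", "rb") as f:
        print("Reading timedelta_1: ", feather.read_feather(f)["timedelta_1"].item())
        print("Reading timedelta_2: ", feather.read_feather(f)["timedelta_2"].item())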
{code}
 

 

Printed results:

 
{code:java}
Correct timedelta_1:  0 days 03:40:23
Correct timedelta_2:  0 days 03:40:23
Reading timedelta_1:  0 days 03:40:23
Reading timedelta_2:  0 days 03:40:23
Reading timedelta_1:  0 days 03:40:23
Reading timedelta_2:  0 days 03:40:23
Reading timedelta_1:  153 days 01:03:20
Reading timedelta_2:  153 days 01:03:20
Reading timedelta_1:  0 days 03:40:23
Reading timedelta_2:  0 days 03:40:23
Reading timedelta_1:  0 days 03:40:23
Reading timedelta_2:  0 days 03:40:23
Reading timedelta_1:  0 days 03:40:23
Reading timedelta_2:  153 days 01:03:20
Reading timedelta_1:  153 days 01:03:20
Reading timedelta_2:  0 days 03:40:23
Reading timedelta_1:  0 days 03:40:23
Reading timedelta_2:  153 days 01:03:20
Reading timedelta_1:  153 days 01:03:20
Reading timedelta_2:  153 days 01:03:20
Reading timedelta_1:  153 days 01:03:20
Reading timedelta_2:  153 days 01:03:20{code}
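
One observation from the printed results that may help with triage: the wrong reading is exactly 1000 times the correct one. That would be consistent with the stored integer being interpreted with a time unit one step off (for example, nanoseconds read as microseconds), but this is only an inference from the numbers above, not a confirmed root cause. A minimal sketch of the arithmetic, using only the standard library and the two values printed above:

```python
from datetime import timedelta

# The two values that appear in the printed results.
correct = timedelta(hours=3, minutes=40, seconds=23)        # 0 days 03:40:23
wrong = timedelta(days=153, hours=1, minutes=3, seconds=20)  # 153 days 01:03:20

# Dividing one timedelta by another yields a plain float ratio.
ratio = wrong / correct

print("correct seconds:", correct.total_seconds())
print("wrong seconds:  ", wrong.total_seconds())
print("ratio:          ", ratio)  # 1000.0
```

The names `correct`, `wrong`, and `ratio` are illustrative only and do not appear in check_timedelta.py.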
 

 


> Wrong reading of timedelta
> --------------------------
>
>                 Key: ARROW-17893
>                 URL: https://issues.apache.org/jira/browse/ARROW-17893
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 8.0.0
>            Reporter: Yaser Alraddadi
>            Priority: Critical
>         Attachments: check_timedelta.py
>
>


