Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2022/10/26 10:33:00 UTC
[jira] [Updated] (ARROW-17893) [Python] Bug: Wrong reading of timedelta
[ https://issues.apache.org/jira/browse/ARROW-17893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joris Van den Bossche updated ARROW-17893:
------------------------------------------
Priority: Critical (was: Blocker)
> [Python] Bug: Wrong reading of timedelta
> ----------------------------------------
>
> Key: ARROW-17893
> URL: https://issues.apache.org/jira/browse/ARROW-17893
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 8.0.0
> Environment: macOS 12.6 on an Apple M1 Ultra
> Reporter: Yaser Alraddadi
> Priority: Critical
> Attachments: check_timedelta.py
>
>
> When a DataFrame has a timedelta column and also a column holding a list of dictionaries that themselves contain timedelta values, reading the top-level timedelta column back from a Feather file sometimes returns a wrong value.
> In the example below, the printed results show that the top-level timedelta is sometimes read as {color:#00875a}0 days 03:40:23 (correct){color} and sometimes as {color:#de350b}153 days 01:03:20 (wrong){color}.
> Here is the code, which is also attached as check_timedelta.py:
>
> {code:java}
> from datetime import datetime, timedelta
> import pandas as pd
> import pyarrow.feather as feather
> time_1 = datetime.fromisoformat("2022-04-21T10:18:12+03:00")
> time_2 = datetime.fromisoformat("2022-04-21T13:58:35+03:00")
> data = [
>     {
>         "waiting_time": timedelta(seconds=12, microseconds=1),
>     },
>     {
>         "waiting_time": timedelta(seconds=1020),
>     },
>     {
>         "waiting_time": timedelta(seconds=960),
>     },
>     {
>         "waiting_time": timedelta(seconds=960),
>     },
>     {
>         "waiting_time": timedelta(seconds=960),
>     },
>     {
>         "waiting_time": timedelta(seconds=815, microseconds=1),
>     },
> ]
> df = pd.DataFrame(
>     [
>         {
>             "time_1": time_1,
>             "time_2": time_2,
>             "data": data,
>             "timedelta_1": time_2 - time_1,
>             "timedelta_2": timedelta(hours=3, minutes=40, seconds=23),
>         },
>     ]
> )
> print("Correct timedelta_1: ", df["timedelta_1"].item())
> print("Correct timedelta_2: ", df["timedelta_2"].item())
> with open(f"records.feather.lz4", "wb") as f:
>     feather.write_feather(df, f, compression="lz4")
> for _ in range(10):
>     with open(f"records.feather.lz4", "rb") as f:
>         print("Reading timedelta_1: ", feather.read_feather(f)["timedelta_1"].item())
>         print("Reading timedelta_2: ", feather.read_feather(f)["timedelta_2"].item())
> {code}
>
>
> Printed Results
>
> {code:java}
> Correct timedelta_1: 0 days 03:40:23
> Correct timedelta_2: 0 days 03:40:23
> Reading timedelta_1: 0 days 03:40:23
> Reading timedelta_2: 0 days 03:40:23
> Reading timedelta_1: 0 days 03:40:23
> Reading timedelta_2: 0 days 03:40:23
> Reading timedelta_1: 153 days 01:03:20
> Reading timedelta_2: 153 days 01:03:20
> Reading timedelta_1: 0 days 03:40:23
> Reading timedelta_2: 0 days 03:40:23
> Reading timedelta_1: 0 days 03:40:23
> Reading timedelta_2: 0 days 03:40:23
> Reading timedelta_1: 0 days 03:40:23
> Reading timedelta_2: 153 days 01:03:20
> Reading timedelta_1: 153 days 01:03:20
> Reading timedelta_2: 0 days 03:40:23
> Reading timedelta_1: 0 days 03:40:23
> Reading timedelta_2: 153 days 01:03:20
> Reading timedelta_1: 153 days 01:03:20
> Reading timedelta_2: 153 days 01:03:20
> Reading timedelta_1: 153 days 01:03:20
> Reading timedelta_2: 153 days 01:03:20
> {code}
>
>
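A note on the two values printed above: the wrong reading is exactly 1000 times the correct one. The sketch below uses only the standard library and the two durations quoted in the report to check that arithmetic. A factor of 1000 would be consistent with the value being interpreted in a neighbouring time unit (for example microseconds versus nanoseconds), but that is an assumption drawn from the numbers alone, not something the report confirms.

```python
from datetime import timedelta

# The two readings quoted in the report.
correct = timedelta(hours=3, minutes=40, seconds=23)         # 0 days 03:40:23
wrong = timedelta(days=153, hours=1, minutes=3, seconds=20)  # 153 days 01:03:20

# Dividing one timedelta by another yields a plain float ratio.
ratio = wrong / correct

print(correct.total_seconds())  # 13223.0
print(wrong.total_seconds())    # 13223000.0
print(ratio)                    # 1000.0
```

The same comparison could be applied to each value read back from the file to tell a scaled reading apart from arbitrary corruption.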
--
This message was sent by Atlassian Jira
(v8.20.10#820010)