Posted to jira@arrow.apache.org by "Kevin Crouse (Jira)" <ji...@apache.org> on 2022/03/25 05:45:00 UTC
[jira] [Comment Edited] (ARROW-16022) floor_temporal / ceil_temporal throws exception for existing timestamps if ambiguous/existing
[ https://issues.apache.org/jira/browse/ARROW-16022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512204#comment-17512204 ]
Kevin Crouse edited comment on ARROW-16022 at 3/25/22, 5:44 AM:
----------------------------------------------------------------
[~rokm] ,
I understand the overhead issue and am fine with it not throwing an Exception on creation. My point is more that _if it must throw an exception,_ it should be on creation and not when calling a function to round to the nearest second/millisecond/nanosecond. The variable already exists with a value, and whether that value is technically valid or not, a floor/ceil/round to the nearest second doesn't change its validity.
I think the PR you referenced would give one the option to use the *_temporal methods without throwing an exception, which is not possible right now. That's better, but it changes the underlying data in an operation that conceptually should not: consider the ambiguous time of 1:30am on the day US Eastern Time's daylight saving time ends. I can create a pyarrow array and scalar with this timestamp. When I then call floor_temporal, having only an option (via the referenced PR) for the value to jump to 3:00:00am or drop to 12:59am isn't very good, and it will break comparisons against timestamps that I didn't have to call *_temporal on. Whatever the behavior, I would propose an invariant: a call to *_temporal with a unit of 'second' should return values within 1 second of the original value.
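The proposed invariant can be written down as a small check. The sketch below uses only the Python standard library and assumes, as the Arrow format does for timezone-aware timestamps, that each value is stored as an integer count of ticks since the Unix epoch in UTC. The helper names (to_epoch_ms, floor_epoch, within_one_unit) are hypothetical and are not part of pyarrow; fixed UTC offsets stand in for the two sides of the 2013-11-03 transition.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for US Eastern time on either side of the 2013-11-03 fall-back.
EDT = timezone(timedelta(hours=-4), "EDT")
EST = timezone(timedelta(hours=-5), "EST")

# The wall-clock time 01:30 occurs twice that night, one hour apart.
first = datetime(2013, 11, 3, 1, 30, 0, 250000, tzinfo=EDT)
second = datetime(2013, 11, 3, 1, 30, 0, 250000, tzinfo=EST)


def to_epoch_ms(dt: datetime) -> int:
    """Return integer milliseconds since the Unix epoch (UTC)."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return (dt - epoch) // timedelta(milliseconds=1)


def floor_epoch(value: int, ticks_per_unit: int) -> int:
    """Floor an integer epoch value to a multiple of ticks_per_unit."""
    return value - value % ticks_per_unit


def within_one_unit(original: int, rounded: int, ticks_per_unit: int) -> bool:
    """Proposed invariant: rounding moves a value by less than one unit."""
    return abs(original - rounded) < ticks_per_unit


MS_PER_SECOND = 1000
originals = [to_epoch_ms(first), to_epoch_ms(second)]
floored = [floor_epoch(v, MS_PER_SECOND) for v in originals]
invariant_holds = all(
    within_one_unit(o, f, MS_PER_SECOND) for o, f in zip(originals, floored)
)
```

Both stored values are unambiguous instants even though they share a wall-clock label, so flooring them never needs to resolve the ambiguity, and the two results stay exactly one hour apart.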
> floor_temporal / ceil_temporal throws exception for existing timestamps if ambiguous/existing
> ---------------------------------------------------------------------------------------------
>
> Key: ARROW-16022
> URL: https://issues.apache.org/jira/browse/ARROW-16022
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 7.0.0
> Reporter: Kevin Crouse
> Priority: Major
>
> Running pyarrow.compute.floor_temporal on timestamps that already exist throws an exception if the times are ambiguous because they fall within a daylight saving time transition.
> As the *_temporal functions do not fundamentally change the times, it does not make sense that they would fail due to a timezone issue. If they must fail, it should be when the pyarrow timestamp is created.
>
>
> {code:python}
> import datetime
>
> import pyarrow
> import pyarrow.compute as pc
> import pytz
>
> tz = pytz.timezone('America/New_York')
> t = pyarrow.timestamp('s', tz='America/New_York')
> # 01:03:14 on 2013-11-03 occurs twice in New York (clocks fall back at 02:00).
> # pytz zones must be attached with localize(); passing tzinfo= applies the LMT offset.
> dt = tz.localize(datetime.datetime(2013, 11, 3, 1, 3, 14))
> # if a timestamp must be invalid, this could fail
> za = pyarrow.array([dt], t)
> # raises an exception, even though this is conceptually an identity function here
> pc.floor_temporal(za, unit='second')
> {code}
>
> And this actually works just fine (continued from above)
> {code:python}
> pc.cast(
>     pc.floor_temporal(
>         pc.cast(za, pyarrow.timestamp('s', 'UTC')),
>         unit='second',
>     ),
>     pyarrow.timestamp('s', 'America/New_York'),
> )
> {code}
>
>
>
>
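The workaround quoted above (convert to UTC, floor, convert back) can be mimicked with standard-library datetime objects. This is only a sketch of the pattern, not pyarrow's implementation: floor_via_utc is a hypothetical helper, and a fixed EST offset is assumed in place of a full timezone database.

```python
from datetime import datetime, timedelta, timezone


def floor_via_utc(dt: datetime, unit: timedelta) -> datetime:
    """Floor an aware datetime to a multiple of `unit`, computed on the UTC instant."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    elapsed = dt.astimezone(timezone.utc) - epoch
    floored_utc = epoch + (elapsed // unit) * unit
    # Re-attach the original zone only for display; the instant is already fixed.
    return floored_utc.astimezone(dt.tzinfo)


# Second occurrence of 01:03:14 on 2013-11-03 in New York (after the fall-back).
EST = timezone(timedelta(hours=-5), "EST")
dt = datetime(2013, 11, 3, 1, 3, 14, 500000, tzinfo=EST)
result = floor_via_utc(dt, timedelta(seconds=1))
```

Because the arithmetic happens on the UTC instant, no ambiguous local time ever has to be resolved. For a unit of one second the UTC and local boundaries coincide; coarser units such as hours or days may not line up with local wall-clock boundaries in every zone, so this pattern is not a general replacement for timezone-aware rounding.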
--
This message was sent by Atlassian Jira
(v8.20.1#820001)