You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@arrow.apache.org by "Micah Kornfield (Jira)" <ji...@apache.org> on 2020/09/06 00:24:00 UTC

[jira] [Commented] (ARROW-9915) [Java] getObject API for temporal types is inconsistent and in some cases incorrect

    [ https://issues.apache.org/jira/browse/ARROW-9915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17191169#comment-17191169 ] 

Micah Kornfield commented on ARROW-9915:
----------------------------------------

[~mjadczak_gsa] thank you for the detailed analysis.  The last time this was [discussed on the mailing list|[https://lists.apache.org/thread.html/bd9286591fa8bd682322c625f4176701af78e2510005a0488da359b3%40%3Cdev.arrow.apache.org%3E]], it was decided not to make these changes (but possibly add utility methods).  we can potentially discuss again.

> [Java] getObject API for temporal types is inconsistent and in some cases incorrect
> -----------------------------------------------------------------------------------
>
>                 Key: ARROW-9915
>                 URL: https://issues.apache.org/jira/browse/ARROW-9915
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Java
>    Affects Versions: 0.13.0, 0.14.0, 0.14.1, 0.15.0, 0.15.1, 0.16.0, 0.17.0, 0.17.1, 1.0.0
>            Reporter: Matt Jadczak
>            Priority: Major
>
> It seems that the work which has been tracked in ARROW-2015 and merged in [https://github.com/apache/arrow/pull/2966] to change the return types of the various Time and Date vector types when using the getObject API missed some of the vector types which are temporal and so should return a temporal type, and provided an incorrect implementation for others (some of this was pointed out in the initial PR review, but it seems that it slipped through the cracks and was not addressed before merging).
> Here is a table of the various temporal vector types, what they currently return from getObject, and what they should return, in my opinion (I have included ones in which the implementation is correct for completeness, and coloured them green).
>  
>  
> ||Vector class||Current return type||Proposed return type||Comments||
> |DateDayVector|Integer|LocalDate|Currently returns the raw value of days since epoch, should return the actual date|
> |DateMilliVector|LocalDateTime|LocalDate|This type is supposed to encode a date, not a datetime, so even though epoch millis are used, the result should be a LocalDate|
> |{color:#00875a}DurationVector{color}|{color:#00875a}Duration{color}|{color:#00875a}Duration{color}|{color:#00875a}Correct.{color}|
> |IntervalDayVector|Duration|Period|As per [https://github.com/apache/arrow/blob/master/format/Schema.fbs#L251] , Interval should be a calendar-based datatype, not a time-based one. This is represented in Java by a Period type. However, I note that the Java Period class does not support milliseconds, unlike the Arrow type, which might be why Duration is being returned. Some discussion may be needed on the best way to deal with this.|
> |{color:#00875a}IntervalYearVector{color}|{color:#00875a}Period{color}|{color:#00875a}Period{color}|{color:#00875a}Correct.{color}|
> |TimeMicroVector|Long|LocalTime|Currently returns the raw number of micros, should return the actual time|
> |TimeMilliVector|LocalDateTime|LocalTime|Currently returns a datetime on 1970-01-01 with the correct time component, should just return the time|
> |TimeNanoVector|Long|LocalTime|Currently returns the raw number of nanos, should return the actual time|
> |TimeSecVector|Integer|LocalTime|Currently returns the raw number of seconds, should return the actual time|
> |{color:#00875a}TimeStampMicroVector{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}Correct.{color}|
> |{color:#00875a}TimeStampMilliVector{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}Correct.{color}|
> |{color:#00875a}TimeStampNanoVector{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}Correct.{color}|
> |{color:#00875a}TimeStampSecVector{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}Correct.{color}|
> |TimeStampMicroTZVector|Long|ZonedDateTime|Currently returns the underlying micros, and TZ has to be obtained separately. Should return the actual datetime with timezone|
> |TimeStampMilliTZVector|Long|ZonedDateTime|Currently returns the underlying millis, and TZ has to be obtained separately. Should return the actual datetime with timezone|
> |TimeStampNanoTZVector|Long|ZonedDateTime|Currently returns the underlying nanos, and TZ has to be obtained separately. Should return the actual datetime with timezone|
> |TimeStampSecTZVector|Long|ZonedDateTime|Currently returns the underlying seconds, and TZ has to be obtained separately. Should return the actual datetime with timezone|
> I am happy to submit a PR to fix this if there is no other reason for this to persist other than these types being rarely used, but note that these would all be breaking API changes.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)