Posted to dev@hive.apache.org by "Teddy Choi (JIRA)" <ji...@apache.org> on 2013/12/01 21:18:35 UTC

[jira] [Commented] (HIVE-5761) Implement vectorized support for the DATE data type

    [ https://issues.apache.org/jira/browse/HIVE-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836108#comment-13836108 ] 

Teddy Choi commented on HIVE-5761:
----------------------------------

I wrote a draft version.

{quote}
DATE shall be implemented within a LongColumnVector. HIVE-4055 represents a DATE value as a number of days since the epoch. A vectorized DATE value will contain this number and an optional cached parse result.

A value produced by a read operation or by a complex date function, such as date_add or date_sub, will have an empty cache. The first simple date function applied to it, such as year, month or day, will cache the parse result, and subsequent simple functions will reuse the cache to avoid repeated parsing. The extra cost of filling the cache should be small, since java.util.Date calculates all fields at once and caches them.

The first 32-bit set will hold the number of days since the epoch as a signed integer. Its range is roughly from BC (2^31/365 - 1970) to AD (2^31/365 + 1970). A comparison between vectorized DATE values should consider only their first sets.

The second 32-bit set will hold the cached parse result: cached state (1 bit; 0 for not cached, 1 for cached), era (1 bit; 0 for AD, 1 for BC), year (unsigned 21-bit integer), month (unsigned 4-bit integer) and day of month (unsigned 5-bit integer), 32 bits in total. A value without a cache will have only zero bits in its second set. The parsed year, month and day of month start from 1, so each field stores the exact number. The cacheable range is therefore from BC 2^21 - 1 to AD 2^21 - 1, which is narrower than the range of the first set. If a date is outside this range, its cached state will remain false (0).

The value 0xFFFFFFFF00000000L shall be reserved for future use to indicate data outside the standard range.
{quote}
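The layout above can be sketched in Java as follows. This is a minimal, hypothetical sketch, not Hive code: the class and method names are invented for illustration, a plain long[] stands in for the values of a LongColumnVector, and it assumes that the "first" 32-bit set is the high-order half of the long and that the cache fields are packed from bit 31 downward in the order listed. It also swaps in java.time.LocalDate (proleptic ISO calendar) for java.util.Date to derive year, month and day, so dates before the 1582 Gregorian cutover will differ from a java.util.Date-based parse.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoField;

/**
 * Hypothetical sketch of the proposed vectorized DATE layout packed into one long.
 * Assumed (not stated in the proposal): the first 32-bit set is the high half, and
 * the cache fields run from bit 31 down in the listed order:
 * cached(1) | era(1) | year(21) | month(4) | day(5).
 */
final class DateEncodingSketch {
    static final int MONTH_SHIFT = 5;                 // day occupies bits 0-4
    static final int YEAR_SHIFT = 9;                  // month occupies bits 5-8
    static final int ERA_SHIFT = 30;                  // year occupies bits 9-29
    static final int CACHED_SHIFT = 31;               // cached flag is bit 31
    static final long YEAR_MASK = (1L << 21) - 1;     // largest cacheable year: 2^21 - 1
    static final long HIGH_HALF = 0xFFFFFFFF00000000L;

    /** Uncached value, e.g. the result of a read or of date_add/date_sub. */
    static long fromEpochDay(int epochDay) {
        return ((long) epochDay) << 32;               // cache half stays all zero
    }

    /** Days since the epoch; the arithmetic shift preserves the sign. */
    static int epochDay(long v) {
        return (int) (v >> 32);
    }

    static boolean isCached(long v) { return ((v >>> CACHED_SHIFT) & 1L) == 1L; }
    static boolean isBC(long v)     { return ((v >>> ERA_SHIFT) & 1L) == 1L; }
    static int year(long v)         { return (int) ((v >>> YEAR_SHIFT) & YEAR_MASK); }
    static int month(long v)        { return (int) ((v >>> MONTH_SHIFT) & 0xFL); }
    static int day(long v)          { return (int) (v & 0x1FL); }

    /**
     * Fills the cache on the first simple date function. Dates whose year of era
     * does not fit in 21 bits are returned unchanged, i.e. they stay uncached.
     * Uses java.time (proleptic ISO calendar) instead of java.util.Date.
     */
    static long withCache(long v) {
        if (isCached(v)) {
            return v;                                 // reuse the earlier parse
        }
        LocalDate d = LocalDate.ofEpochDay(epochDay(v));
        int yearOfEra = d.get(ChronoField.YEAR_OF_ERA);
        if (yearOfEra > YEAR_MASK) {
            return v;                                 // outside the cacheable range
        }
        long bc = d.get(ChronoField.ERA) == 0 ? 1L : 0L;  // ISO era 0 is BCE
        long cache = (1L << CACHED_SHIFT)
                | (bc << ERA_SHIFT)
                | ((long) yearOfEra << YEAR_SHIFT)
                | ((long) d.getMonthValue() << MONTH_SHIFT)
                | d.getDayOfMonth();
        return (v & HIGH_HALF) | cache;
    }

    /** Orders values by their first set only, ignoring any cached fields. */
    static int compare(long a, long b) {
        return Integer.compare(epochDay(a), epochDay(b));
    }

    public static void main(String[] args) {
        // A plain long[] standing in for the values of a column vector.
        long[] vector = { fromEpochDay(0), fromEpochDay(-719_163) };
        for (int i = 0; i < vector.length; i++) {
            vector[i] = withCache(vector[i]);         // cache written back in place
            System.out.printf("%s %d-%02d-%02d%n", isBC(vector[i]) ? "BC" : "AD",
                    year(vector[i]), month(vector[i]), day(vector[i]));
        }
        // prints "AD 1970-01-01" then "BC 1-12-31"
    }
}
```

Writing the filled cache back into the array is what lets later year/month/day calls skip the parse, and compare() reads only the high half, so cached and uncached copies of the same date compare as equal. Note that under this sketch's high-half assumption, fromEpochDay(-1), the uncached encoding of 1969-12-31, evaluates to exactly 0xFFFFFFFF00000000L, so the reserved value would not be distinguishable from that date without a further convention.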

> Implement vectorized support for the DATE data type
> ---------------------------------------------------
>
>                 Key: HIVE-5761
>                 URL: https://issues.apache.org/jira/browse/HIVE-5761
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Eric Hanson
>            Assignee: Teddy Choi
>
> Add support to allow queries referencing DATE columns and expression results to run efficiently in vectorized mode. This should re-use the code for the integer/timestamp types to the extent possible and beneficial. Include unit tests and end-to-end tests. Consider re-using or extending existing end-to-end tests for vectorized integer and/or timestamp operations.



--
This message was sent by Atlassian JIRA
(v6.1#6144)