Posted to dev@hive.apache.org by "Eric Hanson (JIRA)" <ji...@apache.org> on 2013/12/04 19:34:35 UTC

[jira] [Commented] (HIVE-5762) Implement vectorized support for the DECIMAL data type

    [ https://issues.apache.org/jira/browse/HIVE-5762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13839178#comment-13839178 ] 

Eric Hanson commented on HIVE-5762:
-----------------------------------

The new fixed precision/scale decimal type DECIMAL(p, s) has a maximum precision and scale of 38. A signed value of 38 decimal digits fits in a signed 128-bit int (2 longs): 2^127-1 is about 1.70141E+38, which is larger than the largest 38-digit value, 10^38-1. In a column, every number must have the same precision and scale, so those can be abstracted into the column vector or the VectorExpression operator itself and out of the individual data elements.
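
As a quick sanity check of the range argument above, here is a minimal sketch using java.math.BigInteger. The class and method names (DecimalRangeCheck, maxUnscaled, fitsInSigned128) are hypothetical and used only for illustration; they are not part of Hive.

```java
import java.math.BigInteger;

class DecimalRangeCheck {
    // Largest unscaled magnitude of a decimal with the given precision: 10^precision - 1.
    static BigInteger maxUnscaled(int precision) {
        return BigInteger.TEN.pow(precision).subtract(BigInteger.ONE);
    }

    // True if every unscaled value of that precision fits in a signed 128-bit integer,
    // i.e. 10^precision - 1 <= 2^127 - 1 (the negative side has one more value of room).
    static boolean fitsInSigned128(int precision) {
        BigInteger max128 = BigInteger.ONE.shiftLeft(127).subtract(BigInteger.ONE);
        return maxUnscaled(precision).compareTo(max128) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(fitsInSigned128(38)); // true
        System.out.println(fitsInSigned128(39)); // false
    }
}
```

So 38 digits is the largest precision for which this representation works; 39 digits would already overflow.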

So I'm thinking that a new DecimalColumnVector type could be created that contains 2 arrays of long. 

class DecimalColumnVector extends ColumnVector {
  long[] vectorLow;  // low order 64 bits of 128 bit int
  long[] vectorHigh; // high order 64 bits of 128 bit int
  int precision;
  int scale;
}

Arithmetic and comparisons can then be made fast by using standard arithmetic and comparisons on long as building blocks. Exactly how to implement the arithmetic and comparison operations needs more thought.
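
To make the "long as a building block" idea concrete, here is one possible sketch, not a worked-out design. It assumes each value is a two's-complement 128-bit integer split into a signed high word and a low word treated as unsigned, that both operands already share the same scale, and that Long.compareUnsigned (Java 8 and later) is available. The class name Decimal128Sketch and its methods are hypothetical. It does not check that a result still fits in the declared precision.

```java
class Decimal128Sketch {
    // Signed comparison of two 128-bit values given as (high, low) pairs.
    // The high words carry the sign, so they compare as signed longs;
    // the low words are plain magnitude bits, so they compare as unsigned.
    static int compare(long aHigh, long aLow, long bHigh, long bLow) {
        if (aHigh != bHigh) {
            return aHigh < bHigh ? -1 : 1;
        }
        return Long.compareUnsigned(aLow, bLow);
    }

    // Element-wise add of the first n entries of two columns into the output arrays.
    // Assumption: both inputs have the same scale, so no rescaling is needed.
    static void add(long[] aHigh, long[] aLow, long[] bHigh, long[] bLow,
                    long[] outHigh, long[] outLow, int n) {
        for (int i = 0; i < n; i++) {
            long low = aLow[i] + bLow[i];
            // Unsigned overflow of the low word happened iff the sum wrapped below an operand.
            long carry = Long.compareUnsigned(low, aLow[i]) < 0 ? 1L : 0L;
            outLow[i] = low;
            outHigh[i] = aHigh[i] + bHigh[i] + carry;
        }
    }
}
```

The inner loop touches only primitive long arrays, which is the property that makes this layout attractive for vectorized execution. Multiplication, division, and overflow detection against the declared precision are harder and are left open here.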


> Implement vectorized support for the DECIMAL data type
> ------------------------------------------------------
>
>                 Key: HIVE-5762
>                 URL: https://issues.apache.org/jira/browse/HIVE-5762
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Eric Hanson
>
> Add support to allow queries referencing DECIMAL columns and expression results to run efficiently in vectorized mode.  Include unit tests and end-to-end tests. 
> Before starting, or at least before going very far, please write a design specification (a new section for the design spec attached to HIVE-4160) for how support for the different DECIMAL types should work in vectorized mode, along with the roadmap, and have it reviewed.
> It may be feasible to re-use LongColumnVector and related VectorExpression classes for fixed-point decimal in certain data ranges. That should be at least considered to get faster performance and save code. For unlimited precision DECIMAL, a new column vector subtype may be needed, or a BytesColumnVector could be re-used.



--
This message was sent by Atlassian JIRA
(v6.1#6144)