
Posted to issues@spark.apache.org by "Wenchen Fan (JIRA)" <ji...@apache.org> on 2015/07/29 06:28:04 UTC

[jira] [Commented] (SPARK-9404) UnsafeArrayData

    [ https://issues.apache.org/jira/browse/SPARK-9404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645435#comment-14645435 ] 

Wenchen Fan commented on SPARK-9404:
------------------------------------

I have a new idea for the encoding. Since all elements of an array have the same type, and we know that type before reading an element, we can inline primitive values and use a bit set for null checking:

the first 4 bytes hold the number of elements
then n bytes of null-checking bit set, aligned to a word
then 1 byte each for boolean and byte types,
or 2 bytes each for the short type,
or 4 bytes each for int and float types,
or 8 bytes each for long and double types,
or 12 bytes each for the interval type,
or 8 bytes each (offset combined with length) for variable-length values,
followed by the variable-length portion
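
To make the proposed layout concrete, here is a minimal Python sketch for the 4-byte int case only. It is an illustration, not the Spark implementation: the function names are hypothetical, and it assumes little-endian byte order, an 8-byte word, bit i of the bit set marking element i as null, and a zero-filled slot for null elements. None of these details are fixed by the comment above.

```python
import struct

WORD_BYTES = 8  # assumption: "aligned to word" means a multiple of 8 bytes


def bitset_size(n):
    """Bytes needed for an n-bit null bit set, rounded up to whole words."""
    return ((n + 63) // 64) * WORD_BYTES


def encode_int_array(values):
    """Encode a list of ints (None = null) as count + null bit set + inlined ints."""
    n = len(values)
    bitset = bytearray(bitset_size(n))
    body = bytearray()
    for i, v in enumerate(values):
        if v is None:
            bitset[i // 8] |= 1 << (i % 8)  # mark element i as null
            v = 0                           # null still occupies a fixed-width slot
        body += struct.pack("<i", v)
    return struct.pack("<i", n) + bytes(bitset) + bytes(body)


def read_int(buf, i):
    """Read element i, returning None when its null bit is set."""
    n = struct.unpack_from("<i", buf, 0)[0]
    if not 0 <= i < n:
        raise IndexError(i)
    if (buf[4 + i // 8] >> (i % 8)) & 1:
        return None
    # Fixed-width elements allow direct indexing with no per-element offset.
    return struct.unpack_from("<i", buf, 4 + bitset_size(n) + 4 * i)[0]
```

Under these assumptions a six-element int array takes 4 + 8 + 24 = 36 bytes, and any element can be located with simple arithmetic instead of an offset lookup.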


> UnsafeArrayData
> ---------------
>
>                 Key: SPARK-9404
>                 URL: https://issues.apache.org/jira/browse/SPARK-9404
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Wenchen Fan
>
> An Unsafe-based ArrayData implementation. To begin with, we can encode data this way:
> the first 4 bytes hold the number of elements
> then 4 bytes per element for its start offset; a negative offset means the element is null
> followed by the elements themselves
> For example, [10, 11, 12, 13, null, 14] should be represented internally as (4 bytes each)
> 6, 28, 32, 36, 40, -44, 48, 10, 11, 12, 13, 0, 14
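
The offset-based layout from the issue description can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, byte order is taken to be little-endian, elements are 4-byte ints, and a null element keeps a zero-filled slot with its offset negated, as in the example values. The leading word is computed as the element count, which is 6 for the six-element example array.

```python
import struct


def encode_with_offsets(values):
    """Encode ints (None = null) as count, per-element offsets, then the elements."""
    n = len(values)
    data_start = 4 + 4 * n  # count word plus one offset word per element
    offsets = []
    for i, v in enumerate(values):
        off = data_start + 4 * i
        offsets.append(-off if v is None else off)  # negative offset marks null
    slots = [0 if v is None else v for v in values]
    words = [n] + offsets + slots
    return struct.pack("<%di" % len(words), *words)


def decode_with_offsets(buf):
    """Decode a buffer produced by encode_with_offsets back into a list."""
    n = struct.unpack_from("<i", buf, 0)[0]
    out = []
    for i in range(n):
        off = struct.unpack_from("<i", buf, 4 + 4 * i)[0]
        out.append(None if off < 0 else struct.unpack_from("<i", buf, off)[0])
    return out
```

For [10, 11, 12, 13, None, 14] this produces thirteen 4-byte words, 52 bytes in total, with the first element at byte offset 28.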


