Posted to issues@arrow.apache.org by "Wenchen Fan (JIRA)" <ji...@apache.org> on 2017/05/12 01:43:04 UTC

[jira] [Commented] (ARROW-1011) [Format] Clarify in Layout.md what are the expectations for buffer padding in IPC vs. in-memory data

    [ https://issues.apache.org/jira/browse/ARROW-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16007504#comment-16007504 ] 

Wenchen Fan commented on ARROW-1011:
------------------------------------

I think the padding is already well defined in the memory layout specification:
1. "Unless otherwise noted, padded bytes do not need to have a specific value", which means in general, the value of padding bytes is unspecified.
2. "A 1 (set bit) for index j indicates that the value is not null, while a 0 (bit not set) indicates that it is null. Bitmaps are to be initialized to be all unset at allocation time (this includes padding)", which means the padding bytes of a null bitmap are always 0 at allocation time.

Did I miss something?

> [Format] Clarify in Layout.md what are the expectations for buffer padding in IPC vs. in-memory data
> ----------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-1011
>                 URL: https://issues.apache.org/jira/browse/ARROW-1011
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Format
>            Reporter: Wes McKinney
>
> This has come up in https://github.com/apache/arrow/pull/673 and in prior discussions. 
> The basic summary is that we should not write non-zero padding bytes in IPC messages. However, one cannot in general rely on the padding being zero when the data is in memory (for example: zero-copy slices of Arrow arrays/vectors).
> I think it would be good to clarify this point in Layout.md -- namely that what gets written to the wire should be deterministic. However, in-memory algorithms should not in general expect the padding region to have a particular value. As an example, a popcount on a validity bitmap would want to exclude padding bytes from the computation. Other elementwise SIMD operations are free to use the padding bytes as they wish, with a known caveat. 
> cc [~cloud_fan]
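
To make the distinction in the description concrete, here is a minimal Python sketch, not taken from any Arrow implementation. The helper names count_valid and zero_padding are hypothetical, and the 8-byte padding multiple is an assumed default. The first helper counts set validity bits only within the logical length, so arbitrary in-memory padding cannot affect the result. The second produces a copy with zeroed padding, which is what one would want to write to the wire.

```python
def count_valid(bitmap: bytes, length: int, offset: int = 0) -> int:
    """Count set validity bits in [offset, offset + length); all other bits are ignored."""
    count = 0
    for i in range(offset, offset + length):
        # Validity bitmaps use least-significant-bit numbering within each byte.
        if bitmap[i >> 3] & (1 << (i & 7)):
            count += 1
    return count


def zero_padding(bitmap: bytes, length: int, alignment: int = 8) -> bytes:
    """Copy the first `length` bits and zero-pad to a multiple of `alignment` bytes.

    `alignment` is an assumed padding multiple, chosen here for illustration.
    """
    used_bytes = (length + 7) // 8
    if len(bitmap) < used_bytes:
        raise ValueError("bitmap is too short for the requested length")
    padded_size = -(-used_bytes // alignment) * alignment  # round up
    out = bytearray(padded_size)
    out[:used_bytes] = bitmap[:used_bytes]
    # Clear the unused high bits of the last partially used byte.
    if length % 8:
        out[used_bytes - 1] &= (1 << (length % 8)) - 1
    return bytes(out)


# A 3-element bitmap whose padding bits and padding byte hold leftover data,
# as could happen with a zero-copy slice of a larger array.
in_memory = bytes([0xF5, 0xFF])
valid = count_valid(in_memory, 3)      # only bits 0..2 are inspected
on_wire = zero_padding(in_memory, 3)   # deterministic bytes for IPC
```

A popcount over the whole in_memory buffer would count the leftover bits as well, which is why the in-memory computation is bounded by the logical length rather than by the buffer size.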


