Posted to issues@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2017/08/29 01:26:00 UTC

[jira] [Commented] (ARROW-412) [Format] Handling of buffer padding in the IPC metadata

    [ https://issues.apache.org/jira/browse/ARROW-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144630#comment-16144630 ] 

Wes McKinney commented on ARROW-412:
------------------------------------

This is a slightly thorny subject. 

In C++, buffers generally already include padding in their size. But if we call {{buffer->Resize(size)}} with a size that is not padded, the buffer's capacity will be padded even though its size is smaller.

When we write buffers in IPC messages, padding must be added so that each buffer at minimum starts and ends on an 8-byte boundary, and preferably ends on a 64-byte boundary. This padding is currently included in the buffer metadata in the Flatbuffers message: https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/writer.cc#L171
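
To make the rounding concrete, here is a minimal sketch of the padding arithmetic. {{PaddedLength}} and {{PaddingBytes}} are hypothetical helpers written for this comment, not functions from the Arrow codebase, and the sketch assumes the alignment is a power of two (8 or 64 in this discussion).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round nbytes up to the next multiple of alignment.
// Assumes alignment is a positive power of two (e.g. 8 or 64).
inline int64_t PaddedLength(int64_t nbytes, int64_t alignment = 8) {
  assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
  return (nbytes + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper: number of padding bytes appended after the used bytes.
inline int64_t PaddingBytes(int64_t nbytes, int64_t alignment = 8) {
  return PaddedLength(nbytes, alignment) - nbytes;
}
```

With these definitions, a 13-byte buffer occupies 16 bytes under 8-byte padding and 64 bytes under 64-byte padding. If the padded value is what goes into the metadata, the receiver sees 16 or 64 rather than 13.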

The problem with this is that buffer sizes may change in IPC round trips, so if we're hashing buffer contents, the hash values may change. If the IPC receiver is concerned about this, it could adjust the buffer size based on the other parts of the metadata.
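
The hashing concern can be illustrated with a small sketch. FNV-1a is used only as a stand-in for whatever hash a consumer might apply, and {{PadWithZeros}} and {{HashLogicalBytes}} are hypothetical helpers, not Arrow APIs. The sketch also assumes the padding bytes are zeros.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 64-bit FNV-1a over the first `length` bytes of `data`.
inline uint64_t Fnv1a(const uint8_t* data, size_t length) {
  uint64_t h = 0xcbf29ce484222325ULL;  // FNV offset basis
  for (size_t i = 0; i < length; ++i) {
    h ^= data[i];
    h *= 0x100000001b3ULL;  // FNV prime
  }
  return h;
}

// Hypothetical: mimic a round trip that grows the buffer to a padded size.
inline std::vector<uint8_t> PadWithZeros(std::vector<uint8_t> data,
                                         size_t alignment) {
  assert(alignment > 0);
  size_t padded = (data.size() + alignment - 1) / alignment * alignment;
  data.resize(padded, 0);
  return data;
}

// Hypothetical: hash only the logical bytes, ignoring trailing padding.
inline uint64_t HashLogicalBytes(const std::vector<uint8_t>& storage,
                                 size_t logical_size) {
  assert(logical_size <= storage.size());
  return Fnv1a(storage.data(), logical_size);
}
```

Hashing a 3-byte buffer and hashing the 8-byte padded buffer a receiver reconstructs give different values. Hashing only the first 3 bytes of the received buffer reproduces the sender's value, but only if the receiver can recover the logical size.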

One solution is to send the accurate (unpadded) buffer sizes. The receiver could then decide how to set the size and capacity of the buffers it reconstructs. This wouldn't change the actual memory layout on the wire (because padding for alignment purposes is required), just the buffer size metadata.
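
As a rough sketch of this proposal: offsets in the message body stay aligned, while the recorded length is the exact number of used bytes. {{BufferEntry}} and {{LayoutBuffers}} are hypothetical names for illustration and do not reflect the actual writer implementation. The alignment is assumed to be a power of two.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical metadata entry for one buffer in the message body.
struct BufferEntry {
  int64_t offset;  // aligned start position within the body
  int64_t length;  // exact used bytes, excluding padding
};

// Hypothetical: lay out buffers back to back, padding each one so the next
// buffer starts on an aligned boundary, but record the unpadded lengths.
inline std::vector<BufferEntry> LayoutBuffers(
    const std::vector<int64_t>& sizes, int64_t alignment = 8) {
  assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
  std::vector<BufferEntry> entries;
  int64_t offset = 0;
  for (int64_t size : sizes) {
    entries.push_back({offset, size});
    // Advance past the buffer and its padding; the wire layout is unchanged.
    offset = (offset + size + alignment - 1) & ~(alignment - 1);
  }
  return entries;
}
```

For buffers of 13, 4 and 64 bytes with 8-byte alignment, the offsets come out as 0, 16 and 24 while the lengths stay 13, 4 and 64. The receiver can set the size from the length and pick a capacity from the gap to the next offset.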

> [Format] Handling of buffer padding in the IPC metadata
> -------------------------------------------------------
>
>                 Key: ARROW-412
>                 URL: https://issues.apache.org/jira/browse/ARROW-412
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Format
>            Reporter: Wes McKinney
>             Fix For: 0.7.0
>
>
> See discussion in ARROW-399. Do we include padding bytes in the metadata or set the actual used bytes? In the latter case, the padding would be a part of the format (any buffers continue to be expected to be 64-byte padded, to permit AVX512 instructions)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)