Posted to dev@arrow.apache.org by "Julien Le Dem (JIRA)" <ji...@apache.org> on 2016/12/05 18:55:58 UTC

[jira] [Commented] (ARROW-399) [Java] ListVector.loadFieldBuffers ignores the ArrowFieldNode length metadata

    [ https://issues.apache.org/jira/browse/ARROW-399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15723023#comment-15723023 ] 

Julien Le Dem commented on ARROW-399:
-------------------------------------

We can change the behavior on the Java side and, at a minimum, not infer sizes that don't match the metadata.

As a separate discussion, we can pad buffers without changing their recorded size. The metadata can still reflect the size of the buffer that we actually use, and we either leave unused space between buffers or make sure each buffer starts at an appropriately aligned address.
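
A minimal sketch of that padding idea, for illustration only. The `AlignedLayout` class and its `BufferSlot` type are hypothetical and not part of the Arrow Java API; the 64-byte alignment is taken from the C++ padding described in the issue. Each slot records the number of bytes actually used, while the start offset of the next buffer is rounded up to the alignment, so the gap between buffers is never counted as data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not an Arrow class: lays out buffers back to back,
// aligning each start offset while keeping the recorded length unpadded.
class AlignedLayout {

    // Where a buffer starts and how many bytes of it are actually used.
    static final class BufferSlot {
        final long offset;
        final long length;

        BufferSlot(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    // Round position up to the next multiple of alignment.
    static long alignUp(long position, long alignment) {
        return (position + alignment - 1) / alignment * alignment;
    }

    // Compute an aligned start offset for each buffer; lengths stay as given.
    static List<BufferSlot> layout(long[] lengths, long alignment) {
        List<BufferSlot> slots = new ArrayList<>();
        long position = 0;
        for (long length : lengths) {
            long start = alignUp(position, alignment);
            slots.add(new BufferSlot(start, length));
            position = start + length;
        }
        return slots;
    }

    public static void main(String[] args) {
        // Example sizes: a 1-byte validity buffer, a 32-byte offsets buffer,
        // and a 28-byte data buffer.
        for (BufferSlot slot : layout(new long[] {1, 32, 28}, 64)) {
            System.out.println("offset=" + slot.offset + " length=" + slot.length);
        }
    }
}
```

With this layout a reader that trusts the recorded length sees 32 bytes of offsets, not 64, even though the following buffer starts on a 64-byte boundary.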

> [Java] ListVector.loadFieldBuffers ignores the ArrowFieldNode length metadata
> -----------------------------------------------------------------------------
>
>                 Key: ARROW-399
>                 URL: https://issues.apache.org/jira/browse/ARROW-399
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Java - Vectors
>            Reporter: Wes McKinney
>            Assignee: Julien Le Dem
>            Priority: Blocker
>         Attachments: list_error.json
>
>
> Discovered this during integration testing. Because Arrow-C++ writes buffers padded to 64 bytes, they may appear larger to the Java library than they need to be. In ListVector.loadFieldBuffers, the ArrowFieldNode is never used:
> {code:language=java}
>   @Override
>   public void loadFieldBuffers(ArrowFieldNode fieldNode, List<ArrowBuf> ownBuffers) {
>     BaseDataValueVector.load(getFieldInnerVectors(), ownBuffers);
>   }
> {code}
> The value count of the resulting ListVector is thus inferred from the size of the offsets buffer. In the case of a length-7 vector in C++, the size of the offsets buffer is exactly 64 bytes (padding for SIMD) -- Java infers from 64 bytes that the value count is 15 (64 / 4 - 1), and the integration test fails.
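
The arithmetic in the description can be reproduced with a small standalone sketch. The `OffsetBufferCount` class and its method names are hypothetical and only mirror the calculation quoted above; they are not the actual ListVector code. The sketch assumes 4-byte offsets and 64-byte padding, as stated in the issue.

```java
// Hypothetical illustration, not Arrow code: shows why inferring the value
// count from a padded offsets buffer gives the wrong answer.
class OffsetBufferCount {

    static final int OFFSET_WIDTH = 4; // bytes per 32-bit offset
    static final int PADDING = 64;     // C++ pads buffers to 64 bytes

    // A list vector with n values stores n + 1 offsets.
    static int requiredOffsetBytes(int valueCount) {
        return (valueCount + 1) * OFFSET_WIDTH;
    }

    // Round a byte count up to the next multiple of PADDING.
    static int paddedSize(int bytes) {
        return (bytes + PADDING - 1) / PADDING * PADDING;
    }

    // Value count inferred from the buffer size alone.
    static int inferredValueCount(int offsetBufferBytes) {
        return offsetBufferBytes / OFFSET_WIDTH - 1;
    }

    public static void main(String[] args) {
        int actualCount = 7;
        int needed = requiredOffsetBytes(actualCount); // 32 bytes
        int padded = paddedSize(needed);               // 64 bytes
        System.out.println("bytes needed:   " + needed);
        System.out.println("bytes received: " + padded);
        System.out.println("inferred count: " + inferredValueCount(padded));
        System.out.println("metadata count: " + actualCount);
    }
}
```

Only 32 of the 64 bytes hold real offsets, so the count has to come from the ArrowFieldNode length and not from the buffer size.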


