Posted to issues@arrow.apache.org by "Rong Ma (Jira)" <ji...@apache.org> on 2020/05/15 03:06:00 UTC

[jira] [Commented] (ARROW-8803) [Java] Row count should be set before loading buffers in VectorLoader

    [ https://issues.apache.org/jira/browse/ARROW-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17107881#comment-17107881 ] 

Rong Ma commented on ARROW-8803:
--------------------------------

[~fan_li_ya] Thanks for the comment! In my use case, the input buffers are built in native code and compressed by calling the C++ buffer compression API.

> [Java] Row count should be set before loading buffers in VectorLoader
> ---------------------------------------------------------------------
>
>                 Key: ARROW-8803
>                 URL: https://issues.apache.org/jira/browse/ARROW-8803
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Java
>            Reporter: Rong Ma
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.0.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hi guys! I'm new to the community, and I've been using Arrow for some time. In my use case, I need to read RecordBatches with *compressed* underlying buffers using the Java IPC API, and I'm currently blocked by VectorLoader's "load" method. In this method,
> {quote}{{root.setRowCount(recordBatch.getLength());}}
> {quote}
> This call not only sets the rowCount for the root, but also sets the valueCount for the vectors the root holds, *which has already been set once when loading the buffers.*
> I know it's not strictly a bug, but if I try to load compressed buffers, I get the following exception:
> {quote}java.lang.IndexOutOfBoundsException: index: 0, length: 512 (expected: range(0, 504))
>  at io.netty.buffer.ArrowBuf.checkIndex(ArrowBuf.java:718)
>  at io.netty.buffer.ArrowBuf.setBytes(ArrowBuf.java:965)
>  at org.apache.arrow.vector.BaseFixedWidthVector.reAlloc(BaseFixedWidthVector.java:439)
>  at org.apache.arrow.vector.BaseFixedWidthVector.setValueCount(BaseFixedWidthVector.java:708)
>  at org.apache.arrow.vector.VectorSchemaRoot.setRowCount(VectorSchemaRoot.java:226)
>  at org.apache.arrow.vector.VectorLoader.load(VectorLoader.java:61)
>  at org.apache.arrow.vector.ipc.ArrowReader.loadRecordBatch(ArrowReader.java:205)
>  at org.apache.arrow.vector.ipc.ArrowStreamReader.loadNextBatch(ArrowStreamReader.java:122)
> {quote}
> So I started to wonder whether it would make more sense to call root.setRowCount before loading the buffers.
> root.setRowCount also calls each vector's setValueCount, which I think is unnecessary at this point, since the vectors are already fully formed once the buffers have been loaded.
> Another existing piece of code upstream is similar to this change. [link|https://github.com/apache/arrow/blob/ed1f771dccdde623ce85e212eccb2b573185c461/java/vector/src/main/java/org/apache/arrow/vector/ipc/JsonFileReader.java#L170-L178]
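
The ordering concern described in the issue can be illustrated with a small, self-contained sketch. This is a toy model, not Arrow's code: ToyVector, loadBuffer, and the 100-byte "compressed" size are hypothetical names and values chosen for illustration, and the model silently grows the buffer where the reported stack trace shows Arrow's reAlloc throwing an IndexOutOfBoundsException. It only shows why setting the count after loading can disturb a buffer that is shorter than the uncompressed size, while setting it before loading leaves the loaded buffer untouched.

```java
import java.util.Arrays;

public class LoadOrderSketch {

    /** Toy stand-in for a fixed-width vector; NOT Arrow's BaseFixedWidthVector. */
    static final class ToyVector {
        static final int TYPE_WIDTH = 4;   // bytes per value, as for a 32-bit int
        byte[] data = new byte[0];
        int valueCount = 0;
        int reallocs = 0;

        /** Adopt an externally built buffer, as loading a record batch would. */
        void loadBuffer(byte[] buffer, int count) {
            data = buffer;
            valueCount = count;
        }

        /** Grow the buffer if it looks too small for n uncompressed values. */
        void setValueCount(int n) {
            int needed = n * TYPE_WIDTH;
            if (data.length < needed) {
                data = Arrays.copyOf(data, needed);  // replaces the current buffer
                reallocs++;
            }
            valueCount = n;
        }
    }

    /** Order described in the report: load the buffers, then set the count. */
    static ToyVector loadThenSetCount(byte[] compressed, int rows) {
        ToyVector v = new ToyVector();
        v.loadBuffer(compressed, rows);
        v.setValueCount(rows);
        return v;
    }

    /** Proposed order: set the count first, then load the buffers. */
    static ToyVector setCountThenLoad(byte[] compressed, int rows) {
        ToyVector v = new ToyVector();
        v.setValueCount(rows);
        v.loadBuffer(compressed, rows);
        return v;
    }

    public static void main(String[] args) {
        int rows = 128;                      // 128 * 4 = 512 bytes uncompressed
        byte[] compressed = new byte[100];   // hypothetical compressed size

        ToyVector current = loadThenSetCount(compressed, rows);
        ToyVector proposed = setCountThenLoad(compressed, rows);

        // prints "load first: buffer kept = false"
        System.out.println("load first: buffer kept = " + (current.data == compressed));
        // prints "count first: buffer kept = true"
        System.out.println("count first: buffer kept = " + (proposed.data == compressed));
    }
}
```

In this model the proposed order still performs one allocation on the empty vector, but that buffer is simply replaced when the real buffer is loaded, so the compressed data is never resized or copied.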



--
This message was sent by Atlassian Jira
(v8.3.4#803005)