Posted to dev@arrow.apache.org by "Rong Ma (Jira)" <ji...@apache.org> on 2020/05/14 16:45:00 UTC

[jira] [Created] (ARROW-8803) [Java] Row count should be set before loading buffers In VectorLoader

Rong Ma created ARROW-8803:
------------------------------

             Summary: [Java] Row count should be set before loading buffers In VectorLoader
                 Key: ARROW-8803
                 URL: https://issues.apache.org/jira/browse/ARROW-8803
             Project: Apache Arrow
          Issue Type: Bug
          Components: Java
            Reporter: Rong Ma
             Fix For: 1.0.0


Hi all! I'm new to the community, but I've been using Arrow for some time. In my use case, I need to read RecordBatches with *compressed* underlying buffers using Java's IPC API, and I'm blocked by VectorLoader's "load" method, which makes this call:
{quote}{{root.setRowCount(recordBatch.getLength());}}
{quote}
This call not only sets the rowCount for the root, but also sets the valueCount for each vector the root holds, *which has already been set once when the buffers were loaded.*

It's not a bug in itself, I know. But when I try to load compressed buffers, I get the following exception:
{quote}java.lang.IndexOutOfBoundsException: index: 0, length: 512 (expected: range(0, 504))
 at io.netty.buffer.ArrowBuf.checkIndex(ArrowBuf.java:718)
 at io.netty.buffer.ArrowBuf.setBytes(ArrowBuf.java:965)
 at org.apache.arrow.vector.BaseFixedWidthVector.reAlloc(BaseFixedWidthVector.java:439)
 at org.apache.arrow.vector.BaseFixedWidthVector.setValueCount(BaseFixedWidthVector.java:708)
 at org.apache.arrow.vector.VectorSchemaRoot.setRowCount(VectorSchemaRoot.java:226)
 at org.apache.arrow.vector.VectorLoader.load(VectorLoader.java:61)
 at org.apache.arrow.vector.ipc.ArrowReader.loadRecordBatch(ArrowReader.java:205)
 at org.apache.arrow.vector.ipc.ArrowStreamReader.loadNextBatch(ArrowStreamReader.java:122)
{quote}
So I'm starting to wonder: would it make more sense to call root.setRowCount before loading the buffers?
root.setRowCount also calls each vector's setValueCount, which I think is unnecessary here, since the vectors are already fully formed once their buffers have been loaded.
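
To illustrate the ordering argument, here is a minimal, self-contained sketch. It is a toy model and not Arrow's code: ToyVector, loadBuffer, loadThenSetCount, and setCountThenLoad are hypothetical names, and the sizes are assumptions (504 and 512 are borrowed from the exception message above, with 128 rows of 4-byte values as one assumed way to arrive at 512). In this toy the reallocation succeeds and silently replaces the loaded buffer; in Arrow, per the stack trace above, the analogous step throws.

```java
import java.util.Arrays;

public class RowCountOrderSketch {

    /** Toy stand-in for a fixed-width vector; NOT Arrow's real class. */
    static final class ToyVector {
        private final int typeWidth;
        private byte[] data = new byte[0];
        private int valueCount;

        ToyVector(int typeWidth) {
            this.typeWidth = typeWidth;
        }

        /** Adopts the buffer as-is and records the count, as loading a batch would. */
        void loadBuffer(byte[] buffer, int count) {
            this.data = buffer;
            this.valueCount = count;
        }

        /** Reallocates when the buffer is smaller than count * typeWidth. */
        void setValueCount(int count) {
            int needed = count * typeWidth;
            if (data.length < needed) {
                // The current buffer is replaced by a larger copy.
                data = Arrays.copyOf(data, needed);
            }
            this.valueCount = count;
        }

        byte[] buffer() {
            return data;
        }

        int valueCount() {
            return valueCount;
        }
    }

    /** Order described in the report: load the buffers, then set the count. */
    static ToyVector loadThenSetCount(byte[] compressed, int rows) {
        ToyVector vector = new ToyVector(4);
        vector.loadBuffer(compressed, rows);
        vector.setValueCount(rows); // sees a "too small" buffer and reallocates
        return vector;
    }

    /** Proposed order: set the count first, then load the buffers. */
    static ToyVector setCountThenLoad(byte[] compressed, int rows) {
        ToyVector vector = new ToyVector(4);
        vector.setValueCount(rows);
        vector.loadBuffer(compressed, rows); // loaded buffer is left untouched
        return vector;
    }

    public static void main(String[] args) {
        int rows = 128;                      // assumed: 128 * 4 bytes = 512
        byte[] compressed = new byte[504];   // shorter than the uncompressed size

        // Is the vector still backed by the exact buffer that was loaded?
        System.out.println(loadThenSetCount(compressed, rows).buffer() == compressed);
        System.out.println(setCountThenLoad(compressed, rows).buffer() == compressed);
    }
}
```

With the first ordering the vector no longer holds the buffer that was loaded, because setValueCount treated the shorter compressed buffer as undersized. With the second ordering the loaded buffer survives unchanged.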

An existing piece of upstream code does something similar to this change: [link|https://github.com/apache/arrow/blob/ed1f771dccdde623ce85e212eccb2b573185c461/java/vector/src/main/java/org/apache/arrow/vector/ipc/JsonFileReader.java#L170-L178]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)