Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/05/15 03:02:05 UTC

[GitHub] [arrow] rongma1997 opened a new pull request #7188: ARROW-8803: [Java] Row count should be set before loading buffers in VectorLoader

rongma1997 opened a new pull request #7188:
URL: https://github.com/apache/arrow/pull/7188


   In my use case, I need to read a RecordBatch with **compressed** underlying buffers using Java's IPC API, and I'm blocked by the VectorLoader's "load" method. In this method,
   
       root.setRowCount(recordBatch.getLength());
   
   This call not only sets the rowCount for the root, but also sets the valueCount for each vector the root holds, **which has already been set once while loading the buffers**.
   
   I know it's not a bug. But if I try to load compressed buffers, I get the following exception:
   
       java.lang.IndexOutOfBoundsException: index: 0, length: 512 (expected: range(0, 504))
       at io.netty.buffer.ArrowBuf.checkIndex(ArrowBuf.java:718)
       at io.netty.buffer.ArrowBuf.setBytes(ArrowBuf.java:965)
       at org.apache.arrow.vector.BaseFixedWidthVector.reAlloc(BaseFixedWidthVector.java:439)
       at org.apache.arrow.vector.BaseFixedWidthVector.setValueCount(BaseFixedWidthVector.java:708)
       at org.apache.arrow.vector.VectorSchemaRoot.setRowCount(VectorSchemaRoot.java:226)
       at org.apache.arrow.vector.VectorLoader.load(VectorLoader.java:61)
       at org.apache.arrow.vector.ipc.ArrowReader.loadRecordBatch(ArrowReader.java:205)
       at org.apache.arrow.vector.ipc.ArrowStreamReader.loadNextBatch(ArrowStreamReader.java:122)
   
   So I started to wonder whether it would make more sense to call root.setRowCount before loading the buffers.
   root.setRowCount also calls each vector's setValueCount, which I think is unnecessary here since the vectors are already fully formed once the buffers are loaded.
   
   An existing piece of upstream code is similar to this change: [JsonFileReader.java#L170-L178](https://github.com/apache/arrow/blob/ed1f771dccdde623ce85e212eccb2b573185c461/java/vector/src/main/java/org/apache/arrow/vector/ipc/JsonFileReader.java#L170-L178)
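
   To make the ordering question concrete, here is a minimal toy model in plain Java. It is not Arrow's implementation: `ToyVector` and `LoadOrderSketch` are hypothetical names, and the 8-byte value width and 64-row batch are assumptions chosen so that the required size (512 bytes) exceeds the 504-byte buffer seen in the stack trace above. It only sketches why calling setValueCount after loading can touch a loaded buffer that is shorter than the uncompressed size, while calling it before loading leaves the loaded buffer alone.

```java
import java.util.Arrays;

// Toy model only: these classes are hypothetical and are not Arrow's API.
final class ToyVector {
    static final int TYPE_WIDTH = 8;   // assumed bytes per value (e.g. a 64-bit column)
    byte[] buffer = new byte[0];
    int valueCount = 0;

    // Stand-in for loading a buffer from a record batch: adopt it as-is.
    void loadBuffer(byte[] loaded) {
        buffer = loaded;
    }

    // Stand-in for setValueCount: if the current buffer cannot hold
    // `count` values, replace it with a larger copy.
    void setValueCount(int count) {
        int needed = count * TYPE_WIDTH;
        if (buffer.length < needed) {
            buffer = Arrays.copyOf(buffer, needed);
        }
        valueCount = count;
    }
}

final class LoadOrderSketch {
    // Order described in the PR as the current behavior: load, then set the count.
    static ToyVector loadThenCount(byte[] body, int rows) {
        ToyVector v = new ToyVector();
        v.loadBuffer(body);
        v.setValueCount(rows);
        return v;
    }

    // Order proposed in the PR: set the count, then load.
    static ToyVector countThenLoad(byte[] body, int rows) {
        ToyVector v = new ToyVector();
        v.setValueCount(rows);
        v.loadBuffer(body);
        return v;
    }

    public static void main(String[] args) {
        byte[] compressed = new byte[504]; // shorter than 64 rows * 8 bytes = 512
        ToyVector a = loadThenCount(compressed, 64);
        ToyVector b = countThenLoad(compressed, 64);
        System.out.println("load then count keeps loaded buffer: " + (a.buffer == compressed));
        System.out.println("count then load keeps loaded buffer: " + (b.buffer == compressed));
    }
}
```

   In this model, the first ordering replaces the loaded buffer with a reallocated copy, whereas the second leaves the loaded buffer untouched. Whether the real VectorLoader needs further changes to handle compressed buffers is outside what this sketch shows.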


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rongma1997 commented on pull request #7188: ARROW-8803: [Java] Row count should be set before loading buffers in VectorLoader

Posted by GitBox <gi...@apache.org>.
rongma1997 commented on pull request #7188:
URL: https://github.com/apache/arrow/pull/7188#issuecomment-629023063


   > Thanks for the PR @rongma1997.
   > 
   > I'm not clear if the failure in ORC is caused by this change. could you try rebasing?
   > 
   > I'm also not sure how I feel about moving the setRowCount around without a complete implementation for compressed buffers (i.e. I don't know if something else will need to change). At least a unit test showing why this is needed would be worthwhile.
   
   Thanks for the comment!
   I tried rebasing first, but it looks like those errors are not caused by this PR?





[GitHub] [arrow] emkornfield commented on pull request #7188: ARROW-8803: [Java] Row count should be set before loading buffers in VectorLoader

Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #7188:
URL: https://github.com/apache/arrow/pull/7188#issuecomment-629032828


   No, it looks like I broke some things while merging PRs.





[GitHub] [arrow] github-actions[bot] commented on pull request #7188: ARROW-8803: [Java] Row count should be set before loading buffers in VectorLoader

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7188:
URL: https://github.com/apache/arrow/pull/7188#issuecomment-628999094


   https://issues.apache.org/jira/browse/ARROW-8803





[GitHub] [arrow] rongma1997 commented on pull request #7188: ARROW-8803: [Java] Row count should be set before loading buffers in VectorLoader

Posted by GitBox <gi...@apache.org>.
rongma1997 commented on pull request #7188:
URL: https://github.com/apache/arrow/pull/7188#issuecomment-629175106


   Closing this based on the discussion in JIRA.





[GitHub] [arrow] rongma1997 closed pull request #7188: ARROW-8803: [Java] Row count should be set before loading buffers in VectorLoader

Posted by GitBox <gi...@apache.org>.
rongma1997 closed pull request #7188:
URL: https://github.com/apache/arrow/pull/7188


   





[GitHub] [arrow] emkornfield commented on pull request #7188: ARROW-8803: [Java] Row count should be set before loading buffers in VectorLoader

Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #7188:
URL: https://github.com/apache/arrow/pull/7188#issuecomment-629010817


   Thanks for the PR @rongma1997.
   
   I'm not clear if the failure in ORC is caused by this change.  could you try rebasing?  
   
   I'm also not sure how I feel about moving the setRowCount around without a complete implementation for compressed buffers (i.e. I don't know if something else will need to change).  At least a unit test showing why this is needed would be worthwhile.





[GitHub] [arrow] emkornfield commented on pull request #7188: ARROW-8803: [Java] Row count should be set before loading buffers in VectorLoader

Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #7188:
URL: https://github.com/apache/arrow/pull/7188#issuecomment-629047852


   master should be fixed now.

