Posted to jira@arrow.apache.org by "Frank Wong (Jira)" <ji...@apache.org> on 2022/01/25 06:27:00 UTC

[jira] [Commented] (ARROW-15382) SplitAndTransfer throws for (0,0) if vector empty

    [ https://issues.apache.org/jira/browse/ARROW-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17481570#comment-17481570 ] 

Frank Wong commented on ARROW-15382:
------------------------------------

The problem seems to affect BaseLargeVariableWidthVector, BaseVariableWidthVector, ListVector and MapVector.

> SplitAndTransfer throws for (0,0) if vector empty
> -------------------------------------------------
>
>                 Key: ARROW-15382
>                 URL: https://issues.apache.org/jira/browse/ARROW-15382
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Java
>            Reporter: David Vogelbacher
>            Assignee: Frank Wong
>            Priority: Major
>
> I've hit a bug where `splitAndTransfer` throws when the vector is completely empty and its offset buffer is empty.
> An easy repro is:
> {noformat}
>         BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
>         ListVector listVector = ListVector.empty("listVector", allocator);
>         listVector.getTransferPair(listVector.getAllocator()).splitAndTransfer(0, 0);
> {noformat}
> This results in the following stacktrace:
> {noformat}
> java.lang.IndexOutOfBoundsException: index: 0, length: 4 (expected: range(0, 0))
> 	at io.netty.buffer.ArrowBuf.checkIndexD(ArrowBuf.java:335)
> 	at io.netty.buffer.ArrowBuf.chk(ArrowBuf.java:322)
> 	at io.netty.buffer.ArrowBuf.getInt(ArrowBuf.java:441)
> 	at org.apache.arrow.vector.complex.ListVector$TransferImpl.splitAndTransfer(ListVector.java:484)
> {noformat}
> In production we hit this when calling {{VectorSchemaRoot.slice}}. The schema root contains a {{ListVector}} with a {{VarCharVector}} value vector. The list vector isn't empty, but all the strings in the var char vector are. {{splitAndTransfer}} on the list vector works, but when the underlying var char vector is split, we get the same exception:
> {noformat}
> java.lang.IndexOutOfBoundsException: index: 0, length: 4 (expected: range(0, 0))
> 	at io.netty.buffer.ArrowBuf.checkIndexD(ArrowBuf.java:335)
> 	at io.netty.buffer.ArrowBuf.chk(ArrowBuf.java:322)
> 	at io.netty.buffer.ArrowBuf.getInt(ArrowBuf.java:441)
> 	at org.apache.arrow.vector.BaseVariableWidthVector.splitAndTransferOffsetBuffer(BaseVariableWidthVector.java:728)
> 	at org.apache.arrow.vector.BaseVariableWidthVector.splitAndTransferTo(BaseVariableWidthVector.java:712)
> 	at org.apache.arrow.vector.VarCharVector$TransferImpl.splitAndTransfer(VarCharVector.java:321)
> 	at org.apache.arrow.vector.complex.ListVector$TransferImpl.splitAndTransfer(ListVector.java:496)
> 	at org.apache.arrow.vector.VectorSchemaRoot.lambda$slice$1(VectorSchemaRoot.java:308)
> 	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
> 	at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
> 	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
> 	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
> 	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
> 	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> 	at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
> 	at org.apache.arrow.vector.VectorSchemaRoot.slice(VectorSchemaRoot.java:310)
> {noformat} 
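
Both stack traces above end in a 4-byte getInt at index 0 on a buffer whose valid range is (0, 0), which suggests the start offset is read even when the offset buffer was never allocated. The following is a minimal sketch of that failure mode and of one possible guard. It uses java.nio.ByteBuffer as a stand-in for ArrowBuf, and the class and method names (OffsetBufferSketch, startOffsetUnchecked, startOffsetGuarded) are hypothetical. It is not the Arrow implementation and not necessarily the fix that was applied.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for the offset handling; java.nio.ByteBuffer replaces ArrowBuf.
final class OffsetBufferSketch {
    // Assumption: 4-byte offsets, matching "length: 4" in the exception message.
    static final int OFFSET_WIDTH = 4;

    // Mirrors the failing pattern: always read the offset stored at startIndex.
    public static int startOffsetUnchecked(ByteBuffer offsets, int startIndex) {
        return offsets.getInt(startIndex * OFFSET_WIDTH);
    }

    // One possible guard (an assumption, not the actual patch): an empty split
    // of a vector with an empty offset buffer starts at offset 0.
    public static int startOffsetGuarded(ByteBuffer offsets, int startIndex, int length) {
        if (length == 0 && offsets.capacity() == 0) {
            return 0;
        }
        return offsets.getInt(startIndex * OFFSET_WIDTH);
    }

    public static void main(String[] args) {
        ByteBuffer emptyOffsets = ByteBuffer.allocate(0); // vector never allocated

        try {
            startOffsetUnchecked(emptyOffsets, 0);
            System.out.println("unchecked read succeeded");
        } catch (IndexOutOfBoundsException e) {
            // Same kind of failure as ArrowBuf.getInt in the traces above.
            System.out.println("unchecked read failed: " + e.getClass().getSimpleName());
        }

        System.out.println("guarded start offset: " + startOffsetGuarded(emptyOffsets, 0, 0));
    }
}
```

The guard only short-circuits the (0, 0) split on an unallocated buffer. Any other out-of-range request still fails, so genuine indexing errors are not hidden.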