Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2023/01/04 21:57:53 UTC

[GitHub] [iceberg] asheeshgarg commented on issue #6415: Vectorized Read Issue

asheeshgarg commented on issue #6415:
URL: https://github.com/apache/iceberg/issues/6415#issuecomment-1371476608

   @nastra @nazq so in the meantime we have merged the pull request and bundled a local jar with #3024.
   It works fine for most of the columns, but we are getting:
   java.lang.IndexOutOfBoundsException: index: 32749, length: 32 (expected: range(0, 32768))
           at org.apache.arrow.memory.ArrowBuf.checkIndex(ArrowBuf.java:701)
           at org.apache.arrow.memory.ArrowBuf.setBytes(ArrowBuf.java:765)
           at org.apache.arrow.vector.BaseVariableWidthVector.setBytes(BaseVariableWidthVector.java:1244)
           at org.apache.arrow.vector.BaseVariableWidthVector.set(BaseVariableWidthVector.java:1025)
           at org.apache.iceberg.arrow.DictEncodedArrowConverter.lambda$toVarCharVector$5(DictEncodedArrowConverter.java:153)
           at org.apache.iceberg.arrow.DictEncodedArrowConverter.initVector(DictEncodedArrowConverter.java:201)
           at org.apache.iceberg.arrow.DictEncodedArrowConverter.toVarCharVector(DictEncodedArrowConverter.java:150)
           at org.apache.iceberg.arrow.DictEncodedArrowConverter.toArrowVector(DictEncodedArrowConverter.java:47)
           at org.apache.iceberg.arrow.vectorized.ColumnVector.getArrowVector(ColumnVector.java:66)
           at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
           at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:992)
           at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
           at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
           at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:575)
           at java.base/java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260)
           at java.base/java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:616)
           at org.apache.iceberg.arrow.vectorized.ColumnarBatch.createVectorSchemaRootFromVectors(ColumnarBatch.java:58)
           at com.ReadIcebergTableTestV3.main(ReadIcebergTableTestV3.java:54)
   when reading columns where the distinct count of the dictionary is large. I will try to create a test case to replicate it.
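
   For context, the same failure mode can be reproduced outside Iceberg with plain Arrow. Below is a minimal sketch (hypothetical class, not Iceberg code) that under-sizes the data buffer of a VarCharVector the way a large dictionary payload can be underestimated: BaseVariableWidthVector.set() does not grow the buffer, so once the cumulative byte length passes the allocated capacity, ArrowBuf.checkIndex() throws the same IndexOutOfBoundsException; setSafe() would reallocate instead.

   import java.nio.charset.StandardCharsets;
   import org.apache.arrow.memory.BufferAllocator;
   import org.apache.arrow.memory.RootAllocator;
   import org.apache.arrow.vector.VarCharVector;

   public class VarCharOverflowRepro {
     public static void main(String[] args) {
       try (BufferAllocator allocator = new RootAllocator();
            VarCharVector vector = new VarCharVector("col", allocator)) {
         int valueCount = 2048;
         // Size the data buffer for 32 KiB even though 2048 values of 32 bytes
         // each need 64 KiB, mimicking an underestimated dictionary payload.
         vector.allocateNew(32 * 1024, valueCount);
         byte[] value = "a-thirty-two-byte-long-value....".getBytes(StandardCharsets.UTF_8);
         for (int i = 0; i < valueCount; i++) {
           // set() writes at the running byte offset without growing the buffer;
           // past 32768 bytes ArrowBuf.checkIndex() throws
           // "index: ..., length: 32 (expected: range(0, 32768))".
           // setSafe() would reallocate the buffer instead.
           vector.set(i, value);
         }
         vector.setValueCount(valueCount);
       }
     }
   }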
   
   @rdblue Arrow has also added the Dataset API, which lets us read tabular data into Arrow Vectors. I was able to read the Parquet file directly with it; not sure if we would like to add read support using the Arrow Dataset.
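
   Roughly, that kind of direct Parquet read looks like the sketch below (assuming a recent Arrow Java with the arrow-dataset JNI module on the classpath; the file URI and class name are placeholders):

   import org.apache.arrow.dataset.file.FileFormat;
   import org.apache.arrow.dataset.file.FileSystemDatasetFactory;
   import org.apache.arrow.dataset.jni.NativeMemoryPool;
   import org.apache.arrow.dataset.scanner.ScanOptions;
   import org.apache.arrow.dataset.scanner.Scanner;
   import org.apache.arrow.dataset.source.Dataset;
   import org.apache.arrow.dataset.source.DatasetFactory;
   import org.apache.arrow.memory.BufferAllocator;
   import org.apache.arrow.memory.RootAllocator;
   import org.apache.arrow.vector.VectorSchemaRoot;
   import org.apache.arrow.vector.ipc.ArrowReader;

   public class ArrowDatasetParquetRead {
     public static void main(String[] args) throws Exception {
       String uri = "file:///tmp/data.parquet"; // placeholder path
       ScanOptions options = new ScanOptions(/*batchSize=*/ 32768);
       try (BufferAllocator allocator = new RootAllocator();
            DatasetFactory factory = new FileSystemDatasetFactory(
                allocator, NativeMemoryPool.getDefault(), FileFormat.PARQUET, uri);
            Dataset dataset = factory.finish();
            Scanner scanner = dataset.newScan(options);
            ArrowReader reader = scanner.scanBatches()) {
         while (reader.loadNextBatch()) {
           // Each batch is exposed as a VectorSchemaRoot backed by Arrow vectors.
           VectorSchemaRoot root = reader.getVectorSchemaRoot();
           System.out.println("rows in batch: " + root.getRowCount());
         }
       }
     }
   }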
    
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

