Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2023/01/04 21:57:53 UTC
[GitHub] [iceberg] asheeshgarg commented on issue #6415: Vectorized Read Issue
asheeshgarg commented on issue #6415:
URL: https://github.com/apache/iceberg/issues/6415#issuecomment-1371476608
@nastra @nazq In the meantime, we have merged the pull request and bundled a local jar with #3024.
It works fine for most of the columns, but we are getting:
java.lang.IndexOutOfBoundsException: index: 32749, length: 32 (expected: range(0, 32768))
at org.apache.arrow.memory.ArrowBuf.checkIndex(ArrowBuf.java:701)
at org.apache.arrow.memory.ArrowBuf.setBytes(ArrowBuf.java:765)
at org.apache.arrow.vector.BaseVariableWidthVector.setBytes(BaseVariableWidthVector.java:1244)
at org.apache.arrow.vector.BaseVariableWidthVector.set(BaseVariableWidthVector.java:1025)
at org.apache.iceberg.arrow.DictEncodedArrowConverter.lambda$toVarCharVector$5(DictEncodedArrowConverter.java:153)
at org.apache.iceberg.arrow.DictEncodedArrowConverter.initVector(DictEncodedArrowConverter.java:201)
at org.apache.iceberg.arrow.DictEncodedArrowConverter.toVarCharVector(DictEncodedArrowConverter.java:150)
at org.apache.iceberg.arrow.DictEncodedArrowConverter.toArrowVector(DictEncodedArrowConverter.java:47)
at org.apache.iceberg.arrow.vectorized.ColumnVector.getArrowVector(ColumnVector.java:66)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:992)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:575)
at java.base/java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260)
at java.base/java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:616)
at org.apache.iceberg.arrow.vectorized.ColumnarBatch.createVectorSchemaRootFromVectors(ColumnarBatch.java:58)
at com.ReadIcebergTableTestV3.main(ReadIcebergTableTestV3.java:54)
This happens when reading columns whose dictionary has a large number of distinct values. I will try to create a test case to reproduce it.
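The numbers in the exception are consistent with a write running past the end of a fixed-size buffer: 32749 + 32 = 32781, which exceeds the reported capacity of 32768. The sketch below is a toy model in plain Java, not Arrow or Iceberg code; the class FixedBufferSketch and its methods are hypothetical names used only to illustrate the bounds check. Arrow's variable-width vectors do offer a setSafe method that grows the buffer before writing, but whether switching to it is the right fix in DictEncodedArrowConverter is an assumption that still needs a test case.

```java
import java.util.Arrays;

// Toy model of a fixed-capacity byte buffer (hypothetical, not Arrow's ArrowBuf).
class FixedBufferSketch {
    private byte[] data;
    private int writeIndex = 0;

    public FixedBufferSketch(int capacity) {
        data = new byte[capacity];
    }

    // True when [index, index + length) lies inside a buffer of the given capacity.
    public static boolean fits(long index, long length, long capacity) {
        return index >= 0 && length >= 0 && index + length <= capacity;
    }

    // Unchecked-growth write: fails if the value does not fit in the current buffer.
    public void set(byte[] value) {
        if (!fits(writeIndex, value.length, data.length)) {
            throw new IndexOutOfBoundsException(String.format(
                "index: %d, length: %d (expected: range(0, %d))",
                writeIndex, value.length, data.length));
        }
        System.arraycopy(value, 0, data, writeIndex, value.length);
        writeIndex += value.length;
    }

    // Growing write: doubles the buffer until the value fits, then writes it.
    public void setSafe(byte[] value) {
        while (!fits(writeIndex, value.length, data.length)) {
            data = Arrays.copyOf(data, Math.max(1, data.length * 2));
        }
        set(value);
    }

    public int capacity() {
        return data.length;
    }

    public static void main(String[] args) {
        FixedBufferSketch buf = new FixedBufferSketch(32768);
        buf.set(new byte[32749]);          // advance the write offset to 32749
        try {
            buf.set(new byte[32]);         // 32749 + 32 > 32768, so this throws
        } catch (IndexOutOfBoundsException e) {
            System.out.println(e.getMessage());
        }
        buf.setSafe(new byte[32]);         // grows the buffer, then writes
        System.out.println("capacity after setSafe: " + buf.capacity());
    }
}
```

In this model, a column with many distinct and long dictionary values fills the data buffer faster than its initial allocation anticipates, which would match the symptom of only some columns failing.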
@rdblue Arrow has also added a Dataset API that can read tabular data into Arrow vectors. I was able to read the Parquet file directly with it. I am not sure whether we would like to add read support using the Arrow Dataset.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org