Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/12/13 18:09:55 UTC
[GitHub] [iceberg] asheeshgarg commented on issue #6415: Vectorized Read Issue
asheeshgarg commented on issue #6415:
URL: https://github.com/apache/iceberg/issues/6415#issuecomment-1349350170
@nastra I filled in the missing bits.
This is the schema defined in Iceberg, where entity_status is Utf8:
Schema<entity_id: Utf8, entity_name: Utf8, entity_status: Utf8,
This is the schema generated by batchRoot:
Schema<entity_id: Utf8, entity_name: Utf8, entity_status: Int(32, true),
Other fields that are Utf8 are also coming back as Int. Because of the type mismatch, it throws an error like the one below. Spark is able to read the data fine.
java.lang.IndexOutOfBoundsException: index: 0, length: 8388608 (expected: range(0, 15888))
at org.apache.arrow.memory.ArrowBuf.checkIndex(ArrowBuf.java:701)
at org.apache.arrow.memory.ArrowBuf.setBytes(ArrowBuf.java:955)
at org.apache.arrow.vector.BaseFixedWidthVector.reAlloc(BaseFixedWidthVector.java:451)
at org.apache.arrow.vector.BaseFixedWidthVector.setValueCount(BaseFixedWidthVector.java:732)
at org.apache.arrow.vector.VectorSchemaRoot.setRowCount(VectorSchemaRoot.java:240)
at org.apache.arrow.vector.VectorLoader.load(VectorLoader.java:86)
Expected behavior: batches should return data with the schema types defined in Iceberg/Hive.
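To make the mismatch easier to spot before a batch is loaded, here is a minimal sketch of a field-by-field comparison. It is illustrative only: it uses plain Java maps of field name to type string rather than the Arrow or Iceberg schema classes, and the class name `SchemaMismatchCheck` and method `findMismatches` are hypothetical names, not part of either library. Only the three fields shown above are included.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Iceberg or Arrow.
public class SchemaMismatchCheck {

    // Returns one message per expected field whose actual type differs
    // (or is missing, in which case the actual type is reported as null).
    static List<String> findMismatches(Map<String, String> expected,
                                       Map<String, String> actual) {
        List<String> mismatches = new ArrayList<>();
        for (Map.Entry<String, String> field : expected.entrySet()) {
            String actualType = actual.get(field.getKey());
            if (!field.getValue().equals(actualType)) {
                mismatches.add(field.getKey() + ": expected " + field.getValue()
                        + ", got " + actualType);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Types as defined in the Iceberg table (from the schema above).
        Map<String, String> expected = new LinkedHashMap<>();
        expected.put("entity_id", "Utf8");
        expected.put("entity_name", "Utf8");
        expected.put("entity_status", "Utf8");

        // Types as reported by the batch root schema (from the schema above).
        Map<String, String> actual = new LinkedHashMap<>();
        actual.put("entity_id", "Utf8");
        actual.put("entity_name", "Utf8");
        actual.put("entity_status", "Int(32, true)");

        // Prints: entity_status: expected Utf8, got Int(32, true)
        findMismatches(expected, actual).forEach(System.out::println);
    }
}
```

In a real reader the two maps would be filled from the table schema and from the batch root schema; how to obtain those is left out here because it depends on the reader setup.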
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org