Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/01/20 10:27:14 UTC

[GitHub] [iceberg] pvary opened a new issue #2120: Hive: Vectorization is not working

pvary opened a new issue #2120:
URL: https://github.com/apache/iceberg/issues/2120


   Hive reads and writes currently work only on the non-vectorized path. Vectorization can be disabled by various rules, but if we force it to be enabled, we get failures.
   
   The current way of working:
   - Read records by `HiveIcebergSerDe.deserialize()` and return a `Record` object where the schema contains only the projected columns
   - Write records by `HiveIcebergSerDe.serialize()` and return a `Record` object where the schema is the schema of the target table
   
   The vectorized code path expects:
   - Read path: List of Objects where the list contains every column of the source table schema (the non-projected columns can/should be null)
   - Write path: List of Objects where the list contains every column of the target table schema
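
The difference between the two row shapes on the read path can be sketched roughly as follows. This is only an illustration in plain Java: `RowShapeSketch` and `padToFullSchema` are hypothetical names, not part of Iceberg or Hive, and column names stand in for real schema fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class RowShapeSketch {

  /**
   * Hypothetical helper (not an Iceberg or Hive API): expands the projected
   * values to the full width of the table schema, leaving every
   * non-projected column as null.
   */
  public static List<Object> padToFullSchema(
      List<String> tableColumns,
      List<String> projectedColumns,
      List<Object> projectedValues) {
    if (projectedColumns.size() != projectedValues.size()) {
      throw new IllegalArgumentException("Projected columns and values differ in length");
    }
    // Start with one null slot per table column.
    List<Object> row =
        new ArrayList<>(Collections.<Object>nCopies(tableColumns.size(), null));
    for (int i = 0; i < projectedColumns.size(); i++) {
      int position = tableColumns.indexOf(projectedColumns.get(i));
      if (position < 0) {
        throw new IllegalArgumentException("Unknown column: " + projectedColumns.get(i));
      }
      row.set(position, projectedValues.get(i));
    }
    return row;
  }

  public static void main(String[] args) {
    List<String> table = Arrays.asList("id", "name", "ts");
    // Non-vectorized shape: only the projected columns, in projection order.
    List<String> projected = Arrays.asList("ts", "id");
    List<Object> values = Arrays.<Object>asList(1611138434L, 42);
    // Vectorized shape: one slot per table column.
    System.out.println(padToFullSchema(table, projected, values));
    // prints [42, null, 1611138434]
  }
}
```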
   
   Maybe we should make it possible to create different Iceberg readers/writers for the vectorized and non-vectorized code paths. The decision could be based on Hive's `Utilities.getIsVectorized(conf)`, like [this](https://github.com/apache/hive/blob/a97448f84167e4e8c3615908556fe2e4163a43ca/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3893-L3918).
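
One possible shape for that selection, as a self-contained sketch only. `ReaderSelectionSketch`, `RowReader` and `createReader` are hypothetical names, and the `vectorized` flag is a plain boolean here; in Hive it would presumably come from `Utilities.getIsVectorized(conf)`, which is not called in this sketch so that it runs without Hive on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReaderSelectionSketch {

  /** Hypothetical reader abstraction; not an Iceberg or Hive interface. */
  public interface RowReader {
    List<Object> shape(List<Object> projectedValues);
  }

  /**
   * Picks a reader for the current code path.
   *
   * @param vectorized stand-in for the result of Utilities.getIsVectorized(conf)
   * @param tableWidth number of columns in the full table schema
   * @param positions  table position of each projected value
   */
  public static RowReader createReader(boolean vectorized, int tableWidth, int[] positions) {
    if (!vectorized) {
      // Non-vectorized path: hand back only the projected columns.
      return values -> new ArrayList<>(values);
    }
    // Vectorized path: one slot per table column, null where not projected.
    return values -> {
      Object[] row = new Object[tableWidth];
      for (int i = 0; i < positions.length; i++) {
        row[positions[i]] = values.get(i);
      }
      return Arrays.asList(row);
    };
  }

  public static void main(String[] args) {
    List<Object> projected = Arrays.<Object>asList(1611138434L, 42);
    int[] positions = {2, 0};
    System.out.println(createReader(false, 3, positions).shape(projected));
    // prints [1611138434, 42]
    System.out.println(createReader(true, 3, positions).shape(projected));
    // prints [42, null, 1611138434]
  }
}
```

Keeping the decision in a single factory method would let the SerDe and the input/output formats stay unaware of which path is active; whether that fits the existing Iceberg Hive module is an open question.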

