
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2019/10/08 16:48:16 UTC

[GitHub] [incubator-iceberg] anjalinorwood opened a new issue #520: Support row based, vectorized and hybrid reads of Iceberg data

anjalinorwood opened a new issue #520: Support row based, vectorized and hybrid reads of Iceberg data
URL: https://github.com/apache/incubator-iceberg/issues/520

It would be desirable to support the following three types of reads of Iceberg data and to be able to switch among them as necessary.
1) Row iterator based reads of data (as they exist today).
2) Vectorized reads of Iceberg data: Currently, Spark can exploit vectorized reads only for primitive data types. In the first version of vectorized reads, only primitive data types are supported. If a complex/nested data type is detected, Iceberg needs to fall back to row iterator based reads.
3) Vectorized reads of an Iceberg table exposed through the row iterator API: Spark requires all tasks to use either vectorized or row based reads, but not a mixture of both. If the data is a mixture of Parquet and Avro, Iceberg should read the Parquet data using vectorized reads (for performance), use a row iterator for the Avro data, and provide a row iterator shim on top of the Arrow buffers.
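
The selection rule and the row iterator shim described in the three points above could be sketched roughly as follows. This is only an illustration under stated assumptions: every name here (`ReadModeSketch`, `ReadMode`, `FileFormat`, `chooseMode`, `rowsOver`) is hypothetical and is not an Iceberg, Spark, or Arrow API, and a plain `Object[][]` stands in for Arrow buffers.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch; none of these types exist in Iceberg, Spark, or Arrow.
final class ReadModeSketch {
    enum ReadMode { ROW, VECTORIZED, VECTORIZED_AS_ROWS }
    enum FileFormat { PARQUET, AVRO }

    // Picks one read mode for the whole scan, since all tasks must agree.
    // allPrimitive: true when the projected schema has no complex/nested types.
    static ReadMode chooseMode(boolean allPrimitive, List<FileFormat> formats) {
        if (!allPrimitive) {
            return ReadMode.ROW; // nested types: fall back to row iterator reads
        }
        boolean anyParquet = formats.contains(FileFormat.PARQUET);
        boolean anyAvro = formats.contains(FileFormat.AVRO);
        if (anyParquet && !anyAvro) {
            return ReadMode.VECTORIZED; // pure Parquet: expose batches directly
        }
        if (anyParquet) {
            // Mixed Parquet and Avro: read Parquet in batches, expose rows.
            return ReadMode.VECTORIZED_AS_ROWS;
        }
        return ReadMode.ROW; // Avro only (or no files): plain row iterator
    }

    // Row iterator shim over a columnar batch laid out as columns[col][row].
    // Assumes every column has the same length.
    static Iterator<Object[]> rowsOver(Object[][] columns) {
        final int numRows = columns.length == 0 ? 0 : columns[0].length;
        return new Iterator<Object[]>() {
            private int row = 0;

            @Override
            public boolean hasNext() {
                return row < numRows;
            }

            @Override
            public Object[] next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                // Materialize one row by picking the same index in each column.
                Object[] out = new Object[columns.length];
                for (int c = 0; c < columns.length; c++) {
                    out[c] = columns[c][row];
                }
                row++;
                return out;
            }
        };
    }
}
```

In this sketch the decision is made once per scan rather than per file, which mirrors the constraint that Spark tasks cannot mix vectorized and row based reads. The shim copies values into a fresh array per row; a real implementation over Arrow buffers would more likely return a reusable row view to avoid that allocation.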

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

With regards,
Apache Git Services
