Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/07/10 22:07:08 UTC

[GitHub] [iceberg] shardulm94 commented on pull request #1189: Spark: Support ORC vectorized reads

shardulm94 commented on pull request #1189:
URL: https://github.com/apache/iceberg/pull/1189#issuecomment-656914438


   Which cases are you comparing for nested data? For nested data, `readIcebergNonVectorized` vs. `readIcebergVectorized` shows a 2-3x improvement, which is similar to the improvement for flat data. The `readFileSourceVectorized` and `readWithProjectionFileSourceVectorized` benchmarks are not really relevant for nested data, since the file source falls back to row-by-row reading when the schema contains nested types, so I think we should just remove them. I modified the Spark unit tests to also exercise the vectorized code paths, so I am assuming those tests verify correctness, but I can also do some sanity checks manually.

