Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/10/01 05:00:45 UTC

[GitHub] [iceberg] kbendick opened a new issue #1540: Add in support for vectorized reads in spark for parquet metadata columns

kbendick opened a new issue #1540:
URL: https://github.com/apache/iceberg/issues/1540


   While updating parameterized tests to have names, I noticed that `TestSparkParquetReadMetadataColumns` has a `vectorized` parameter that is currently only tested with `false`. The `true` case is commented out, and all three tests fail when it is enabled.
   
   There has been a lot of discussion about large numbers of metadata files becoming a bottleneck, so I'm wondering whether it makes sense to add vectorized reading here, specifically for Spark, where people are most likely to run the commands that compact metadata, prune dead files and dead metadata files, etc.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] kbendick commented on issue #1540: Add in support for vectorized reads in spark for parquet metadata columns

Posted by GitBox <gi...@apache.org>.

kbendick commented on issue #1540:
URL: https://github.com/apache/iceberg/issues/1540#issuecomment-701901546


   If this is implemented, the parameter group for `vectorized` should be updated to include `true` here: https://github.com/apache/iceberg/blob/44c1d00b37f0d7fc2e978baad4f7a861a8335cf0/spark/src/test/java/org/apache/iceberg/spark/data/TestSparkParquetReadMetadataColumns.java#L109
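
   For illustration only, the change described above might look roughly like the sketch below. It is not the code in the linked file: the class name `VectorizedParameterSketch` and the method name `parameters` are hypothetical, and the JUnit 4 annotation is shown only as a comment so that the snippet compiles without JUnit on the classpath.

```java
// Hypothetical stand-in for the parameter group; not the actual Iceberg test class.
public class VectorizedParameterSketch {

  // In a JUnit 4 parameterized test, this method would typically carry
  // @Parameterized.Parameters(name = "vectorized = {0}").
  // Each entry would produce one run of every test method in the class.
  public static Object[] parameters() {
    // Assumption: today the group contains only `false`; once vectorized
    // reads work for metadata columns, `true` would be added alongside it.
    return new Object[] { false, true };
  }

  public static void main(String[] args) {
    // Print the values that each test method would be run with.
    for (Object vectorized : parameters()) {
      System.out.println("vectorized = " + vectorized);
    }
  }
}
```

   With both values in the group, each of the three tests would run once with vectorization off and once with it on, so a regression in either read path should show up under a named test case.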

