Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/06/25 17:25:51 UTC

[GitHub] [iceberg] samarthjain commented on pull request #2740: [Parquet] Throw Better Exception with Vectorized Parquet V2 Format

samarthjain commented on pull request #2740:
URL: https://github.com/apache/iceberg/pull/2740#issuecomment-868719981


   @RussellSpitzer - I am hoping we can find a better solution here. I am generally not a fan of catching NPEs :) 
   
   There are a few other approaches possible here:
   1) Work around it on the writer side. Parquet V2 isn't that well tested, but later versions of Trino have started writing Parquet files in the V2 format. We hit this issue in Iceberg vectorized reads when we upgraded our Presto clusters to the Trino 350 release, and we worked around it by reintroducing the older Parquet write path in Trino, which writes Parquet V1 files.
    
   2) Fix this in Iceberg. We should either:
    - look into supporting vectorized reads for V2, or
    - disable vectorized reads when we can detect that the Parquet files are in V2 format.
    
   I can take up looking into 2). 
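
   A rough sketch of the "disable when V2 is detected" idea under 2), just to make the control flow concrete. Everything here is hypothetical: the `PageType` enum and `useVectorizedReader` are stand-ins I made up for illustration, not the parquet-mr or Iceberg API (only the `DATA_PAGE_V2` name mirrors the page type in the Parquet format). How and where Iceberg would actually learn the page types of a file is left open.

```java
import java.util.List;

public class VectorizedReadFallback {
    // Hypothetical stand-in for the page types seen while opening a file.
    // The names mirror the Parquet format's page types, but this is not
    // the parquet-mr API.
    enum PageType { DATA_PAGE, DATA_PAGE_V2, DICTIONARY_PAGE }

    // Returns true only when vectorization was requested and none of the
    // observed pages is a V2 data page. Otherwise the caller would fall
    // back to the non-vectorized read path instead of failing later.
    static boolean useVectorizedReader(boolean vectorizationEnabled, List<PageType> pageTypes) {
        if (!vectorizationEnabled) {
            return false;
        }
        for (PageType type : pageTypes) {
            if (type == PageType.DATA_PAGE_V2) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<PageType> v1File = List.of(PageType.DICTIONARY_PAGE, PageType.DATA_PAGE);
        List<PageType> v2File = List.of(PageType.DICTIONARY_PAGE, PageType.DATA_PAGE_V2);

        System.out.println("V1 file vectorized: " + useVectorizedReader(true, v1File));
        System.out.println("V2 file vectorized: " + useVectorizedReader(true, v2File));
    }
}
```

   The point of the sketch is only that the decision is made up front, so a V2 file takes the row-based path rather than surfacing an NPE from the vectorized reader.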
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


