
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/06/15 07:48:39 UTC

[GitHub] [iceberg] kbendick commented on issue #2692: [Spark] NullPointerException error when attempting to do vectorized read of Parquet file with unsupported encoding

kbendick commented on issue #2692:
URL: https://github.com/apache/iceberg/issues/2692#issuecomment-861269643


   As a starting point, for the Spark vectorized Parquet reader, I think we should explicitly throw when we either
   - (1) encounter an encoding that’s not supported, or
   - (2) encounter a Parquet v2 file at read time.
   
   
   I think that approach 2 would potentially be simpler and more in line with the code from Spark, which has explicit V1 and V2 paths for data pages, footers, etc. (and which we modeled this class on).
   
   Spark 3.1.1 afaik does not support vectorized reading of files written with the Parquet v2 format, though support seems to be in the works.
   
   A more helpful error message would go a long way until we’ve updated the Spark vectorized Parquet reader to support files written with both the Parquet v1 and v2 formats.
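
   To make the two options concrete, here is a rough sketch of what the fail-fast checks could look like. Everything in it is hypothetical: `VectorizedReadGuard`, the `Encoding` and `PageVersion` enums, and the `SUPPORTED` set are stand-ins made up for illustration, not the actual Iceberg or parquet-mr types, and the set of supported encodings is an assumption rather than the reader's real capabilities.

```java
import java.util.EnumSet;
import java.util.Set;

public class VectorizedReadGuard {
  // Hypothetical stand-ins for Parquet encodings and data page versions;
  // these are NOT the real parquet-mr classes.
  public enum Encoding { PLAIN, RLE, PLAIN_DICTIONARY, DELTA_BINARY_PACKED, DELTA_BYTE_ARRAY }
  public enum PageVersion { V1, V2 }

  // Assumed (illustrative) set of encodings the vectorized reader can decode.
  static final Set<Encoding> SUPPORTED =
      EnumSet.of(Encoding.PLAIN, Encoding.RLE, Encoding.PLAIN_DICTIONARY);

  // Option 1: reject any encoding the vectorized reader cannot decode.
  public static void checkEncoding(String column, Encoding encoding) {
    if (!SUPPORTED.contains(encoding)) {
      throw new UnsupportedOperationException(String.format(
          "Cannot do vectorized read of column '%s': unsupported encoding %s",
          column, encoding));
    }
  }

  // Option 2: reject Parquet v2 data pages outright.
  public static void checkPageVersion(String column, PageVersion version) {
    if (version == PageVersion.V2) {
      throw new UnsupportedOperationException(String.format(
          "Cannot do vectorized read of column '%s': Parquet v2 data pages are not supported",
          column));
    }
  }

  public static void main(String[] args) {
    checkEncoding("id", Encoding.PLAIN);       // passes silently
    checkPageVersion("id", PageVersion.V1);    // passes silently
    try {
      checkPageVersion("id", PageVersion.V2);  // throws with a descriptive message
    } catch (UnsupportedOperationException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

   Either check would turn the current NullPointerException into an UnsupportedOperationException that names the column and the reason, which is the main point here.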


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
