Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/05/07 21:31:47 UTC

[GitHub] [incubator-pinot] awishnick opened a new issue #6894: Confusing errors when trying to ingest Parquet files compressed with ZSTD

awishnick opened a new issue #6894:
URL: https://github.com/apache/incubator-pinot/issues/6894


   I tried to run a SegmentCreation job to ingest some Parquet files written by Trino, and got confusing error messages that made it look like the files were corrupted. It turns out the cause is that the files were compressed with ZSTD (which Trino suggests). The desired behavior would be to detect and support ZSTD compression, or at least to fail with an error saying that a supported compression algorithm must be used.
   
   Examples of the error:
   ```
   java.io.IOException: Could not read footer: java.io.IOException: Could not read footer for file DeprecatedRawLocalFileStatus{path=file:/tmp/pinot-7dd1e9e9-b1bd-416c-ab4b-1e66a887d7ca/input/20210507_201917_26196_rv8nu_14b41b59-66e3-4f97-9df0-56c76d859102; isDirectory=false; length=3804; replication=1; blocksize=33554432; modification_time=1620419283015; access_time=0; owner=; group=; permission=rw-rw-rw-; isSymlink=false}
           at org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:248) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-7ac8650777d6b25c8cae4ca1bd5460f25488a694]     
   
   ...
   
   Caused by: java.io.IOException: can not read class org.apache.parquet.format.FileMetaData: Required field 'codec' was not present! Struct: ColumnMetaData(type:INT32, encodings:[BIT_PACKED, PLAIN_DICTIONARY, RLE], path_in_schema:[date], codec:null, num_values:2, total_uncompressed_size:54, total_compressed_size:72, data_page_offset:4, statistics:Statistics(max:7F 62 34 01, min:7F 62 34 01, null_count:0), encoding_stats:[PageEncodingStats(page_type:DICTIONARY_PAGE, encoding:PLAIN_DICTIONARY, count:1), PageEncodingStats(page_type:DATA_PAGE, encoding:PLAIN_DICTIONARY, count:1)])
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] xiangfu0 commented on issue #6894: Confusing errors when trying to ingest Parquet files compressed with ZSTD

Posted by GitBox <gi...@apache.org>.
xiangfu0 commented on issue #6894:
URL: https://github.com/apache/incubator-pinot/issues/6894#issuecomment-834822896


   Have you tried using `org.apache.pinot.plugin.inputformat.parquet.ParquetNativeRecordReader` as the record reader? It uses the native Parquet reader. Meanwhile, we can look into how to detect and support ZSTD compression.
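
   For reference, a minimal sketch of how the record reader might be selected in a batch ingestion job spec. This assumes the usual `recordReaderSpec` section of the job spec YAML; only the class name comes from this thread, and the rest of the job spec is omitted, so check the layout against your Pinot version.

   ```yaml
   # Fragment of an ingestion job spec (assumed layout; other sections omitted)
   recordReaderSpec:
     dataFormat: 'parquet'
     # Record reader class suggested above
     className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetNativeRecordReader'
   ```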
   

