Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/02/10 09:11:13 UTC

[GitHub] [iceberg] wuwenchi edited a comment on issue #4074: flink: after rewrite, the two small files are rewritten into the same two small files as before

wuwenchi edited a comment on issue #4074:
URL: https://github.com/apache/iceberg/issues/4074#issuecomment-1034541847


   @RussellSpitzer Thank you for your answer. I applied the PR you mentioned, but the problem persists.
   I think there might be two separate problems.
   I traced through the process, and the problem seems to be in the **isPartialFileScan** function.
   This function works correctly when the data files are in Avro format.
   With data files in Parquet format, however, the file itself begins with an initial 4-byte offset, so the check here is wrong: even for a complete Parquet file, it returns true.
   
   Should this check take the file format into account? For Parquet, we would need to add the initial 4-byte offset to **fileScanTask.length()**.
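
   The proposed change could be sketched roughly as follows. This is an illustration only, not the Iceberg implementation: the class `PartialScanSketch`, both method names, and the plain `long` and `String` parameters are hypothetical stand-ins for the real task and file objects. It assumes, as described above, that the existing check compares the file size with the task length, and that for a complete Parquet file the task length is 4 bytes smaller than the file size (Parquet files begin with the 4-byte magic `PAR1`).

```java
// Hypothetical sketch; none of these names are part of the Iceberg API.
final class PartialScanSketch {

  // Parquet files begin with the 4-byte magic "PAR1".
  static final long PARQUET_LEADING_BYTES = 4L;

  // Assumed shape of the current check: the scan is treated as partial
  // whenever the task length differs from the file size.
  static boolean isPartialNaive(long fileSizeInBytes, long taskLength) {
    return fileSizeInBytes != taskLength;
  }

  // Proposed variant: for Parquet, add the leading 4 bytes to the task
  // length before comparing it with the file size.
  static boolean isPartialFormatAware(String format, long fileSizeInBytes, long taskLength) {
    long covered = "parquet".equalsIgnoreCase(format)
        ? taskLength + PARQUET_LEADING_BYTES
        : taskLength;
    return fileSizeInBytes != covered;
  }

  public static void main(String[] args) {
    long size = 1024L;
    // A complete Parquet file whose task length excludes the leading 4 bytes.
    System.out.println(isPartialNaive(size, size - 4));
    System.out.println(isPartialFormatAware("parquet", size, size - 4));
    // A complete Avro file whose task length equals the file size.
    System.out.println(isPartialFormatAware("avro", size, size));
  }
}
```

   Under these assumptions, the naive check reports a complete Parquet file as partial, while the format-aware variant does not, and Avro behavior is unchanged.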


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


