Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/02/21 02:00:25 UTC

[GitHub] [hudi] yanenze opened a new issue #4855: [SUPPORT] mor table parquet file max file size does not go into effect

yanenze opened a new issue #4855:
URL: https://github.com/apache/hudi/issues/4855


   When I set write.parquet.max.file.size in FlinkOptions, it does not take effect: the file actually grows larger than I configured.
   I thought I might also need to set avgRecordSize, so I set it as well, but the parquet file still grows larger than configured.
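   For context on why avgRecordSize matters: a size-based bucket assigner usually enforces the file size cap only indirectly, by estimating how many more records a "small" file can absorb from its current (estimated) size and an average record size. The sketch below assumes that model; records_to_append and its parameters are hypothetical names, not Hudi's actual code. If either the current-size estimate or the average record size is wrong, the cap is misapplied and files can overshoot it.

```python
# Hypothetical sketch (not Hudi code): how a size-based assigner might decide
# how many new records a "small" file can still take before hitting the cap.

MB = 1024 * 1024


def records_to_append(max_file_size: int, current_size: int,
                      avg_record_size: int, unassigned: int) -> int:
    """Return how many of `unassigned` records fit before the file reaches max_file_size."""
    if avg_record_size <= 0:
        raise ValueError("avg_record_size must be positive")
    # Remaining byte budget; a file already over the cap gets nothing.
    remaining_bytes = max(max_file_size - current_size, 0)
    # Convert the byte budget into a record count and never exceed the pending inserts.
    return min(remaining_bytes // avg_record_size, unassigned)


# A 100 MB file with a 128 MB cap and ~1 KB records can take 28,672 more records.
print(records_to_append(128 * MB, 100 * MB, 1024, 1_000_000))  # 28672
```

   Note that the result is only as good as current_size: if the assigner underestimates the file's size, it will keep routing records to it well past the configured limit.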


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on issue #4855: [SUPPORT] mor table parquet file max file size does not go into effect

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4855:
URL: https://github.com/apache/hudi/issues/4855#issuecomment-1047201422


   @danny0405 @leesf: can you loop in someone to assist here, please?



[GitHub] [hudi] nsivabalan commented on issue #4855: [SUPPORT] mor table parquet file max file size does not go into effect

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4855:
URL: https://github.com/apache/hudi/issues/4855#issuecomment-1048842365


   Thanks for putting up the fix. Appreciate the interest. Someone from the community or Danny will help with reviews and getting it landed. Will close out the GitHub issue.



[GitHub] [hudi] yanenze commented on issue #4855: [SUPPORT] mor table parquet file max file size does not go into effect

Posted by GitBox <gi...@apache.org>.
yanenze commented on issue #4855:
URL: https://github.com/apache/hudi/issues/4855#issuecomment-1048506073


   > Yes, the file size control is not accurate, how much size does the actual file size exceed your desired threshold ?
   Hello, I configured the max parquet file size as 128 MB, but the file eventually grew to 8 GB,
   so I tried to find the reason.
   I have created a pull request in #4879.
   I think the large files are generated because when the bucketAssigner builds the small file list, it misses the file that is in pendingCompaction, so the total size is calculated only as (log file size * compression ratio (0.35)).
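   To make the suspected cause concrete, here is a small worked sketch. It assumes, as the comment implies, that a MOR file slice's size is normally estimated as base file size plus log file size times a compression ratio (0.35 here), and that for slices under pending compaction the base file is dropped from that sum. The function names and the 120 MB / 20 MB sizes are hypothetical, chosen only for illustration.

```python
from typing import Sequence

MB = 1024 * 1024


def estimated_slice_size(base_file_size: int, log_file_sizes: Sequence[int],
                         compression_ratio: float = 0.35,
                         include_base: bool = True) -> int:
    """Hypothetical estimate of a MOR file slice's size after compaction.

    Log bytes are scaled by compression_ratio because they are expected to shrink
    once merged into the columnar base file. include_base=False models the
    suspected bug: the base file of a slice under pending compaction is skipped.
    """
    log_part = round(sum(log_file_sizes) * compression_ratio)
    return (base_file_size if include_base else 0) + log_part


def headroom(max_file_size: int, estimate: int) -> int:
    """Bytes the assigner believes it can still append to this slice."""
    return max(max_file_size - estimate, 0)


max_size = 128 * MB
base, logs = 120 * MB, [20 * MB]
correct = estimated_slice_size(base, logs)                     # 120 MB + 7 MB
buggy = estimated_slice_size(base, logs, include_base=False)   # 7 MB only
print(headroom(max_size, correct) // MB)  # 1
print(headroom(max_size, buggy) // MB)    # 121
```

   Under these assumptions the slice looks almost empty to the assigner, so it keeps being selected as a "small file" on every check, which would be consistent with files growing far beyond the 128 MB setting.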



[GitHub] [hudi] danny0405 commented on issue #4855: [SUPPORT] mor table parquet file max file size does not go into effect

Posted by GitBox <gi...@apache.org>.
danny0405 commented on issue #4855:
URL: https://github.com/apache/hudi/issues/4855#issuecomment-1047484493


   Yes, the file size control is not accurate. By how much does the actual file size exceed your desired threshold?



[GitHub] [hudi] nsivabalan closed issue #4855: [SUPPORT] mor table parquet file max file size does not go into effect

Posted by GitBox <gi...@apache.org>.
nsivabalan closed issue #4855:
URL: https://github.com/apache/hudi/issues/4855


   




[GitHub] [hudi] nsivabalan edited a comment on issue #4855: [SUPPORT] mor table parquet file max file size does not go into effect

Posted by GitBox <gi...@apache.org>.
nsivabalan edited a comment on issue #4855:
URL: https://github.com/apache/hudi/issues/4855#issuecomment-1047201422


   @danny0405 @leesf: can you loop in someone to assist here, please?
   @yanenze: in the meantime, let me know if you are seeing this in a Spark pipeline. I don't have much experience with Flink, so I will let others chime in.
   

