Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/10/16 06:05:25 UTC

[GitHub] [hudi] KarthickAN commented on issue #2178: [SUPPORT] Hudi writing 10MB worth of org.apache.hudi.bloomfilter data in each of the parquet files produced

KarthickAN commented on issue #2178:
URL: https://github.com/apache/hudi/issues/2178#issuecomment-709817321


   @nsivabalan Please find below my answers
   
   1. That's the average record size. I inspected the parquet files produced and calculated that based on the metrics I found there.
   2. Yes
   3. hoodie.copyonwrite.insert.split.size - didn't set it manually; the default applies. Also, we don't retain 24 commits - it's just 1.
   hoodie.index.bloom.num_entries = set to 1500000
   hoodie.index.bloom.fpp = didn't set manually. default is 0.000000001
   hoodie.bloom.index.filter.type = didn't set manually. default is BLOOM
   
   In fact, except for the configs I mentioned in the issue description, I didn't set any other config explicitly; everything was left at the defaults.
   
   4. All the files I inspected so far had this issue regardless of their size; it's consistent.
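
   As a sanity check on the numbers above: the reported `hoodie.index.bloom.num_entries = 1500000` combined with the default `hoodie.index.bloom.fpp = 0.000000001` already implies a filter of roughly 8 MB by the standard bloom filter sizing formula, m = -n * ln(p) / (ln 2)^2 bits. This is a minimal sketch (not Hudi's actual sizing code) showing the arithmetic:

   ```python
   import math

   def bloom_filter_size_bytes(num_entries: int, fpp: float) -> int:
       """Optimal bloom filter size for num_entries items at false-positive
       rate fpp, using m = -n * ln(p) / (ln 2)^2 bits, rounded up to bytes."""
       bits = -num_entries * math.log(fpp) / (math.log(2) ** 2)
       return math.ceil(bits / 8)

   # Values taken from this issue: 1.5M entries, fpp = 1e-9
   size = bloom_filter_size_bytes(1_500_000, 1e-9)
   print(f"{size / 1024 / 1024:.1f} MiB")
   ```

   This comes out to about 7.7 MiB per file, so a ~10 MB `org.apache.hudi.bloomfilter` footer entry is in line with these settings rather than a bug; lowering num_entries or raising fpp would shrink it.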

