Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/08/04 06:55:20 UTC

[GitHub] [hudi] karan867 commented on issue #3077: [SUPPORT] Large latencies in hudi writes using upsert mode.

karan867 commented on issue #3077:
URL: https://github.com/apache/hudi/issues/3077#issuecomment-892413100


   @nsivabalan Thank you for the reply.
   Here are the answers to the questions above:
   
   1. I am using the default file size. I tried decreasing it to 60 MB, but that increased the write time by 2-4 minutes.
   2. The average record size is 1 KB to 2 KB.
   3. I did not specify this for most of my write experiments, so the default value of 500000 applied. In some experiments I reduced it to 50000, which had no significant effect on the write time.
   4. My record key contains a timestamp field, and I tried making it the prefix of the record key. That takes 2.2 minutes longer in the "prune by ranges" stage, so I turned it off. My guess is that some packets arrive late to our system, and the timestamp is the packet's own timestamp rather than the time it arrived.
   5. The data contains negligible updates (>99% inserts and <1% updates).
   6. Yes, small size creates new files for inserts. It decreases the write time for the first few commits, but the write time then grows as the number of commits increases.
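
As a rough sketch, the settings touched on in answers 1, 3, 4 and 6 might map to Hudi write options like the ones below. Which config key each answer refers to is an assumption (the thread does not name them), and `build_hudi_options` is a hypothetical helper, not part of Hudi.

```python
# Sketch only: the mapping from each answer to a Hudi config key is an
# assumption, and build_hudi_options is a hypothetical helper.
MB = 1024 * 1024


def build_hudi_options(max_file_size_mb=120, insert_split_size=500_000,
                       prune_by_ranges=True, small_file_limit_mb=100):
    """Return Hudi write options as a dict of strings."""
    return {
        "hoodie.datasource.write.operation": "upsert",
        # Answer 1 (assumed): target parquet file size, in bytes.
        "hoodie.parquet.max.file.size": str(max_file_size_mb * MB),
        # Answer 3 (assumed): number of inserted records per bucket.
        "hoodie.copyonwrite.insert.split.size": str(insert_split_size),
        # Answer 4 (assumed): range pruning in the bloom index lookup.
        "hoodie.bloom.index.prune.by.ranges": str(prune_by_ranges).lower(),
        # Answer 6 (assumed): files below this size receive new inserts;
        # 0 disables small-file handling.
        "hoodie.parquet.small.file.limit": str(small_file_limit_mb * MB),
    }


# Combines, purely for illustration, the variations mentioned in the answers.
experiment = build_hudi_options(max_file_size_mb=60, insert_split_size=50_000,
                                prune_by_ranges=False, small_file_limit_mb=0)

# With PySpark, a dict like this would typically be passed as:
#   df.write.format("hudi").options(**experiment).mode("append").save(path)
```

Keeping the options in one function makes it easier to vary a single setting per experiment and compare commit times.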
   
   I have experimented with many settings but cannot get a single commit writing 100k rows to take less than 10 minutes. The odd part is that when I write 20-30 commits, 2-3 of them take much less time (1-4 minutes).
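
To put these figures in perspective, a back-of-the-envelope calculation can estimate the data volume and throughput per commit. It assumes the 1 KB to 2 KB record size is the uncompressed size and takes 10 minutes as the commit time; the helper names are illustrative.

```python
# Back-of-the-envelope only; the input figures come from the comment above.
ROWS_PER_COMMIT = 100_000
RECORD_SIZE_KB = (1, 2)      # average record size range, assumed uncompressed
COMMIT_SECONDS = 10 * 60     # roughly 10 minutes per commit


def commit_volume_mb(rows, record_kb):
    """Approximate data volume of one commit in MB."""
    return rows * record_kb / 1024


def rows_per_second(rows, seconds):
    """Average write rate over one commit."""
    return rows / seconds


for kb in RECORD_SIZE_KB:
    mb = commit_volume_mb(ROWS_PER_COMMIT, kb)
    print(f"{kb} KB records: {mb:.0f} MB per commit, "
          f"{mb / COMMIT_SECONDS:.2f} MB/s")

print(f"{rows_per_second(ROWS_PER_COMMIT, COMMIT_SECONDS):.0f} rows/s")
```

Under these assumptions each commit carries on the order of 100-200 MB, which suggests the time is dominated by something other than raw data volume.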
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org