Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/07/20 06:56:24 UTC

[GitHub] [hudi] bvaradar commented on issue #1728: Processing time gradually increases while using spark structured streaming

bvaradar commented on issue #1728:
URL: https://github.com/apache/hudi/issues/1728#issuecomment-660841314


   (copied the comment from https://github.com/apache/hudi/issues/1830#issuecomment-660840191)
   
   We spent time over the weekend setting up a local test bed with Kafka and Spark Structured Streaming to reproduce this behavior. Here are the steps I followed, with code: https://gist.github.com/bvaradar/d892c6c6a69664463f8601d09c187271
   
   I ran the setup overnight for many hours with both MOR and COW tables but was not able to reproduce the gradual increase in time. Processing time did vary with the incoming workload, because of index lookup and Parquet writing, but it showed no upward trend over the run.
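
One way to tell workload-driven variance apart from a gradual increase is to fit a straight line to the per-batch durations and look at its slope. The sketch below is only an illustration, not what the gist does: the function names and the 50% threshold are hypothetical, and it assumes you have already collected one duration in milliseconds per micro-batch (for example from the `triggerExecution` entry of `durationMs` in the streaming query progress).

```python
from statistics import mean


def duration_trend(durations_ms):
    """Least-squares slope of batch duration against batch index (ms per batch)."""
    n = len(durations_ms)
    if n < 2:
        raise ValueError("need at least two batch durations")
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(durations_ms)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, durations_ms))
    return sxy / sxx


def looks_like_gradual_increase(durations_ms, min_growth_fraction=0.5):
    """Heuristic (threshold is an assumption): True if the fitted line rises,
    from first batch to last, by more than min_growth_fraction of the mean
    batch duration."""
    slope = duration_trend(durations_ms)
    total_rise = slope * (len(durations_ms) - 1)
    return total_rise > min_growth_fraction * mean(durations_ms)


# Noisy but flat: durations bounce around with the workload.
flat = [1000, 1400, 900, 1300, 1000, 1350, 950, 1200]
# Steadily growing: the pattern reported in the issue.
growing = [1000, 1100, 1250, 1300, 1500, 1600, 1750, 1900]

print(looks_like_gradual_increase(flat))
print(looks_like_gradual_increase(growing))
```

A long run gives the fit more to work with, so a check like this is more meaningful over hours of batches than over a handful.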
   
   We should try running this in an S3 environment, because we suspect the problem occurs only on S3. If possible, would you be interested in taking the above gist and running it in your setup?
