Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/07/15 22:59:38 UTC

[GitHub] [hudi] umehrot2 commented on issue #1830: [SUPPORT] Processing time gradually increases while using Spark Streaming

umehrot2 commented on issue #1830:
URL: https://github.com/apache/hudi/issues/1830#issuecomment-659057299


   @bvaradar thank you for taking a look at this. We had an internal meeting with @srsteinmetz and the team, and yes, at the outset it looks to me like the total time for the lookup is increasing linearly here. It seems that when it does `countByKey()` in `WorkloadProfile`, that is also re-triggering some of the previous `index lookup` Spark actions on the `taggedRecords` RDD. Could this be an artifact of the number of parquet files/bloom filters to check increasing over time? Have we seen similar issues reported before with Hudi?
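   (Not Hudi code, just a hypothetical sketch of the suspected mechanism: Spark RDDs are lazy recipes, so every action replays the full lineage unless the RDD is persisted. If the tagged-records RDD were not cached, a later `countByKey()`-style action would re-trigger the earlier index-lookup stage, and the lookup cost would show up twice.)

```python
class LazyRDD:
    """Toy stand-in for a Spark RDD: a recipe, not materialized data."""
    def __init__(self, compute):
        self._compute = compute      # thunk that rebuilds the data (the "lineage")
        self._cached = None          # populated only after cache() plus an action

    def cache(self):
        # Like RDD.persist(): keep the materialized result after first use.
        self._cached = "pending"
        return self

    def collect(self):
        if self._cached not in (None, "pending"):
            return self._cached      # served from cache, no recompute
        data = self._compute()       # re-runs the whole lineage
        if self._cached == "pending":
            self._cached = data
        return data

lookup_runs = {"count": 0}

def index_lookup():
    # Stands in for the expensive bloom-filter / parquet key lookup.
    lookup_runs["count"] += 1
    return [("key1", "rec1"), ("key2", "rec2")]

# Without caching: each action replays the lookup.
tagged = LazyRDD(index_lookup)
tagged.collect()                      # tagging action: lookup runs once
len(tagged.collect())                 # countByKey-style action: lookup runs AGAIN
print(lookup_runs["count"])           # -> 2

# With caching: the second action reuses the materialized result.
cached = LazyRDD(index_lookup).cache()
before = lookup_runs["count"]
cached.collect()
cached.collect()
print(lookup_runs["count"] - before)  # -> 1
```

   If this is what is happening, persisting `taggedRecords` before the profiling step would keep the lookup cost from compounding, though the per-lookup growth with file/bloom-filter count would still need its own explanation.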


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org