
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/09/14 00:16:28 UTC

[GitHub] [hudi] conanxjp commented on issue #3324: [SUPPORT]Slow Performance With Spark Structured Streaming

conanxjp commented on issue #3324:
URL: https://github.com/apache/hudi/issues/3324#issuecomment-918685414


   @nsivabalan Sorry for the delay, here are some updates.
   
   The weird behavior I reported may be caused by Amazon's build of Spark. Depending on the versions involved, it can sometimes be triggered by a particular combination of Hudi and Amazon Spark.
   
   As for the MOR table, I did give it a try, along with the clustering feature. Compaction for MOR doesn't fit our streaming app well: the app deduplicates records on the fly, and no record is modified after it is delivered, at least not by the streaming app itself. We do have batch jobs that occasionally apply external modifications, but we have a requirement not to interrupt the running streaming app, since the freshest data is always in use. Given how Hudi commits work, it seems we can't run parallel Hudi jobs committing to the same table location, even though the streaming app and the external modification jobs touch different partitions of the table. I'm not sure whether there is any way to achieve this.

