Posted to commits@hudi.apache.org by "sydneyhoran (via GitHub)" <gi...@apache.org> on 2023/03/08 19:22:39 UTC

[GitHub] [hudi] sydneyhoran commented on issue #6316: [SUPPORT] Running `--continuous` mode with HoodieMultiTableDeltaStreamer seems to only ingest first table

sydneyhoran commented on issue #6316:
URL: https://github.com/apache/hudi/issues/6316#issuecomment-1460731185

   I added a workaround for this issue in my local fork of Hudi. It required small tweaks to [HoodieAsyncService.java](https://github.com/sydneyhoran/hudi/blob/20f182d82e020ecd30fc1546ea0a4a6116276195/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/async/HoodieAsyncService.java#L128) and [HoodieMultiTableDeltaStreamer.java](https://github.com/sydneyhoran/hudi/blob/20f182d82e020ecd30fc1546ea0a4a6116276195/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieMultiTableDeltaStreamer.java#L405), and it is now working as expected. The executor shutdown timeout of 24 hours was causing the job to hang, so I changed it to 10 seconds and have seen no negative consequences. I also enabled the config `--post-write-termination-strategy-class org.apache.hudi.utilities.deltastreamer.NoNewDataTerminationStrategy` in MultiTableDeltaStreamer, so that after N rounds with no new data (`max.rounds.without.new.data.to.shutdown`) it moves on to the next table instead of being stuck on the first one until there is an error.
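
   To illustrate the shutdown-timeout change, here is a minimal, standalone sketch of a bounded executor shutdown using only `java.util.concurrent`. It is not the code from the fork or from `HoodieAsyncService`; the class and method names (`BoundedShutdownSketch`, `shutdownWithTimeout`) are made up for illustration, and the 10-second value simply mirrors the number mentioned above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Hudi: waits a bounded time for an
// executor to finish instead of blocking for a very long timeout.
final class BoundedShutdownSketch {

    /**
     * Stops accepting new tasks, waits up to the given timeout for running
     * tasks to finish, and interrupts them if they have not finished by then.
     * Returns true only if the executor terminated within the timeout.
     */
    static boolean shutdownWithTimeout(ExecutorService executor, long timeout, TimeUnit unit) {
        executor.shutdown();
        try {
            if (executor.awaitTermination(timeout, unit)) {
                return true;
            }
            // Timed out: interrupt whatever is still running.
            executor.shutdownNow();
            return false;
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.submit(() -> System.out.println("ingest round finished"));
        boolean clean = shutdownWithTimeout(executor, 10, TimeUnit.SECONDS);
        System.out.println("terminated cleanly: " + clean);
    }
}
```

   With a short timeout the caller regains control even when a task never completes on its own, which is the behavior that lets a multi-table loop proceed to the next table.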
   
   It may not fully solve all use cases, for example if you aren't expecting the No New Data condition on every table in the multi-table job. Some use cases may need a new PostWriteTerminationStrategy, or a refactor of how the loop functions.
   
   It does not go back to the first table and loop continuously over the set of tables: once the last table reaches the NoNewData condition, the Spark job ends. I am handling the restart within the job orchestration.
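
   The resulting control flow can be sketched roughly as follows. This is a simplified model, not Hudi code: `MultiTablePassSketch`, `NoNewDataCounter`, `runOnePass`, and the `syncOnce` callback are all hypothetical names, and the sketch assumes the counter resets whenever a round does write data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of one pass over the tables; none of these names are Hudi APIs.
final class MultiTablePassSketch {

    /** Counts consecutive empty rounds (assumption: a non-empty round resets the count). */
    static final class NoNewDataCounter {
        private final int maxRounds;
        private int emptyRounds = 0;

        NoNewDataCounter(int maxRounds) {
            this.maxRounds = maxRounds;
        }

        boolean shouldStop(long recordsWritten) {
            emptyRounds = (recordsWritten == 0) ? emptyRounds + 1 : 0;
            return emptyRounds >= maxRounds;
        }
    }

    /**
     * Ingests each table in order, moving to the next one after maxRounds
     * consecutive rounds without new data. Returns once the last table stops,
     * which corresponds to the Spark job ending.
     */
    static List<String> runOnePass(List<String> tables, int maxRounds,
                                   Function<String, Long> syncOnce) {
        List<String> finished = new ArrayList<>();
        for (String table : tables) {
            NoNewDataCounter counter = new NoNewDataCounter(maxRounds);
            boolean stop = false;
            while (!stop) {
                long written = syncOnce.apply(table); // one ingest round
                stop = counter.shouldStop(written);
            }
            finished.add(table);
        }
        return finished;
    }

    public static void main(String[] args) {
        // Simulated backlog: number of non-empty rounds left per table.
        Map<String, Integer> backlog = new HashMap<>();
        backlog.put("orders", 2);
        backlog.put("customers", 0);

        List<String> done = runOnePass(Arrays.asList("orders", "customers"), 3, table -> {
            int left = backlog.get(table);
            if (left > 0) {
                backlog.put(table, left - 1);
                return 100L; // pretend 100 records were written
            }
            return 0L; // no new data this round
        });
        System.out.println("pass finished for: " + done);
    }
}
```

   Because `runOnePass` returns after the last table, cycling back to the first table has to come from outside, for example from a scheduler that resubmits the job.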

