
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/06/28 16:28:50 UTC

[GitHub] [hudi] codope commented on pull request #3142: [HUDI-1483] Support async clustering for deltastreamer and Spark streaming

codope commented on pull request #3142:
URL: https://github.com/apache/hudi/pull/3142#issuecomment-869829116


   @nsivabalan Thanks for reviewing the PR. I agree with your source code comments; there is scope for reuse. I will address them and update the PR. My responses to the high-level questions are below.
    
   > * Now that we have both clustering and compaction, I see that you have added clustering-related code just after compaction wherever applicable. Is the higher priority for compaction intentional? Or should we have clustering followed by compaction? Or does it not matter at all?
   
   When both clustering and compaction are enabled, compaction runs just before clustering. Compaction and clustering currently cannot run at the same time on the same file groups, and clustering can take significant time, so the compaction thread is started first. If clustering is scheduled for file groups that are under compaction, clustering for those file groups is skipped and picked up in a subsequent run after compaction completes.
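
   To illustrate the ordering described above, here is a minimal toy model. It is only a sketch: `schedule_clustering` and the file group ids are hypothetical names made up for this example, not Hudi classes or APIs, and the real scheduling logic in the PR may differ.

```python
# Toy model only: all names here are hypothetical, not Hudi classes or APIs.

def schedule_clustering(candidate_file_groups, pending_compaction):
    """Split candidates into file groups clustering can plan now and
    file groups deferred because compaction currently holds them."""
    pending = set(pending_compaction)
    planned = [fg for fg in candidate_file_groups if fg not in pending]
    deferred = [fg for fg in candidate_file_groups if fg in pending]
    return planned, deferred


# Round 1: compaction started first and is working on fg-2 and fg-3.
candidates = ["fg-1", "fg-2", "fg-3", "fg-4"]
planned_1, deferred_1 = schedule_clustering(
    candidates, pending_compaction={"fg-2", "fg-3"}
)

# Round 2: compaction has completed, so the deferred groups are eligible.
planned_2, deferred_2 = schedule_clustering(
    deferred_1, pending_compaction=set()
)
```

   In this model the first round plans clustering only for `fg-1` and `fg-4`, and the second round picks up `fg-2` and `fg-3` once nothing is pending compaction.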
   
   > * I came across a class named SchedulerConfGenerator. Don't we need to make any changes here for async clustering?
   
   We would need to make changes there only if we created a separate job pool for clustering and assigned weights to the different jobs. Unlike compaction, I did not feel the need for a separate job pool for clustering. By default, each pool gets an equal share of resources, and within each pool jobs run in FIFO order.
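
   As a rough illustration of that default behaviour (equal share per pool, FIFO inside a pool), here is a simplified toy model. It is not Spark or Hudi code: the `allocate` function, the pool names, and the core counts are assumptions made up for this sketch, and the real fair scheduler is more dynamic (for example, it can hand unused share to other pools).

```python
# Toy model of pool-based sharing; not Spark or Hudi code.
# Pool names, job names, and core counts are hypothetical.

def allocate(total_cores, pools):
    """pools maps pool name -> {"weight": int, "jobs": [(job, cores_wanted), ...]}.
    Each pool gets cores in proportion to its weight; inside a pool,
    jobs are served strictly in submission (FIFO) order.
    Returns a dict of job name -> cores granted."""
    total_weight = sum(pool["weight"] for pool in pools.values())
    granted = {}
    for pool in pools.values():
        share = total_cores * pool["weight"] // total_weight
        for job, wanted in pool["jobs"]:  # list order is submission order
            given = min(wanted, share)
            granted[job] = given
            share -= given
    return granted


# Assumption for illustration: clustering shares a pool with ingestion,
# while compaction has its own pool. Both pools use the default weight 1.
pools = {
    "shared": {"weight": 1, "jobs": [("ingest", 4), ("clustering", 6)]},
    "compaction": {"weight": 1, "jobs": [("compaction", 6)]},
}
result = allocate(12, pools)
```

   With 12 cores each pool gets 6. In the shared pool the earlier `ingest` job takes its 4 cores first and `clustering` gets the remaining 2, while `compaction` gets all 6 of its pool's cores.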

