Posted to commits@hudi.apache.org by "jtmzheng (via GitHub)" <gi...@apache.org> on 2023/04/03 06:50:46 UTC

[GitHub] [hudi] jtmzheng opened a new issue, #8362: [SUPPORT] Table services with optimistic concurrency control and multiple writers

jtmzheng opened a new issue, #8362:
URL: https://github.com/apache/hudi/issues/8362

   I'm trying to understand OCC and how table services (especially cleaning, but also compaction and clustering) should be deployed when multiple writers write to a single table. The RFC (https://cwiki.apache.org/confluence/display/HUDI/RFC+-+22+%3A+Snapshot+Isolation+using+Optimistic+Concurrency+Control+for+multi-writers under "Scheduling Table Management Services") implies that a lock is held while table service operations are scheduled. Is that how it is implemented? If so, cleaning, compaction, clustering, etc. *can* be enabled for multiple writers to the same table, but it's not clear from the docs whether this is the case.
   
   E.g., can `hoodie.clean.automatic` be enabled for all writers when there are multiple writers?
   
   The docs (https://hudi.apache.org/docs/concurrency_control/#enabling-multi-writing) note that `hoodie.cleaner.policy.failed.writes = LAZY` must be set:
   
   > Cleaning policy for failed writes to be used. Hudi will delete any files written by failed writes to re-claim space. Choose to perform this rollback of failed writes eagerly before every writer starts (only supported for single writer) or lazily by the cleaner (required for multi-writers)
   
   What does this mean in practice? Can the cleaner only be enabled on a single writer (or run as an independent job)?
   
   Thanks!
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] jtmzheng closed issue #8362: [SUPPORT] Table services with optimistic concurrency control and multiple writers

Posted by "jtmzheng (via GitHub)" <gi...@apache.org>.
jtmzheng closed issue #8362: [SUPPORT] Table services with optimistic concurrency control and multiple writers
URL: https://github.com/apache/hudi/issues/8362




[GitHub] [hudi] ad1happy2go commented on issue #8362: [SUPPORT] Table services with optimistic concurrency control and multiple writers

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #8362:
URL: https://github.com/apache/hudi/issues/8362#issuecomment-1522883618

   @jtmzheng Yes, you can do that. Were you able to try it out?




[GitHub] [hudi] jtmzheng commented on issue #8362: [SUPPORT] Table services with optimistic concurrency control and multiple writers

Posted by "jtmzheng (via GitHub)" <gi...@apache.org>.
jtmzheng commented on issue #8362:
URL: https://github.com/apache/hudi/issues/8362#issuecomment-1527065290

   Yep, seems fine so far, thanks!




[GitHub] [hudi] codope commented on issue #8362: [SUPPORT] Table services with optimistic concurrency control and multiple writers

Posted by "codope (via GitHub)" <gi...@apache.org>.
codope commented on issue #8362:
URL: https://github.com/apache/hudi/issues/8362#issuecomment-1505028548

   > a lock is acquired throughout scheduling table service operations - is that how it is implemented?
   
   Yes, that is correct. The cleaner can be enabled with multiple writers as well, but the cleaning policy for failed writes has to be LAZY in that case, so that the cleaner can reconcile any conflicts before it actually cleans up the older files. With a single writer, the policy can be EAGER, since there is no other writer to conflict with. The cleaner depends on the timeline, not on the writer. So what really matters is the number of commits or file versions that you want to retain, irrespective of whether that threshold is hit by writer 1 or writer 2.
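
   For illustration, here is a minimal sketch of write options that every writer to the same table could share. It assumes a ZooKeeper-based lock provider, which is only one of the lock providers listed in the concurrency control docs. The helper name `multi_writer_options`, the host `zk-host`, the lock key `my_table`, the base path `/hudi/locks`, and the retention value are all hypothetical placeholders, not values from this thread.

```python
def multi_writer_options(zk_url, zk_port, lock_key, base_path, commits_retained=10):
    """Build Hudi write options for one of several concurrent writers.

    Hypothetical helper: only the option keys come from the Hudi docs;
    the function itself and its defaults are illustrative assumptions.
    """
    return {
        # Turn on optimistic concurrency control for multi-writer setups.
        "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
        # Failed writes are rolled back lazily by the cleaner (required for multi-writers).
        "hoodie.cleaner.policy.failed.writes": "LAZY",
        # Lock provider; ZooKeeper is an assumption, other providers exist.
        "hoodie.write.lock.provider":
            "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider",
        "hoodie.write.lock.zookeeper.url": zk_url,
        "hoodie.write.lock.zookeeper.port": str(zk_port),
        "hoodie.write.lock.zookeeper.lock_key": lock_key,
        "hoodie.write.lock.zookeeper.base_path": base_path,
        # Automatic cleaning stays enabled on every writer.
        "hoodie.clean.automatic": "true",
        # Retention is evaluated against the table timeline, not per writer.
        "hoodie.cleaner.commits.retained": str(commits_retained),
    }


# Both writers use identical lock and cleaner settings for the shared table.
writer_1 = multi_writer_options("zk-host", 2181, "my_table", "/hudi/locks")
writer_2 = multi_writer_options("zk-host", 2181, "my_table", "/hudi/locks")
```

   With Spark, a dict like this would typically be passed as `df.write.format("hudi").options(**writer_1)` alongside the usual table name and record key options, which are omitted here.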




[GitHub] [hudi] jtmzheng commented on issue #8362: [SUPPORT] Table services with optimistic concurrency control and multiple writers

Posted by "jtmzheng (via GitHub)" <gi...@apache.org>.
jtmzheng commented on issue #8362:
URL: https://github.com/apache/hudi/issues/8362#issuecomment-1505792161

   Thanks, it sounds like I can set `hoodie.clean.automatic` to true for all writers as long as OCC is configured correctly?
   
   (Good to close this out if that's the case)

