Posted to notifications@accumulo.apache.org by "EdColeman (via GitHub)" <gi...@apache.org> on 2023/08/24 18:25:56 UTC

[GitHub] [accumulo] EdColeman opened a new issue, #3722: Allow pre-allocated unique name batch to be customizable

EdColeman opened a new issue, #3722:
URL: https://github.com/apache/accumulo/issues/3722

   The unique name allocator may be causing unnecessary ZooKeeper contention when large numbers of unique names are needed.  The allocator uses a fixed batch size of 100 + some jitter.  Allowing the allocation block to be changed could reduce ZooKeeper calls by allowing larger name blocks to be allocated with a single call.
   
   Some challenges:
   
   - The allocator is used in multiple services, and a one-size-fits-all block size may not be appropriate. Major compactions and bulk imports might benefit most from a larger name block.
   - If the value is stored in ZooKeeper to allow changes, what prefix should be used, given that the allocator does not fit within a single service?
   
   The need for this may be reduced by PR #3720, but allowing a larger block size may still have additional benefits in some instances.
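
   To illustrate the trade-off described above, here is a minimal sketch of batch allocation with a configurable block size. It is not the actual `UniqueNameAllocator` code: the class and method names (`BatchNameAllocatorSketch`, `SharedCounter`, `reserve`) are hypothetical, an in-memory `AtomicLong` stands in for the counter kept in ZooKeeper, and the base-36 name encoding is an assumption made for the example.

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not Accumulo's implementation.
final class BatchNameAllocatorSketch {

  // Stand-in for the shared counter; assumed to live in ZooKeeper in a real deployment.
  static final class SharedCounter {
    private final AtomicLong value = new AtomicLong();
    private final AtomicLong calls = new AtomicLong();

    // Reserve `count` consecutive ids with a single call; returns the first reserved id.
    long reserve(int count) {
      calls.incrementAndGet();
      return value.getAndAdd(count);
    }

    // Number of round trips made to the shared counter so far.
    long calls() {
      return calls.get();
    }
  }

  private final SharedCounter counter;
  private final int batchSize;
  private final int maxJitter;
  private final Random random;
  private long next = 0;
  private long limit = 0; // exclusive upper bound of the locally held block

  BatchNameAllocatorSketch(SharedCounter counter, int batchSize, int maxJitter, Random random) {
    this.counter = counter;
    this.batchSize = batchSize;
    this.maxJitter = maxJitter;
    this.random = random;
  }

  // Hand out names from the local block; go back to the shared counter only when it is exhausted.
  synchronized String nextName() {
    if (next >= limit) {
      int count = batchSize + (maxJitter > 0 ? random.nextInt(maxJitter) : 0);
      next = counter.reserve(count);
      limit = next + count;
    }
    return Long.toString(next++, Character.MAX_RADIX);
  }

  public static void main(String[] args) {
    for (int batch : new int[] {100, 1000}) {
      SharedCounter counter = new SharedCounter();
      BatchNameAllocatorSketch alloc =
          new BatchNameAllocatorSketch(counter, batch, 0, new Random());
      for (int i = 0; i < 10_000; i++) {
        alloc.nextName();
      }
      System.out.println("batch=" + batch + " sharedCalls=" + counter.calls());
    }
  }
}
```

   With jitter disabled, allocating 10,000 names takes 100 calls to the shared counter at a block size of 100 and 10 calls at a block size of 1,000. The cost of a larger block is that any names left unused when a process exits are simply skipped.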


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] jschmidt10 commented on issue #3722: Allow pre-allocated unique name batch to be customizable

Posted by "jschmidt10 (via GitHub)" <gi...@apache.org>.
jschmidt10 commented on issue #3722:
URL: https://github.com/apache/accumulo/issues/3722#issuecomment-1696015302

   Here's an initial PR that makes the batch size configurable. It does not provide the per-service granularity that Ed mentioned above; this would be a single global allocation setting.
   
   https://github.com/apache/accumulo/pull/3729




[GitHub] [accumulo] dtspence commented on issue #3722: Allow pre-allocated unique name batch to be customizable

Posted by "dtspence (via GitHub)" <gi...@apache.org>.
dtspence commented on issue #3722:
URL: https://github.com/apache/accumulo/issues/3722#issuecomment-1693531516

   Would the `UniqueNameAllocator` be a good candidate for representation as a plug-in interface?




[GitHub] [accumulo] dtspence commented on issue #3722: Allow pre-allocated unique name batch to be customizable

Posted by "dtspence (via GitHub)" <gi...@apache.org>.
dtspence commented on issue #3722:
URL: https://github.com/apache/accumulo/issues/3722#issuecomment-1693885903

   > > Would the `UniqueNameAllocator` be a good candidate for representation as a plug-in interface?
   > 
   > At first glance it seems like maybe it could, only because there may be other, better ways to do it. However, I don't really see any value in users deciding on the file naming for Accumulo's internally generated data files. There's a risk of the names not being unique, which could severely break things. There's also the risk of them breaking our file naming conventions, which some internal code uses to clean up (.rf.tmp files), or assumptions about whether a file was added as part of a bulk import vs. a major compaction, due to the first letter prefix, etc.
   > 
   > We could really go overboard by making a pluggable interface for every little operation inside Accumulo. I'd be reluctant to make this one pluggable. There's too much that could go wrong, and I don't think it necessarily makes sense to pass this on to users.
   > 
   > Is there a particular alternative implementation you had in mind? Or were you just thinking about tuning the existing mechanism more?
   
   There was a discussion about alternative implementations, some of which may be tuning-related (e.g. partitioning lookups at the global or tablet server level). We do not have any specific implementation in mind, though. For context, we had discussed different implementation trade-offs and whether making the method pluggable could help with some of the challenges mentioned.




[GitHub] [accumulo] ctubbsii commented on issue #3722: Allow pre-allocated unique name batch to be customizable

Posted by "ctubbsii (via GitHub)" <gi...@apache.org>.
ctubbsii commented on issue #3722:
URL: https://github.com/apache/accumulo/issues/3722#issuecomment-1693608068

   > Would the `UniqueNameAllocator` be a good candidate for representation as a plug-in interface?
   
   At first glance it seems like maybe it could, only because there may be other, better ways to do it. However, I don't really see any value in users deciding on the file naming for Accumulo's internally generated data files. There's a risk of the names not being unique, which could severely break things. There's also the risk of them breaking our file naming conventions, which some internal code uses to clean up (.rf.tmp files), or assumptions about whether a file was added as part of a bulk import vs. a major compaction, due to the first letter prefix, etc.
   
   We could really go overboard by making a pluggable interface for every little operation inside Accumulo. I'd be reluctant to make this one pluggable. There's too much that could go wrong, and I don't think it necessarily makes sense to pass this on to users.
   
   Is there a particular alternative implementation you had in mind? Or were you just thinking about tuning the existing mechanism more?




[GitHub] [accumulo] ctubbsii closed issue #3722: Allow pre-allocated unique name batch to be customizable

Posted by "ctubbsii (via GitHub)" <gi...@apache.org>.
ctubbsii closed issue #3722: Allow pre-allocated unique name batch to be customizable
URL: https://github.com/apache/accumulo/issues/3722

