Posted to user@zookeeper.apache.org by Shawn Heisey <ap...@elyograg.org> on 2015/10/02 02:11:10 UTC

Prevent a znode from exceeding jute.maxbuffer

I was going to open an issue in Jira for this, but I figured I should
discuss it here before I do that, to make sure that's a reasonable
course of action.

I was thinking about a problem that we encounter with SolrCloud, where
our overseer queue (stored in ZooKeeper) will greatly exceed the default
jute.maxbuffer size.  I encountered this personally while researching
something for a Solr issue:

https://issues.apache.org/jira/browse/SOLR-7191?focusedCommentId=14347834&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14347834

It seems silly that a znode could get to 14 times the allowed size
without notifying the code *inserting* the data.  The structure of our
queue is such that entries in the queue are children of the znode.  This
means the problem is not the data stored directly in the znode (which
is pretty much nonexistent in this case), but the number of children.
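
For context, the failure shows up on the reading side, not the writing
side.  A minimal sketch in Java (the /overseer/queue path comes from the
linked Solr issue; the connection details and exact error text are
illustrative):

import java.util.List;
import org.apache.zookeeper.ZooKeeper;

public class OverseerQueueRead {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> { });

        // getChildren() returns every child name in a single response
        // packet.  With enough entries under /overseer/queue, the
        // serialized name list exceeds jute.maxbuffer (default 0xfffff,
        // about 1 MB) and the client drops the connection with an
        // IOException along the lines of "Packet len <N> is out of
        // range!" -- the writers that created all those children never
        // saw any error.
        List<String> children = zk.getChildren("/overseer/queue", false);
        System.out.println(children.size() + " children");
        zk.close();
    }
}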

It seems like it would be a good idea to reject the creation of new
children if that would cause the znode size to exceed jute.maxbuffer. 
This moves the required error handling to the code that *updates* ZK,
rather than the code that is watching and/or reading ZK, which seems
more appropriate to me.
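
To make that concrete, here is roughly what such a guard looks like if
approximated client-side today.  This is only a sketch: the helper name,
the 64-byte-per-child estimate, and the use of Stat.getNumChildren() are
mine, and the exists/create pair is racy in a way a real server-side
check would not be:

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class GuardedCreate {
    // Mirror the reader's limit: jute.maxbuffer defaults to 0xfffff.
    static final int JUTE_MAX = Integer.getInteger("jute.maxbuffer", 0xfffff);

    static String createChildGuarded(ZooKeeper zk, String parent,
                                     byte[] data) throws Exception {
        // Assumes the parent znode exists.
        Stat stat = zk.exists(parent, false);

        // Rough estimate: ~64 bytes per serialized child name.  Another
        // writer can slip in between the check and the create, which is
        // exactly why the real check belongs on the server.
        long estimatedListSize = (long) stat.getNumChildren() * 64;
        if (estimatedListSize + 64 > JUTE_MAX) {
            throw new IllegalStateException(parent
                + " has too many children to stay under jute.maxbuffer");
        }
        return zk.create(parent + "/qn-", data,
                ZooDefs.Ids.OPEN_ACL_UNSAFE,
                CreateMode.PERSISTENT_SEQUENTIAL);
    }
}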

Alternatively, the mechanisms involved could be changed so that the client
can handle accessing a znode with millions of children, without
complaining about the packet length.
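
For completeness, the workaround available right now is to raise
jute.maxbuffer itself -- on every client that reads the node, and
typically on the server as well.  A sketch, assuming the property is set
before the ZooKeeper classes that read it are loaded (the 32 MB value is
arbitrary):

public class BigBufferClient {
    public static void main(String[] args) throws Exception {
        // Equivalent to -Djute.maxbuffer=33554432 on the command line,
        // which is the safer way to set it.
        System.setProperty("jute.maxbuffer", String.valueOf(32 * 1024 * 1024));

        org.apache.zookeeper.ZooKeeper zk =
            new org.apache.zookeeper.ZooKeeper("localhost:2181", 30000,
                                               event -> { });
        System.out.println(zk.getChildren("/overseer/queue", false).size());
        zk.close();
    }
}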

Thoughts?

Thanks,
Shawn


Re: Prevent a znode from exceeding jute.maxbuffer

Posted by Shawn Heisey <ap...@elyograg.org>.
On 10/1/2015 6:35 PM, Edward Ribeiro wrote:
> I agree with you, and I think
> https://issues.apache.org/jira/browse/ZOOKEEPER-2260 comes close to the
> second approach you suggested. wdyt?

Interesting!  That could be helpful.

I think it would require changes to the user application code to handle
the pagination.  Avoiding that would be better, but I'm not sure it can
be avoided.

If changes to user code are required, I think I like the idea of
rejecting new child creation more -- user code changes will be about
properly handling exceptions at update time instead of modifying the
consuming code to paginate.
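
Put differently, the producer keeps the error handling it should already
have around create(), plus one new case.  Purely illustrative, since no
such rejection exists in ZooKeeper today and the "queue full" reaction
is invented:

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class QueueProducer {
    static void offer(ZooKeeper zk, String queuePath, byte[] data)
            throws Exception {
        try {
            zk.create(queuePath + "/qn-", data,
                    ZooDefs.Ids.OPEN_ACL_UNSAFE,
                    CreateMode.PERSISTENT_SEQUENTIAL);
        } catch (KeeperException e) {
            // Today this catches connection loss, session expiry, and so
            // on.  With the proposed feature, a "parent would exceed
            // jute.maxbuffer" rejection (hypothetical -- no such error
            // code exists) would also land here, letting the producer
            // back off while consumers drain the queue instead of the
            // reader being the first to discover the oversized node.
            throw e;
        }
    }
}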

A feature to reject child creation should probably be an optional mode
of operation, disabled by default.  Down the road, once the real-world
impact of the option is understood, enabling it by default could be
considered, perhaps not until the next major release (4.0).

A corollary idea -- allow configurable thresholds (percentages of
jute.maxbuffer, maybe) that slow down the creation of new children, with
the pause growing as the znode approaches the limit, and ultimately
reject the creation if the buffer size would actually be exceeded.  I
have mixed feelings about this idea.
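
A client-side sketch of that throttle, with made-up numbers (start
pausing at 50% of a configured child ceiling, reject at 100%):

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ThrottledCreate {
    static void throttledCreate(ZooKeeper zk, String parent, byte[] data,
                                int maxChildren) throws Exception {
        // Assumes the parent exists; getNumChildren() comes from the Stat.
        int n = zk.exists(parent, false).getNumChildren();
        double fullness = (double) n / maxChildren;

        if (fullness >= 1.0) {
            throw new IllegalStateException(
                parent + " is full (" + n + " children)");
        }
        if (fullness > 0.5) {
            // 0 ms at 50% full, ramping linearly to ~500 ms near 100%.
            Thread.sleep((long) ((fullness - 0.5) * 1000));
        }
        zk.create(parent + "/qn-", data,
                ZooDefs.Ids.OPEN_ACL_UNSAFE,
                CreateMode.PERSISTENT_SEQUENTIAL);
    }
}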

Thanks,
Shawn


Re: Prevent a znode from exceeding jute.maxbuffer

Posted by Edward Ribeiro <ed...@gmail.com>.
Hi Shawn,

I agree with you, and I think
https://issues.apache.org/jira/browse/ZOOKEEPER-2260 comes close to the
second approach you suggested. wdyt?

Cheers,
Edward