You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@zookeeper.apache.org by Andrew Carman <ca...@gmail.com> on 2009/01/21 06:52:17 UTC

SyncRequestProcessor Possible Bug

We were looking through the ZK server code and came across a possible bug
that occurs when shutting down the SyncRequestProcessor (it may also apply
to other RequestProcessors, we didn't investigate).

When shutdown() is called in server/SyncRequestProcessor.java, it adds the
requestOfDeath to the processor's queue and when the run loop reaches the
requestOfDeath in the queue it immediately stops the processor. However, if
the request processor is being bombarded by new requests, it only calls
flush when there are 1000 requests backed up to flush. If the requestOfDeath
is received when the toFlush buffer is partially full, the processor stops
immediately without flushing those remaining requests. These requests are
lost. This causes our test for Zab to break, so we thought it may also be a
problem for ZK as well. Is it?

If so, our proposed fix is:
---
trunk/src/java/main/org/apache/zookeeper/server/SyncRequestProcessor.java
(revision 736200)
+++
trunk/src/java/main/org/apache/zookeeper/server/SyncRequestProcessor.java
(working copy)
@@ -77,6 +77,7 @@
                     }
                 }
                 if (si == requestOfDeath) {
+                    flush(toFlush);
                     break;
                 }
                 if (si != null) {


Cheers,
Harvey Mudd Clinic Team

RE: SyncRequestProcessor Possible Bug

Posted by Benjamin Reed <br...@yahoo-inc.com>.

Just to be clear it is not a correctness bug. For that piece of code we just need to make sure that whatever is acked is synced to disk. requests in the toFlush list have not be acked, so we can drop them. we should add a comment to that effect in the code.

It maybe slightly more efficient in terms of recovery if we do a flush before we end, but it doesn't affect correctness. since the other requestProcessors will be shutdown it is very unlikely that any acks are will be sent after the flush.
the bigger question is why your zab test breaks. it is not a correctness problem, so your tests shouldn't fail.

ben
________________________________________
From: Flavio Junqueira [fpj@yahoo-inc.com]
Sent: Wednesday, January 21, 2009 1:35 AM
To: zookeeper-user@hadoop.apache.org
Subject: Re: SyncRequestProcessor Possible Bug

Andrew, It sounds right to me that this is a bug. In any case, I'd
like to suggest that you open a jira issue and propose your patch.

-Flavio

On Jan 21, 2009, at 6:52 AM, Andrew Carman wrote:

> We were looking through the ZK server code and came across a
> possible bug
> that occurs when shutting down the SyncRequestProcessor (it may also
> apply
> to other RequestProcessors, we didn't investigate).
>
> When shutdown() is called in server/SyncRequestProcessor.java, it
> adds the
> requestOfDeath to the processor's queue and when the run loop
> reaches the
> requestOfDeath in the queue it immediately stops the processor.
> However, if
> the request processor is being bombarded by new requests, it only
> calls
> flush when there are 1000 requests backed up to flush. If the
> requestOfDeath
> is received when the toFlush buffer is partially full, the processor
> stops
> immediately without flushing those remaining requests. These
> requests are
> lost. This causes our test for Zab to break, so we thought it may
> also be a
> problem for ZK as well. Is it?
>
> If so, our proposed fix is:
> ---
> trunk/src/java/main/org/apache/zookeeper/server/
> SyncRequestProcessor.java
> (revision 736200)
> +++
> trunk/src/java/main/org/apache/zookeeper/server/
> SyncRequestProcessor.java
> (working copy)
> @@ -77,6 +77,7 @@
>                     }
>                 }
>                 if (si == requestOfDeath) {
> +                    flush(toFlush);
>                     break;
>                 }
>                 if (si != null) {
>
>
> Cheers,
> Harvey Mudd Clinic Team

Re: SyncRequestProcessor Possible Bug

Posted by Flavio Junqueira <fp...@yahoo-inc.com>.

Andrew, It sounds right to me that this is a bug. In any case, I'd  
like to suggest that you open a jira issue and propose your patch.

-Flavio



On Jan 21, 2009, at 6:52 AM, Andrew Carman wrote:

> We were looking through the ZK server code and came across a  
> possible bug
> that occurs when shutting down the SyncRequestProcessor (it may also  
> apply
> to other RequestProcessors, we didn't investigate).
>
> When shutdown() is called in server/SyncRequestProcessor.java, it  
> adds the
> requestOfDeath to the processor's queue and when the run loop  
> reaches the
> requestOfDeath in the queue it immediately stops the processor.  
> However, if
> the request processor is being bombarded by new requests, it only  
> calls
> flush when there are 1000 requests backed up to flush. If the  
> requestOfDeath
> is received when the toFlush buffer is partially full, the processor  
> stops
> immediately without flushing those remaining requests. These  
> requests are
> lost. This causes our test for Zab to break, so we thought it may  
> also be a
> problem for ZK as well. Is it?
>
> If so, our proposed fix is:
> ---
> trunk/src/java/main/org/apache/zookeeper/server/ 
> SyncRequestProcessor.java
> (revision 736200)
> +++
> trunk/src/java/main/org/apache/zookeeper/server/ 
> SyncRequestProcessor.java
> (working copy)
> @@ -77,6 +77,7 @@
>                     }
>                 }
>                 if (si == requestOfDeath) {
> +                    flush(toFlush);
>                     break;
>                 }
>                 if (si != null) {
>
>
> Cheers,
> Harvey Mudd Clinic Team