You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by "Ivan Kelly (JIRA)" <ji...@apache.org> on 2010/11/03 10:14:26 UTC

[jira] Commented: (ZOOKEEPER-916) Problem receiving messages from subscribed channels in c++ client

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927799#action_12927799 ] 

Ivan Kelly commented on ZOOKEEPER-916:
--------------------------------------

The root cause is that startReceiving is called from both startDelivery and the connect callback. This is correct
because there are scenarios where it may be needed in both. The checking of the receiving flag is the correct way to
resolve this. 

However, this double call of startReceiving really happens *every* time you use the c++ client to receive on a
subscription, so it wasn't clear why we've only seen the problem now. 

Turns out that it only triggers when a partial message is received, so we only saw it when the messages were very big.
For example, if the message is 100 bytes, the 100 bytes will arrive to the c++ client at the same time
sizeReadCallbackHandler will read the size out of the buffer and since the whole message is already in the buffer will
call messageReadCallbackHandler directly instead of calling into asio to read more from the buffer. After
messageReadCallbackHandler has finished, there will be 0 bytes in the buffer, or another whole message in the buffer,
so the second invocation of sizeReadCallbackHandler will behave correctly. 

However, if the message is bigger than the MTU, then the first sizeReadCallbackHandler will run, which reads the
message size from the buffer. It then sees that the remaining bytes in the buffer are fewer than the size of the
message, so makes a call to async_read to get more, and the function end. At this point the second
sizeReadCallbackHandler runs, sees junk and craps out. 


> Problem receiving messages from subscribed channels in c++ client 
> ------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-916
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-916
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: contrib-hedwig
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>         Attachments: ZOOKEEPER-916.patch
>
>
> We see this bug with receiving messages from a subscribed channel.  This problem seems to happen with larger messages.  The flow is to first read at least 4 bytes from the socket channel. Extract the first 4 bytes to get the message size.  If we've read enough data into the buffer already, we're done so invoke the messageReadCallbackHandler passing the channel and message size.  If not, then do an async read for at least the remaining amount of bytes in the message from the socket channel.  When done, invoke the messageReadCallbackHandler.
> The problem seems that when the second async read is done, the same sizeReadCallbackHandler is invoked instead of the messageReadCallbackHandler.  The result is that we then try to read the first 4 bytes again from the buffer.  This will get a random message size and screw things up.  I'm not sure if it's an incorrect use of the boost asio async_read function or we're doing the boost bind to the callback function incorrectly.
> 101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler system:0,512 channel(0x80b7a18)
> 101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler: size of buffer before reading message size: 512 channel(0x80b7a18)
> 101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler: size of incoming message 599, currently in buffer 508 channel(0x80b7a18)
> 101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler: Still have more data to read, 91 from channel(0x80b7a18)
> 101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler system:0, 91 channel(0x80b7a18)
> 101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler: size of buffer before reading message size: 599 channel(0x80b7a18)
> 101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler: size of incoming message 134287360, currently in buffer 595 channel(0x80b7a18)
> 101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler: Still have more data to read, 134286765 from channel(0x80b7a18)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.