Posted to users@kafka.apache.org by Larkin Lowrey <ll...@gmail.com> on 2016/03/09 23:35:13 UTC

Connect bug in 0.9.0.1 client

There is a bug in the 0.9.0.1 client which causes consumers to get stuck
waiting for a connection to become ready.

The root cause is in the connect(...) method of

clients/src/main/java/org/apache/kafka/common/network/Selector.java

Here's the trouble item:

         try {
             socketChannel.connect(address);
         } catch (UnresolvedAddressException e) {

The assumption is that socketChannel.connect(address) always returns 
false when in non-blocking mode. A good assumption... but, sadly, wrong.

When spinning up several dozen consumers at the same time, we see a small
number (one or two) where socketChannel.connect(...) returns true. When
that happens the connection is valid and SelectionKey.OP_CONNECT will 
never be triggered. The poll(long timeout) method in the same class will 
wait for the channel to become ready with key.isConnectable() but that 
will never happen since the channel is already fully connected before 
the select is called.
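
A minimal sketch of handling this case follows. It is not Kafka's Selector code and not the fix mentioned below; the class name ImmediateConnectSketch, the connect helper, and the timeout parameter are hypothetical and exist only for illustration. The point is to check the return value of socketChannel.connect(...) and to skip the wait for OP_CONNECT when it is true.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical illustration; not part of the Kafka client.
final class ImmediateConnectSketch {

    /**
     * Starts a non-blocking connect and returns once the channel is connected,
     * whether or not connect() completed immediately.
     */
    static SocketChannel connect(Selector selector, InetSocketAddress address, long timeoutMs)
            throws IOException {
        SocketChannel channel = SocketChannel.open();
        channel.configureBlocking(false);

        // Keep the return value: true means the connection is already
        // established, so there is no OP_CONNECT event to wait for.
        boolean connectedImmediately = channel.connect(address);
        if (connectedImmediately) {
            channel.register(selector, SelectionKey.OP_READ);
            return channel;
        }

        // Usual path: wait for OP_CONNECT, then finish the connection.
        SelectionKey key = channel.register(selector, SelectionKey.OP_CONNECT);
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!channel.isConnected()) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                key.cancel();
                channel.close();
                throw new IOException("Timed out connecting to " + address);
            }
            selector.select(remaining);
            if (key.isValid() && key.isConnectable() && channel.finishConnect()) {
                // Connected: stop watching for OP_CONNECT, start watching for reads.
                key.interestOps(SelectionKey.OP_READ);
            }
            selector.selectedKeys().clear();
        }
        return channel;
    }

    public static void main(String[] args) throws IOException {
        // Connect to a throwaway listener on the loopback interface.
        try (ServerSocketChannel server = ServerSocketChannel.open();
             Selector selector = Selector.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            InetSocketAddress address = (InetSocketAddress) server.getLocalAddress();
            try (SocketChannel channel = connect(selector, address, 5000)) {
                System.out.println("connected: " + channel.isConnected());
            }
        }
    }
}
```

Either branch leaves the channel registered for OP_READ, so the caller does not need to know which path was taken.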

I implemented a sloppy fix which was able to demonstrate that addressing 
this case solves my stuck consumer problem.

How do I submit a bug report for this issue, or does this email 
constitute a bug report?

--Larkin

Re: Connect bug in 0.9.0.1 client

Posted by Ismael Juma <is...@juma.me.uk>.
Well spotted Larkin. Please file an issue as we definitely want to fix this
before the next release.

Ismael

On Wed, Mar 9, 2016 at 10:46 PM, Christian Posta <ch...@gmail.com>
wrote:

> Open a JIRA here: https://issues.apache.org/jira/browse/KAFKA
> and open a github.com pull request here: https://github.com/apache/kafka
>
> May wish to peek at this too:
> https://github.com/apache/kafka/blob/trunk/CONTRIBUTING.md
>
> I think you need an Apache ICLA too
> https://www.apache.org/licenses/icla.txt
>
> HTH

Re: Connect bug in 0.9.0.1 client

Posted by Christian Posta <ch...@gmail.com>.
Open a JIRA here: https://issues.apache.org/jira/browse/KAFKA
and open a github.com pull request here: https://github.com/apache/kafka

May wish to peek at this too:
https://github.com/apache/kafka/blob/trunk/CONTRIBUTING.md

I think you need an Apache ICLA too: https://www.apache.org/licenses/icla.txt

HTH

-- 
*Christian Posta*
twitter: @christianposta
http://www.christianposta.com/blog
http://fabric8.io