Posted to users@kafka.apache.org by Larkin Lowrey <ll...@gmail.com> on 2016/03/09 23:35:13 UTC
Connect bug in 0.9.0.1 client
There is a bug in the 0.9.0.1 client which causes consumers to get stuck
waiting for a connection to become ready.
The root cause is in the connect(...) method of
clients/src/main/java/org/apache/kafka/common/network/Selector.java
Here's the trouble item:
    try {
        socketChannel.connect(address);
    } catch (UnresolvedAddressException e) {
The assumption is that socketChannel.connect(address) always returns
false when in non-blocking mode. A good assumption... but, sadly, wrong.
When spinning up several dozen consumers at the same time, we see a small
number (one or two) where socketChannel.connect(...) returns true. When
that happens, the connection is already established and
SelectionKey.OP_CONNECT will never be triggered. The poll(long timeout)
method in the same class waits for key.isConnectable() to report the
channel as ready, but that never happens because the channel was fully
connected before select was called.
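
To make the failure mode concrete, here is a minimal sketch of one way to handle a connect(...) that completes immediately. It is not the Kafka Selector code and not the actual patch: the class name ImmediateConnectSketch, the immediatelyConnected set, and the method bodies are all hypothetical, and only the java.nio calls are real. The idea is to keep the return value of connect(...), remember keys that are already connected, and report them from poll without waiting for OP_CONNECT.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Hypothetical illustration only; names and structure are assumptions.
public class ImmediateConnectSketch {
    private final Selector nioSelector;
    // Keys whose connect() returned true, so OP_CONNECT will never fire.
    private final Set<SelectionKey> immediatelyConnected = new HashSet<>();

    public ImmediateConnectSketch() throws IOException {
        this.nioSelector = Selector.open();
    }

    public SelectionKey connect(InetSocketAddress address) throws IOException {
        SocketChannel channel = SocketChannel.open();
        channel.configureBlocking(false);
        boolean connected;
        try {
            // Keep the return value instead of assuming it is false.
            connected = channel.connect(address);
        } catch (IOException | RuntimeException e) {
            // Covers UnresolvedAddressException, which is unchecked.
            channel.close();
            throw e;
        }
        SelectionKey key = channel.register(nioSelector, SelectionKey.OP_CONNECT);
        if (connected) {
            // Already connected: track it and stop waiting for OP_CONNECT.
            immediatelyConnected.add(key);
            key.interestOps(0);
        }
        return key;
    }

    // Returns the keys whose connections completed since the last call.
    public Set<SelectionKey> poll(long timeoutMs) throws IOException {
        Set<SelectionKey> ready = new HashSet<>();
        // Do not block if a connection is already known to be complete.
        if (timeoutMs > 0 && immediatelyConnected.isEmpty()) {
            nioSelector.select(timeoutMs);
        } else {
            nioSelector.selectNow();
        }
        Iterator<SelectionKey> it = nioSelector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove();
            if (key.isConnectable()
                    && ((SocketChannel) key.channel()).finishConnect()) {
                key.interestOps(key.interestOps() & ~SelectionKey.OP_CONNECT);
                ready.add(key);
            }
        }
        // Report the connections that never needed an OP_CONNECT event.
        ready.addAll(immediatelyConnected);
        immediatelyConnected.clear();
        return ready;
    }

    public void close() throws IOException {
        nioSelector.close();
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            int port = server.socket().getLocalPort();
            ImmediateConnectSketch sketch = new ImmediateConnectSketch();
            SelectionKey key = sketch.connect(new InetSocketAddress("127.0.0.1", port));
            Set<SelectionKey> ready = new HashSet<>();
            for (int i = 0; i < 50 && ready.isEmpty(); i++) {
                ready = sketch.poll(100);
            }
            System.out.println("connected: " + ready.contains(key));
            key.channel().close();
            sketch.close();
        }
    }
}
```

Whether connect(...) returns true or false for a given attempt depends on the platform and timing, so the sketch treats both outcomes the same way from the caller's point of view: the key shows up in the result of poll once the channel is connected.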
I implemented a sloppy fix which was able to demonstrate that addressing
this case solves my stuck consumer problem.
How do I submit a bug report for this issue, or does this email
constitute a bug report?
--Larkin
Re: Connect bug in 0.9.0.1 client
Posted by Ismael Juma <is...@juma.me.uk>.
Well spotted, Larkin. Please file an issue, as we definitely want to fix this
before the next release.
Ismael
On Wed, Mar 9, 2016 at 10:46 PM, Christian Posta <ch...@gmail.com>
wrote:
> Open a JIRA here: https://issues.apache.org/jira/browse/KAFKA
> and open a github.com pull request here: https://github.com/apache/kafka
>
> You may wish to peek at this too:
> https://github.com/apache/kafka/blob/trunk/CONTRIBUTING.md
>
> I think you need an Apache ICLA too
> https://www.apache.org/licenses/icla.txt
>
> HTH
Re: Connect bug in 0.9.0.1 client
Posted by Christian Posta <ch...@gmail.com>.
Open a JIRA here: https://issues.apache.org/jira/browse/KAFKA
and open a github.com pull request here: https://github.com/apache/kafka
You may wish to peek at this too:
https://github.com/apache/kafka/blob/trunk/CONTRIBUTING.md
I think you need an Apache ICLA too: https://www.apache.org/licenses/icla.txt
HTH
--
*Christian Posta*
twitter: @christianposta
http://www.christianposta.com/blog
http://fabric8.io