Posted to users@kafka.apache.org by Marco <ze...@yahoo.co.uk> on 2014/11/10 15:08:21 UTC

Error in fetch Name. How to recover broken node?

Hi,
I've got a 2-machine Kafka cluster. For some reason, after a restart the second node won't start.
I get tons of "Error in fetch Name" until I get a final "Too many open files".

How do I start dealing with this?

Thanks

This is the error:

[2014-11-10 14:48:01,169] INFO [Kafka Server 2], started (kafka.server.KafkaServer)
[2014-11-10 14:48:01,378] INFO [ReplicaFetcherManager on broker 2] Removed fetcher for partitions [news,3],[test,0],[test,2],[news,1],[test3,1],[test3,3] (kafka.server.ReplicaFetcherManager)
[2014-11-10 14:48:01,459] INFO Truncating log news-3 to offset 249. (kafka.log.Log)
[2014-11-10 14:48:01,462] INFO Truncating log test-0 to offset 0. (kafka.log.Log)
[2014-11-10 14:48:01,462] INFO Truncating log test-2 to offset 0. (kafka.log.Log)
[2014-11-10 14:48:01,463] INFO Truncating log news-1 to offset 268. (kafka.log.Log)
[2014-11-10 14:48:01,464] INFO Truncating log test3-1 to offset 0. (kafka.log.Log)
[2014-11-10 14:48:01,464] INFO Truncating log test3-3 to offset 0. (kafka.log.Log)
[2014-11-10 14:48:01,530] INFO [ReplicaFetcherThread-0-1], Starting  (kafka.server.ReplicaFetcherThread)
[2014-11-10 14:48:01,535] INFO [ReplicaFetcherManager on broker 2] Added fetcher for partitions ArrayBuffer([[news,3], initOffset 249 to broker id:1,host:machine1,port:9092] , [[news,1], initOffset 268 to broker id:1,host:machine1,port:9092] ) (kafka.server.ReplicaFetcherManager)
[2014-11-10 14:48:01,551] ERROR [ReplicaFetcherThread-0-1], Error in fetch Name: FetchRequest; Version: 0; CorrelationId: 0; ClientId: ReplicaFetcherThread-0-1; ReplicaId: 2; MaxWait: 500 ms; MinBytes: 1 bytes; RequestInfo: [news,3] -> PartitionFetchInfo(249,1048576),[news,1] -> PartitionFetchInfo(268,1048576) (kafka.server.ReplicaFetcherThread)
java.nio.channels.UnresolvedAddressException
        at sun.nio.ch.Net.checkAddress(Net.java:127)
...

Re: Error in fetch Name. How to recover broken node?

Posted by Marco <ze...@yahoo.co.uk>.

Thanks. That worked just fine!



Re: Error in fetch Name. How to recover broken node?

Posted by Guozhang Wang <wa...@gmail.com>.

You do not need to delete the data folder. I think the "file handles" here
are mostly due to socket leaks, i.e. network socket file handles, not disk
file handles. Just restarting the broker should do the job.

Guozhang
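
To see whether descriptors are being consumed by sockets rather than by log files, the open file descriptors of the broker process can be tallied by kind. What follows is only a minimal sketch, assuming a Linux host where /proc/<pid>/fd is readable by the current user. The function name count_fds is made up for illustration, and the broker's PID has to be supplied by whoever runs it.

```python
import os
import sys
from collections import Counter


def count_fds(pid):
    """Tally the open file descriptors of a process by kind (Linux /proc only)."""
    fd_dir = f"/proc/{pid}/fd"
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        if target.startswith("socket:"):
            counts["socket"] += 1
        elif target.startswith("pipe:"):
            counts["pipe"] += 1
        elif target.startswith("/"):
            counts["file"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    # Pass the broker's PID as the first argument; without one, the script
    # inspects its own process, which is only useful as a smoke test.
    has_pid = len(sys.argv) > 1 and sys.argv[1].isdigit()
    pid = sys.argv[1] if has_pid else os.getpid()
    for kind, n in sorted(count_fds(pid).items()):
        print(f"{kind}: {n}")
```

A socket count that keeps climbing while the fetch error repeats would be consistent with the socket leak described above. A high file count would point at disk file handles instead.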


Re: Error in fetch Name. How to recover broken node?

Posted by Marco <ze...@yahoo.co.uk>.

We're using Kafka 0.8.1.1.

As for a network partition, it is a possibility.
Now I'm just wondering if deleting the data folder on the second node will at least get it to come up again.

I think another guy tried a kafka-reassign-partitions just before it all blew up.



Re: Error in fetch Name. How to recover broken node?

Posted by Guozhang Wang <wa...@gmail.com>.

Hi Marco,

The fetch error comes from an "UnresolvedAddressException". Could you check
whether you had a network partition issue during that time?

As for the "Too many open files" error, I think it is caused by this
exception not being handled properly, so the socket is not closed in time.
Which version of Kafka are you using?

Guozhang
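
Java NIO throws UnresolvedAddressException when a network operation is attempted on a socket address whose hostname could not be resolved. One quick check is therefore whether the leader's hostname from the log (machine1, port 9092) resolves from the broken node. This is a minimal sketch, assuming it is run on the second node. The function name resolve_host is made up for illustration.

```python
import socket


def resolve_host(host, port):
    """Return the sorted IP addresses that host resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # "machine1" and 9092 are the leader's host and port as they appear in the log.
    addresses = resolve_host("machine1", 9092)
    if addresses:
        print("machine1 resolves to:", ", ".join(addresses))
    else:
        print("machine1 does not resolve from this host; check DNS or /etc/hosts")
```

If the name does not resolve here, the replica fetcher on this node would be expected to hit the same problem on every fetch attempt.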
