Posted to users@kafka.apache.org by Siva A <si...@gmail.com> on 2018/03/25 16:19:43 UTC

Kafka Mirrormaker issue

Hi,

We have a 3-node Kafka cluster (0.10.0.1) that mirrors data from another
3-node cluster running the same Kafka version.
Both clusters are Kerberized, and we run MirrorMaker on the target cluster
using a single principal/keytab with a one-way trust on the KDC.

At times MirrorMaker stops functioning (it doesn't mirror any data) while
the process keeps running. If we restart the service, it works fine for a
day or so.

I don't see any errors in the Kafka logs either.
Has anyone seen this kind of issue?

Thanks
Siva

Re: Kafka Mirrormaker issue

Posted by Panos Skianis <ka...@skianis.com>.
For whatever it's worth, and from memory:
In previous client versions (this may have been fixed in 0.11), we had 3 consumers in the same consumer group; when a topic partition reassignment happened, 2 consumers had partitions but the third did not get any. You could be in that scenario, but you would need to check your metrics (assigned partitions). It is possible for all consumers to get into that state.
More specific to MirrorMaker: we saw MirrorMaker shutting itself down (there is a setting to do that when a problem occurs; I am not sure whether it is the default behaviour, and I am not sure of your config), after which it ends up in a state where the process is still there but doing nothing, because it does not die properly. There was evidence in the MirrorMaker logs that it was trying to shut down. Unfortunately, we currently restart MirrorMaker pre-emptively every 3 hours because we haven't had the time to look into it any further.
Those are the two cases I remember where MirrorMaker could get into that state.
Panos
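
The "check your metrics (assigned partitions)" step above can be sketched roughly as follows. This is only an illustration: it assumes you already collect a per-consumer count of assigned partitions (for the new consumer this is, as far as I know, the `assigned-partitions` metric), and the function name and consumer ids are hypothetical.

```python
from typing import Dict, List


def consumers_without_partitions(assigned: Dict[str, int]) -> List[str]:
    """Return the ids of consumers whose assigned-partition count is zero."""
    return sorted(cid for cid, count in assigned.items() if count == 0)


# Hypothetical snapshot: 3 consumers in one group, one of which was left
# with no partitions after a reassignment (the scenario described above).
snapshot = {"mm-consumer-0": 4, "mm-consumer-1": 4, "mm-consumer-2": 0}
idle = consumers_without_partitions(snapshot)
print(idle)
```

If every consumer in the group shows up in the result, the group is in the "all consumers idle" state mentioned above and nothing is being mirrored.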

 





On Friday, April 6, 2018 9:10 PM, Jeff Field <jv...@blizzard.com> wrote:

> I'm hitting the same problem, even with the new consumer, on MirrorMaker 0.9 reading from a 0.9 Kafka cluster and producing to a 0.11 Kafka cluster.


Re: Kafka Mirrormaker issue

Posted by Andrew Otto <ot...@wikimedia.org>.
We had trouble with batch-expired produce errors on high-volume topic
partitions (not really that high, maybe 400 msgs/sec).  We solved these by
increasing `request.timeout.ms` and also increasing `batch.size` (which
reduced the total number of waiting batches in MirrorMaker).

More context here: https://phabricator.wikimedia.org/T189464#4102048
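
A rough back-of-the-envelope sketch of why a larger `batch.size` means fewer waiting batches. The 1 KiB message size is purely an assumption (the thread does not give one), 16384 is the producer's default `batch.size`, and 65536 is an arbitrary larger value chosen for illustration. It also assumes batches fill completely and ignores compression and per-record overhead, so treat the numbers as orders of magnitude only.

```python
def batches_per_second(msgs_per_sec: float, msg_bytes: int, batch_size: int) -> float:
    """Approximate number of full batches created per second for one partition."""
    msgs_per_batch = max(1, batch_size // msg_bytes)
    return msgs_per_sec / msgs_per_batch


# 400 msgs/sec as mentioned above; message size is a hypothetical 1 KiB.
default_rate = batches_per_second(400, 1024, 16384)  # default batch.size
larger_rate = batches_per_second(400, 1024, 65536)   # illustrative larger batch.size
print(default_rate, larger_rate)
```

Fewer batches queued per partition means each one spends less time waiting behind the others, which, together with a longer `request.timeout.ms`, makes it less likely that a batch expires before it is sent.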



On Mon, Apr 9, 2018 at 1:09 PM, Jeff Field <jv...@blizzard.com> wrote:

> We've been stable all weekend with the following settings:
>
> ExecStart=/usr/bin/kafka-mirror-maker --abort.on.send.failure true
> --new.consumer --num.streams 6 --offset.commit.interval.ms 60000
> --consumer.config /etc/kafka/mirrormaker/telem_mm/consumer.properties
> --producer.config /etc/kafka/mirrormaker/telem_mm/producer.properties
> --whitelist
>
> Consumer properties:
> bootstrap
> session.timeout.ms=55000
> heartbeat.interval.ms=15000
> request.timeout.ms=60000
>
> Producer properties:
> Bootstrap
>
> Any other combination of compression/buffer memory/linger/etc. on the 0.9
> producer producing to 0.11/1.0 wasn't reliable - it might work for an hour
> and then die, or it might never work. Once I landed on stable producer
> settings (which were just the defaults), the consumer started having
> timeouts due to heartbeating (because, again, 0.9), so I had to increase
> the heartbeat, session and request timeouts to stabilize the consumer group.
>
> Fortunately, our target cluster for most of our mirrormakers is the last
> one we will upgrade to 1.x, at which point we can just upgrade the
> mirrormakers to 1.x as well.
>

Re: Kafka Mirrormaker issue

Posted by Jeff Field <jv...@blizzard.com>.
We've been stable all weekend with the following settings:

ExecStart=/usr/bin/kafka-mirror-maker --abort.on.send.failure true --new.consumer --num.streams 6 --offset.commit.interval.ms 60000 --consumer.config /etc/kafka/mirrormaker/telem_mm/consumer.properties --producer.config /etc/kafka/mirrormaker/telem_mm/producer.properties --whitelist

Consumer properties:
bootstrap
session.timeout.ms=55000
heartbeat.interval.ms=15000
request.timeout.ms=60000

Producer properties:
Bootstrap

Any other combination of compression/buffer memory/linger/etc. on the 0.9 producer producing to 0.11/1.0 wasn't reliable - it might work for an hour and then die, or it might never work. Once I landed on stable producer settings (which were just the defaults), the consumer started having timeouts due to heartbeating (because, again, 0.9), so I had to increase the heartbeat, session and request timeouts to stabilize the consumer group.

Fortunately, our target cluster for most of our mirrormakers is the last one we will upgrade to 1.x, at which point we can just upgrade the mirrormakers to 1.x as well.
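
The three consumer timeouts above are related to each other, and a quick sanity check can catch an inconsistent combination before MirrorMaker is restarted. The sketch below is a hypothetical helper, not part of Kafka. It encodes two rules from the Kafka consumer documentation as I understand them: `heartbeat.interval.ms` should typically be no more than 1/3 of `session.timeout.ms`, and `request.timeout.ms` should be greater than `session.timeout.ms`.

```python
from typing import Dict, List


def parse_properties(text: str) -> Dict[str, str]:
    """Parse key=value lines, skipping blanks, comments, and lines without '='."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_consumer_timeouts(props: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means both checks passed."""
    session = int(props["session.timeout.ms"])
    heartbeat = int(props["heartbeat.interval.ms"])
    request = int(props["request.timeout.ms"])
    problems = []
    if heartbeat * 3 > session:
        problems.append("heartbeat.interval.ms is more than 1/3 of session.timeout.ms")
    if request <= session:
        problems.append("request.timeout.ms is not greater than session.timeout.ms")
    return problems


# The timeout values quoted in the consumer properties above.
consumer_properties = """
session.timeout.ms=55000
heartbeat.interval.ms=15000
request.timeout.ms=60000
"""
problems = check_consumer_timeouts(parse_properties(consumer_properties))
print(problems or "timeouts look consistent")
```

With the values from this message both checks pass. The broker also bounds the session timeout it accepts, which this sketch does not check.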

On 4/6/18, 1:09 PM, "Jeff Field" <jv...@blizzard.com> wrote:

    I'm hitting the same problem, even with the new consumer, on MirrorMaker 0.9 reading from a 0.9 Kafka cluster and producing to a 0.11 Kafka cluster.
    


Re: Kafka Mirrormaker issue

Posted by Jeff Field <jv...@blizzard.com>.
I'm hitting the same problem, even with the new consumer, on MirrorMaker 0.9 reading from a 0.9 Kafka cluster and producing to a 0.11 Kafka cluster.

On 3/30/18, 3:56 PM, "Andrew Otto" <ot...@wikimedia.org> wrote:

    I’m currently stuck on MirrorMaker version 0.9, and I’m not sure when the
    new consumer client became the default.  Does your 0.10 version have a
    --new.consumer option listed in the help message?  If so, then the new
    consumer client is not the default.  I haven’t seen the problem you are
    describing (I’m still having plenty of others though) since I’ve switched
    to using the new consumer.
    
    Another thought, what is the value of your partition.assignment.strategy?
    I’ve found round robin (default in later versions of MirrorMaker) to be a
    lot more consistent than whatever the default is in 0.9.  Not sure what the
    default in 0.10 is.
    
    
    


Re: Kafka Mirrormaker issue

Posted by Andrew Otto <ot...@wikimedia.org>.
I’m currently stuck on MirrorMaker version 0.9, and I’m not sure when the
new consumer client became the default.  Does your 0.10 version have a
--new.consumer option listed in the help message?  If so, then the new
consumer client is not the default.  I haven’t seen the problem you are
describing (I’m still having plenty of others though) since I’ve switched
to using the new consumer.

Another thought, what is the value of your partition.assignment.strategy?
I’ve found round robin (default in later versions of MirrorMaker) to be a
lot more consistent than whatever the default is in 0.9.  Not sure what the
default in 0.10 is.



On Fri, Mar 30, 2018 at 11:40 AM, Siva A <si...@gmail.com> wrote:

> Any other update on this?
>

Re: Kafka Mirrormaker issue

Posted by Siva A <si...@gmail.com>.
Any other update on this?

On Mon, Mar 26, 2018, 7:42 PM Andrew Otto <ot...@wikimedia.org> wrote:

> I’ve had similar problems, but I don’t have an explanation for ya :/
>

Re: Kafka Mirrormaker issue

Posted by Andrew Otto <ot...@wikimedia.org>.
I’ve had similar problems, but I don’t have an explanation for ya :/

On Sun, Mar 25, 2018 at 12:19 PM, Siva A <si...@gmail.com> wrote:

> Hi,
>
> We have a 3-node Kafka cluster (0.10.0.1) that mirrors data from another
> 3-node cluster running the same Kafka version.
> Both clusters are Kerberized, and we run MirrorMaker on the target cluster
> using a single principal/keytab with a one-way trust on the KDC.
>
> At times MirrorMaker stops functioning (it doesn't mirror any data) while
> the process keeps running. If we restart the service, it works fine for a
> day or so.
>
> I don't see any errors in the Kafka logs either.
> Has anyone seen this kind of issue?
>
> Thanks
> Siva
>