Posted to dev@samza.apache.org by "Bae, Jae Hyeon" <me...@gmail.com> on 2014/12/13 01:51:15 UTC

Question on Samza kafka consumer

Hi

I am almost done with my PoC, and I need to clear up a few remaining concerns
about resilience. Then hopefully, if everything goes well, you will have the
2nd biggest Samza cluster outside of LinkedIn within a year.

Samza does not use the high-level Kafka consumer, but it still depends on
ZooKeeper to discover Kafka brokers, topics, and partitions.

1. How does it handle rebalancing when we add more partitions to the topic?
Do we need to restart the job? Or does it start consuming automatically?

2. What happens when ZooKeeper dies? If the Samza Kafka consumer does not
depend heavily on ZooKeeper for distributed coordination, I think we would
not need to restart jobs urgently. ZkClient will try to re-establish its
session, but in the worst case it sometimes cannot recover at all; we have
observed this a few times so far.

Thank you
Best, Jae

Re: Question on Samza kafka consumer

Posted by Chris Riccomini <cr...@linkedin.com.INVALID>.
Hey Jae,

> 1. How does it handle rebalancing when we add more partitions to the
>topic? Do we need to restart the job? Or does it start consuming
>automatically?

Currently, you need to restart your job. After SAMZA-448, we will add
logic to JobCoordinator to automatically bounce containers and re-assign
partitions when they're added, but this involves code that is
yet-to-be-written.
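
Until that logic exists, one way to decide when a restart is due is to compare
the partition counts seen at job start with the current ones. The following is
only a minimal sketch under stated assumptions: the function names are
hypothetical, nothing here is a Samza or Kafka API, and the caller is assumed
to obtain the per-topic partition counts from Kafka topic metadata by some
other means.

```python
from typing import Dict, List


def added_partitions(at_start: Dict[str, int],
                     current: Dict[str, int]) -> Dict[str, List[int]]:
    """Return, per topic, the partition ids that exist now but not at job start.

    Both arguments map topic name -> partition count (hypothetical inputs
    supplied by the caller; this sketch does not talk to Kafka itself).
    """
    added: Dict[str, List[int]] = {}
    for topic, old_count in at_start.items():
        # A topic missing from the current view is treated as unchanged.
        new_count = current.get(topic, old_count)
        if new_count > old_count:
            # Kafka partition ids are 0-based and contiguous, so the new
            # partitions are old_count .. new_count - 1.
            added[topic] = list(range(old_count, new_count))
    return added


def needs_restart(at_start: Dict[str, int], current: Dict[str, int]) -> bool:
    """True if any input topic has gained partitions since the job started."""
    return bool(added_partitions(at_start, current))
```

For example, a job started against a 4-partition topic that now has 6
partitions would report partitions 4 and 5 as unconsumed and flag the job
for a restart.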

> 2. When the zookeeper dies, what will happen? If Samza kafka consumer is
>not heavily dependent on zookeeper for distribution coordination, I think
>that we don't need to restart jobs in an urgent way. ZkClient will try to
>re-establish sessions but sometimes it cannot recover at all in the worse
>case, we observed a few times so far.

Samza never depends directly on ZK. It does depend on ZK transitively
through Kafka. It's used heavily on job start, but afterwards, ZK is used
only in scenarios where partitions have shifted between brokers (a
rebalance), or other failure scenarios.

Again, Kafka's consumer and producer APIs are changing to remove ALL ZK
dependencies from both. When that happens, Samza will never talk to ZK when
Kafka is used. This is part of the 0.8.2 producer re-write, and the
yet-to-be-released consumer re-write.

Short answer: jobs require restarts to pick up new partitions right now.
ZK failures should not heavily impact Samza except at job start or when
partitions shift between brokers. Both of these limitations should be going
away in 2015, as the job coordinator and the Kafka consumer/producer
re-writes come online.

Cheers,
Chris
