
Posted to users@kafka.apache.org by Christian Carollo <cc...@gmail.com> on 2012/01/10 06:52:58 UTC

Kafka/ZK Cluster Example

I am looking to implement Kafka in a production environment; however, I
haven't found any documentation or examples that discuss how to build a
redundant implementation.  Is there any documentation out there (blogs,
articles, etc.) that describes how we can implement such a system with
Kafka 0.6 or 0.7?

Also, is there a timeframe the community is shooting for to release 0.8 w/
replication?

Thanks
Christian

Re: Kafka/ZK Cluster Example

Posted by Jun Rao <ju...@gmail.com>.

For 1), roughly speaking, hosts in a ZK cluster replicate among themselves
synchronously. So having multiple ZK hosts improves reliability. ZK
tolerates k failures with 2k+1 hosts.
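
The 2k+1 rule comes from ZooKeeper needing a strict majority (a quorum) of
the ensemble to be up. A small Python sketch of that arithmetic follows; the
function names are illustrative only and are not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble with the given number of hosts."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many hosts can fail while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


for n in range(1, 8):
    print(f"{n} hosts: quorum {quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

Two consequences bear on the questions in this thread: an even-sized
ensemble tolerates no more failures than the odd-sized one just below it,
and the arithmetic only helps if the hosts are separate machines, since
several ZK instances on one server all go down together.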

For 2), that's exactly how our ZK-based producer works.
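
A minimal sketch of the discover-then-fail-over idea, assuming brokers
advertise themselves in a registry that drops an entry when the broker's
session ends (ZooKeeper ephemeral nodes behave this way). FakeRegistry,
DiscoveringProducer and the host:port strings are hypothetical stand-ins,
not Kafka's producer API or its exact ZooKeeper layout; a plain dict
replaces the ZooKeeper client so the example runs without a server.

```python
import random
from dataclasses import dataclass, field


@dataclass
class FakeRegistry:
    """Hypothetical stand-in for ZooKeeper: broker id -> "host:port".

    In a real deployment each entry would be an ephemeral znode that
    disappears when the broker's ZooKeeper session ends.
    """
    brokers: dict = field(default_factory=dict)

    def register(self, broker_id: int, address: str) -> None:
        self.brokers[broker_id] = address

    def expire(self, broker_id: int) -> None:
        # Simulates the broker's session ending (crash, network loss).
        self.brokers.pop(broker_id, None)

    def live_brokers(self) -> dict:
        return dict(self.brokers)


class DiscoveringProducer:
    """Hypothetical producer that picks its target from the live registry."""

    def __init__(self, registry: FakeRegistry, rng: random.Random):
        self.registry = registry
        self.rng = rng

    def choose_broker(self) -> str:
        live = self.registry.live_brokers()
        if not live:
            raise RuntimeError("no live brokers registered")
        # Choose among the brokers that are currently registered.
        broker_id = self.rng.choice(sorted(live))
        return live[broker_id]


registry = FakeRegistry()
registry.register(0, "broker0:9092")
registry.register(1, "broker1:9092")
producer = DiscoveringProducer(registry, random.Random(42))

registry.expire(0)  # broker 0 dies; its registration vanishes
print(producer.choose_broker())  # broker1:9092
```

A real client would typically set ZooKeeper watches so it learns about
membership changes as they happen, rather than re-reading the registry on
every send.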

Thanks,

Jun

On Wed, Jan 11, 2012 at 3:35 PM, Christian Carollo <cc...@gmail.com> wrote:

> This leads to two more questions…
>
> 1) Maybe I am not understanding what a ZK cluster typically looks like or
> is made up of.  If I have more than one ZK service/instance running on a
> single node, that doesn't sound like it is more reliable when there is a
> server failure.
>
> On the other hand, if I have one ZK on one node and another on another
> node, even as a hot standby via mirroring, that seems like a more reliable
> solution.  I think I must be missing something, am I?
>
> 2) Can the client producer interrogate the ZK service and determine if it
> is available and/or if one or more brokers are available?  And if so, get
> their connection information from ZK so that the producer can intelligently
> send messages to the right brokers?  If this is possible, the client
> producer could handle failure cases and either contact a different
> (hot-standby) ZK or Broker?
>
> Thanks
> Christian
>
> On Jan 11, 2012, at 3:16 PM, Felix GV wrote:
>
> > As I understand it, you cannot use a mirrored Kafka cluster as a hot
> > fail-over.
> >
> > You could probably use it as a manual fail-over, but I don't know the
> > complexity involved in doing that.
> >
> > Also, if your source cluster fails while producers were putting data into
> > it, there will be an "unconsumed window" of data that is lost. This
> > corresponds to the data that the embedded consumer in the mirrored cluster
> > did not have time to consume from the source cluster.
> >
> > All in all, the mirrored cluster is akin to asynchronous replication,
> > without any hot fail-over capability. Thus, it provides data redundancy
> > (outside of the unconsumed window described above) but no extra
> > availability (unless you count manual interventions).
> >
> > KAFKA-50 <https://issues.apache.org/jira/browse/KAFKA-50>, on the other
> > hand, will provide both asynchronous AND synchronous replication (although
> > the latter will incur a latency penalty) and will be able to use the
> > replicas (data redundancy) as hot fail-overs.
> >
> > Depending on your personal definition of "highly reliable" (whether it
> > includes data redundancy and/or availability), I think that should probably
> > answer your question...?
> >
> > To all the Kafka experts: please correct me if the above explanations are
> > incorrect :) !
> >
> > --
> > Felix
> >
> >
> >
> > On Wed, Jan 11, 2012 at 5:53 PM, Jun Rao <ju...@gmail.com> wrote:
> >
> >> It's just that the mirroring logic depends on ZK to be available most of
> >> the time.
> >>
> >> Jun
> >>
> >> On Wed, Jan 11, 2012 at 2:35 PM, Christian Carollo <ccarollo@gmail.com> wrote:
> >>
> >>> I see.  But if I used that configuration and then did the mirroring you
> >>> suggested, would that be enough, in your opinion, to be considered
> >>> highly reliable?
> >>>
> >>> Christian
> >>>
> >>>
> >>> On Jan 11, 2012, at 2:32 PM, Jun Rao wrote:
> >>>
> >>>>> For example, can I have one ZK instance and one broker on one machine
> >>>>> and is that enough to define a ZK cluster and a Kafka Cluster?
> >>>>
> >>>> Yes, although you don't get the reliability of ZK now.
> >>>>
> >>>> Jun
> >>>>
> >>>>
> >>>> On Wed, Jan 11, 2012 at 2:06 PM, Christian Carollo <ccarollo@gmail.com> wrote:
> >>>>
> >>>>> Jun,
> >>>>>
> >>>>> I don't think I asked my question the right way.
> >>>>>
> >>>>> What I am trying to understand is: what are the minimum constituent
> >>>>> parts of a Kafka cluster?
> >>>>>
> >>>>> Based on your last email, I am now wondering what are the minimum
> >>>>> constituent parts of a ZK cluster as well as a Kafka cluster?
> >>>>>
> >>>>> For example, can I have one ZK instance and one broker on one machine
> >>>>> and is that enough to define a ZK cluster and a Kafka Cluster?
> >>>>>
> >>>>> Thanks,
> >>>>> Christian
> >>>>>
> >>>>>
> >>>>> On Jan 11, 2012, at 1:50 PM, Jun Rao <ju...@gmail.com> wrote:
> >>>>>
> >>>>>> Christian,
> >>>>>>
> >>>>>> A Kafka cluster contains a ZK cluster and a list of brokers. When a
> >>>>>> consumer subscribes to a topic in a Kafka cluster, it consumes data
> >>>>>> stored in all brokers in that cluster.
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> Jun
> >>>>>>
> >>>>>> On Tue, Jan 10, 2012 at 11:28 PM, Christian Carollo <ccarollo@gmail.com> wrote:
> >>>>>>
> >>>>>>> Thank you Jun, that is quite helpful.  I have a question about Kafka
> >>>>>>> Clusters.  What are the minimum number and types of services that
> >>>>>>> must be running to make up a Kafka Cluster?
> >>>>>>>
> >>>>>>> I ask this because the diagrams (in the Kafka Mirroring document)
> >>>>>>> allude to a multiple-broker environment; however, since each broker
> >>>>>>> does not appear to provide redundancy (as of today) to any of the
> >>>>>>> other brokers in a given zookeeper service, it seems like a Kafka
> >>>>>>> Cluster is nothing more than a grouping of a single zookeeper instance
> >>>>>>> with a single Kafka broker.  Is this the correct understanding?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Christian
> >>>>>>>
> >>>>>>> On Jan 10, 2012, at 8:47 AM, Jun Rao wrote:
> >>>>>>>
> >>>>>>>> With 0.7, you can set up inter-cluster replication
> >>>>>>>> (https://cwiki.apache.org/confluence/display/KAFKA/Kafka+mirroring).
> >>>>>>>>
> >>>>>>>> For the future 0.8 release, we are working on intra-cluster
> >>>>>>>> replication support, and details can be found at
> >>>>>>>> https://issues.apache.org/jira/browse/KAFKA-50
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>>
> >>>>>>>> Jun
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>
> >>>
> >>
>
>

Re: Kafka/ZK Cluster Example

Posted by Christian Carollo <cc...@gmail.com>.

This leads to two more questions…

1) Maybe I am not understanding what a ZK cluster typically looks like or is made up of.  If I have more than one ZK service/instance running on a single node, that doesn't sound like it is more reliable when there is a server failure.

On the other hand, if I have one ZK on one node and another on another node, even as a hot standby via mirroring, that seems like a more reliable solution.  I think I must be missing something, am I?

2) Can the client producer interrogate the ZK service and determine if it is available and/or if one or more brokers are available?  And if so, get their connection information from ZK so that the producer can intelligently send messages to the right brokers?  If this is possible, the client producer could handle failure cases and either contact a different (hot-standby) ZK or Broker?

Thanks
Christian

Re: Kafka/ZK Cluster Example

Posted by Felix GV <fe...@mate1inc.com>.

Ok, I had overlooked that... so, more redundant RAID arrays would decrease
the chances that we lose data, but it wouldn't help much with availability
because rebuilding the arrays after failures interferes with Kafka's normal
IO.

Really looking forward to sync replication with KAFKA-50 :D !!

--
Felix



On Thu, Jan 12, 2012 at 12:15 PM, Jun Rao <ju...@gmail.com> wrote:

> Felix,
>
> We use RAID too. One potential problem with RAID is that if you replace a
> broken disk, RAID goes into rebuild mode. This could significantly slow
> down I/O and make a broker not fully functional for new requests. Adding
> more mirrors doesn't alleviate this problem.
>
> Jun

Re: Kafka/ZK Cluster Example

Posted by Jun Rao <ju...@gmail.com>.

Felix,

We use RAID too. One potential problem with RAID is that if you replace a
broken disk, RAID goes into rebuild mode. This could significantly slow
down I/O and make a broker not fully functional for new requests. Adding
more mirrors doesn't alleviate this problem.

Jun

On Wed, Jan 11, 2012 at 3:50 PM, Felix GV <fe...@mate1inc.com> wrote:

> We've been thinking about this stuff a lot recently, at work.
>
> We've had some HD failures in our Kafka cluster. I don't know all the
> details, but from what I heard, the HDs were mirrored in RAID but several
> of them failed in a close time interval and the array did not have time to
> fully rebuild itself, so we lost all of that data from the Kafka cluster.
> Thankfully, the data was being consumed in near real time, so we only
> really lost a small unconsumed window of data.
>
> Now, we're wondering what we could improve to prevent this scenario in the
> future. I investigated Kafka mirroring, but since it relies on consuming
> data, the risk of losing the unconsumed window is still there. If we
> had consumers that were more batch-oriented (like Hadoop) rather than
> real-time, the benefits of a mirrored Kafka cluster would be greater, but
> for our use cases, where data is consumed in near real time, we would still
> lose as much data as before. Am I right?
>
> KAFKA-50, with sync replication, would have solved our problem, but until
> that's done, what are our options?
>
> I came to the conclusion that simply adding more mirrored copies in our
> RAID arrays would be the most cost-effective way to give us both more
> availability and more redundancy. This doesn't deal with the scenario where
> a machine fails and becomes unavailable, in which case the data on it would
> be temporarily unavailable but not lost (although, again, there could be a
> small window of uncommitted data). However, in terms of protection against
> data loss from HD failures, it seems like the best option for now, no?
>
> It doesn't feel right to just throw more hardware at problems hehe... but I
> guess sometimes it's the only choice :) ...
>
> Please tell me if that makes sense!
>
> --
> Felix

Re: Kafka/ZK Cluster Example

Posted by Felix GV <fe...@mate1inc.com>.

We've been thinking about this stuff a lot recently, at work.

We've had some HD failures in our Kafka cluster. I don't know all the
details, but from what I heard, the HDs were mirrored in RAID but several
of them failed in a close time interval and the array did not have time to
fully rebuild itself, so we lost all of that data from the Kafka cluster.
Thankfully, the data was being consumed in near real time, so we only
really lost a small unconsumed window of data.

Now, we're wondering what we could improve to prevent this scenario in the
future. I investigated Kafka mirroring, but since it relies on consuming
data, the risk of losing the unconsumed window is still there. If we
had consumers that were more batch-oriented (like Hadoop) rather than
real-time, the benefits of a mirrored Kafka cluster would be greater, but
for our use cases, where data is consumed in near real time, we would still
lose as much data as before. Am I right?
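
The "unconsumed window" can be stated as a number: at the moment the source
cluster dies, the data at risk is the gap, per partition, between what the
source has accepted and what the mirror's embedded consumer has fetched. A
minimal sketch follows; the partition names and positions are made up, and
the function is a hypothetical helper, not part of Kafka.

```python
def unmirrored_backlog(source_end_offsets: dict, mirror_offsets: dict) -> int:
    """Data accepted by the source cluster that the mirror has not fetched yet.

    Both arguments map a partition name to a log position. The unit does not
    matter (bytes or message counts) as long as both sides use the same one.
    """
    at_risk = 0
    for partition, end in source_end_offsets.items():
        # A partition the mirror has never read from counts from zero.
        copied = mirror_offsets.get(partition, 0)
        at_risk += max(end - copied, 0)
    return at_risk


# Hypothetical snapshot taken just before the source cluster fails.
source = {"events-0": 10_500, "events-1": 9_800}
mirror = {"events-0": 10_450, "events-1": 9_790}
print(unmirrored_backlog(source, mirror))  # 60
```

If the mirror's consumer lags about as much as the near-real-time consumers
already do, this gap is roughly the same size as the data those consumers
would miss anyway, which is the concern raised above.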

KAFKA-50, with sync replication, would have solved our problem, but until
that's done, what are our options?

I came to the conclusion that simply adding more mirrored copies in our
RAID arrays would be the most cost-effective way to give us both more
availability and more redundancy. This doesn't deal with the scenario where
a machine fails and becomes unavailable, in which case the data on it would
be temporarily unavailable but not lost (although, again, there could be a
small window of uncommitted data). However, in terms of protection against
data loss from HD failures, it seems like the best option for now, no?
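
The reasoning about extra mirrored copies can be put in rough numbers.
Assuming disks fail independently, with some hypothetical probability p
that a given surviving disk fails before a rebuild completes, data in a
mirror set of n copies is lost only if all n - 1 survivors also fail, i.e.
with probability p^(n-1). The independence assumption is optimistic: the
incident described above, where several disks failed close together, is
exactly the correlated case it ignores.

```python
def loss_probability_after_first_failure(copies: int,
                                         p_fail_during_rebuild: float) -> float:
    """Chance that every surviving copy also fails before the rebuild finishes.

    Assumes independent disk failures and a single rebuild window; both are
    simplifications, so treat the result as an optimistic estimate.
    """
    if copies < 1:
        raise ValueError("a mirror set needs at least one copy")
    if not 0.0 <= p_fail_during_rebuild <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # All (copies - 1) survivors must fail for the data to be gone.
    return p_fail_during_rebuild ** (copies - 1)


# Hypothetical per-disk failure probability during one rebuild window.
p = 0.01
for copies in (2, 3, 4):
    print(copies, loss_probability_after_first_failure(copies, p))
```

This only addresses data loss. As noted elsewhere in this thread, the
rebuild itself slows down I/O on the broker, and extra copies do nothing
to shorten that period.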

It doesn't feel right to just throw more hardware at problems hehe... but I
guess sometimes it's the only choice :) ...

Please tell me if that makes sense!

--
Felix



On Wed, Jan 11, 2012 at 6:16 PM, Felix GV <fe...@mate1inc.com> wrote:

> As I understand it, you cannot use a mirrored Kafka cluster as a hot
> fail-over.
>
> You could probably use it as a manual fail-over, but I don't know the
> complexity involved in doing that.
>
> Also, if your source cluster fails while producers were putting data into
> it, there will be an "unconsumed window" of data that is lost. This
> corresponds to the data that the embedded consumer in the mirrored cluster
> did not have time to consume from the source cluster.
>
> All in all, the mirrored cluster is akin to asynchronous replication,
> without any hot fail-over capability. Thus, it provides data redundancy
> (outside of the unconsumed window described above) but no extra
> availability (unless you count manual interventions).
>
> KAFKA-50 <https://issues.apache.org/jira/browse/KAFKA-50>, on the other
> hand, will provide both asynchronous AND synchronous replication (although
> the latter will incur a latency penalty) and will be able to use the
> replicas (data redundancy) as hot fail-overs.
>
> Depending on your personal definition of "highly reliable" (whether it
> includes data redundancy and/or availability), I think that should probably
> answer your question...?
>
> To all the Kafka experts: please correct me if the above explanations are
> incorrect :) !
>
> --
> Felix

Re: Kafka/ZK Cluster Example

Posted by Jun Rao <ju...@gmail.com>.

Felix,

That's a pretty accurate explanation. Thanks,

Jun

On Wed, Jan 11, 2012 at 3:16 PM, Felix GV <fe...@mate1inc.com> wrote:

> As I understand it, you cannot use a mirrored Kafka cluster as a hot
> fail-over.
>
> You could probably use it as a manual fail-over, but I don't know the
> complexity involved in doing that.
>
> Also, if your source cluster fails while producers were putting data into
> it, there will be an "unconsumed window" of data that is lost. This
> corresponds to the data that the embedded consumer in the mirrored cluster
> did not have time to consume from the source cluster.
>
> All in all, the mirrored cluster is akin to asynchronous replication,
> without any hot fail-over capability. Thus, it provides data redundancy
> (outside of the unconsumed window described above) but no extra
> availability (unless you count manual interventions).
>
> KAFKA-50 <https://issues.apache.org/jira/browse/KAFKA-50>, on the other
> hand, will provide both asynchronous AND synchronous replication (although
> the latter will incur a latency penalty) and will be able to use the
> replicas (data redundancy) as hot fail-overs.
>
> Depending on your personal definition of "highly reliable" (whether it
> includes data redundancy and/or availability), I think that should probably
> answer your question...?
>
> To all the Kafka experts: please correct me if the above explanations are
> incorrect :) !
>
> --
> Felix

Re: Kafka/ZK Cluster Example

Posted by Felix GV <fe...@mate1inc.com>.

As I understand it, you cannot use a mirrored Kafka cluster as a hot
fail-over.

You could probably use it as a manual fail-over, but I don't know the
complexity involved in doing that.

Also, if your source cluster fails while producers were putting data into
it, there will be an "unconsumed window" of data that is lost. This
corresponds to the data that the embedded consumer in the mirrored cluster
did not have time to consume from the source cluster.

All in all, the mirrored cluster is akin to asynchronous replication,
without any hot fail-over capability. Thus, it provides data redundancy
(outside of the unconsumed window described above) but no extra
availability (unless you count manual interventions).

KAFKA-50 <https://issues.apache.org/jira/browse/KAFKA-50>, on the other
hand, will provide both asynchronous AND synchronous replication (although
the latter will incur a latency penalty) and will be able to use the
replicas (data redundancy) as hot fail-overs.
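The difference between asynchronous and synchronous replication described here comes down to when a write is acknowledged. The toy model below is a sketch under simplifying assumptions (one leader, one replica, no network, no timing); ToyPartition and its methods are hypothetical names, not the actual KAFKA-50 design. It shows which acknowledged messages are lost if the leader disappears.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPartition:
    """Hypothetical single-leader, single-replica log."""
    synchronous: bool
    leader_log: list[str] = field(default_factory=list)
    replica_log: list[str] = field(default_factory=list)
    acked: list[str] = field(default_factory=list)

    def produce(self, msg: str) -> None:
        self.leader_log.append(msg)
        if self.synchronous:
            # Synchronous: the replica must have the message before the ack,
            # which is where the extra latency comes from.
            self.replica_log.append(msg)
        self.acked.append(msg)

    def replicate_pending(self, max_msgs: int) -> None:
        # Asynchronous catch-up: copy up to max_msgs messages the replica lacks.
        start = len(self.replica_log)
        self.replica_log.extend(self.leader_log[start:start + max_msgs])

    def fail_over(self) -> list[str]:
        """Leader is lost: return acknowledged messages the replica never got."""
        surviving = set(self.replica_log)
        return [m for m in self.acked if m not in surviving]


for sync in (False, True):
    p = ToyPartition(synchronous=sync)
    for i in range(10):
        p.produce(f"m{i}")
    # No-op for sync; the async replica catches up on only 7 of 10 messages.
    p.replicate_pending(7)
    print("sync" if sync else "async", p.fail_over())
```

In the asynchronous case the last three acknowledged messages are lost on fail-over; in the synchronous case none are, at the cost of waiting for the replica on every write.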

Depending on your personal definition of "highly reliable" (whether it
includes data redundancy and/or availability), I think that should probably
answer your question...?

To all the Kafka experts: please correct me if the above explanations are
incorrect :) !

--
Felix



On Wed, Jan 11, 2012 at 5:53 PM, Jun Rao <ju...@gmail.com> wrote:

> It's just that the mirroring logic depends on ZK to be available most of
> the time.
>
> Jun

Re: Kafka/ZK Cluster Example

Posted by Jun Rao <ju...@gmail.com>.

It's just that the mirroring logic depends on ZK to be available most of
the time.

Jun

On Wed, Jan 11, 2012 at 2:35 PM, Christian Carollo <cc...@gmail.com>wrote:

> I see. But if I used that configuration and then did the mirroring you
> suggested, would that be enough, in your opinion, to be considered highly
> reliable?
>
> Christian

Re: Kafka/ZK Cluster Example

Posted by Christian Carollo <cc...@gmail.com>.

I see. But if I used that configuration and then did the mirroring you suggested, would that be enough, in your opinion, to be considered highly reliable?

Christian


On Jan 11, 2012, at 2:32 PM, Jun Rao wrote:

>> For example, can I have one ZK instance and one broker on one machine, and
>> is that enough to define a ZK cluster and a Kafka cluster?
> 
> Yes, although in that case you don't get the reliability that ZK normally provides.
> 
> Jun

Re: Kafka/ZK Cluster Example

Posted by Jun Rao <ju...@gmail.com>.

> For example, can I have one ZK instance and one broker on one machine, and
> is that enough to define a ZK cluster and a Kafka cluster?

Yes, although in that case you don't get the reliability that ZK normally provides.

Jun
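This caveat follows from how ZooKeeper stays available: an ensemble keeps serving only while a strict majority (a quorum) of its servers is up. A small sketch of that arithmetic (the function name is ours, for illustration) shows why a single instance tolerates no failures at all. It assumes each server runs on its own machine; several ZK servers on one box all fail together when that box does.

```python
def zk_tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority quorum remains."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    quorum = ensemble_size // 2 + 1  # strict majority
    return ensemble_size - quorum


for n in range(1, 6):
    print(n, "servers ->", zk_tolerated_failures(n), "failure(s) tolerated")
```

Going from 1 to 2 servers, or from 3 to 4, adds no fault tolerance, which is why odd ensemble sizes (2k+1 servers tolerating k failures) are the usual choice.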


On Wed, Jan 11, 2012 at 2:06 PM, Christian Carollo <cc...@gmail.com>wrote:

> Jun,
>
> I don't think I asked my question the right way.
>
> What I am trying to understand is: what are the minimum constituent parts
> of a Kafka cluster?
>
> Based on your last email, I am now wondering: what are the minimum
> constituent parts of a ZK cluster as well as a Kafka cluster?
>
> For example, can I have one ZK instance and one broker on one machine, and
> is that enough to define a ZK cluster and a Kafka cluster?
>
> Thanks,
> Christian

Re: Kafka/ZK Cluster Example

Posted by Christian Carollo <cc...@gmail.com>.

Jun,

I don't think I asked my question the right way.

What I am trying to understand is: what are the minimum constituent parts of a Kafka cluster?

Based on your last email, I am now wondering: what are the minimum constituent parts of a ZK cluster as well as a Kafka cluster?

For example, can I have one ZK instance and one broker on one machine, and is that enough to define a ZK cluster and a Kafka cluster?

Thanks,
Christian


On Jan 11, 2012, at 1:50 PM, Jun Rao <ju...@gmail.com> wrote:

> Christian,
> 
> A Kafka cluster contains a ZK cluster and a list of brokers. When a
> consumer subscribes to a topic in a Kafka cluster, it consumes data stored
> in all brokers in that cluster.
> 
> Thanks,
> 
> Jun

Re: Kafka/ZK Cluster Example

Posted by Jun Rao <ju...@gmail.com>.

Christian,

A Kafka cluster contains a ZK cluster and a list of brokers. When a
consumer subscribes to a topic in a Kafka cluster, it consumes data stored
in all brokers in that cluster.

Thanks,

Jun
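This description can be pictured as a small data model. The sketch below is illustrative only: Broker, KafkaCluster, and partitions_for are hypothetical names, not Kafka classes, and the host names are placeholders. It assumes, as Christian observed for 0.7, that each partition lives on exactly one broker with no copies elsewhere, so a consumer has to read from every broker that holds the topic.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Broker:
    broker_id: int
    host: str
    port: int


@dataclass
class KafkaCluster:
    """Hypothetical model: one ZK connection string plus a list of brokers."""
    zk_connect: str
    brokers: list[Broker]
    # topic -> {broker_id: number of partitions of the topic on that broker}
    topic_partitions: dict[str, dict[int, int]] = field(default_factory=dict)

    def partitions_for(self, topic: str) -> list[tuple[Broker, int]]:
        """Every (broker, partition) pair a consumer of `topic` must read."""
        by_id = {b.broker_id: b for b in self.brokers}
        layout = self.topic_partitions.get(topic, {})
        return [(by_id[bid], p)
                for bid, count in sorted(layout.items())
                for p in range(count)]


cluster = KafkaCluster(
    zk_connect="zk1:2181,zk2:2181,zk3:2181",
    brokers=[Broker(0, "kafka0", 9092), Broker(1, "kafka1", 9092)],
    topic_partitions={"events": {0: 2, 1: 2}},
)
for broker, partition in cluster.partitions_for("events"):
    print(broker.host, partition)
```

Because no partition has a copy on another broker in this model, losing kafka1 would make half of the topic unreadable until it comes back, which is the gap intra-cluster replication (KAFKA-50) is meant to close.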

On Tue, Jan 10, 2012 at 11:28 PM, Christian Carollo <cc...@gmail.com>wrote:

> Thank you, Jun, that is quite helpful. I have a question about Kafka
> clusters. What are the minimum number and types of services that must be
> running to make up a Kafka cluster?
>
> I ask this because the diagrams (in the Kafka Mirroring document) allude
> to a multiple-broker environment. However, since no broker appears to
> provide redundancy (as of today) for any of the other brokers in a given
> ZooKeeper service, it seems like a Kafka cluster is nothing more than a
> grouping of a single ZooKeeper instance with a single Kafka broker. Is
> this the correct understanding?
>
> Thanks,
> Christian

Re: Kafka/ZK Cluster Example

Posted by Christian Carollo <cc...@gmail.com>.

Thank you, Jun, that is quite helpful. I have a question about Kafka clusters. What are the minimum number and types of services that must be running to make up a Kafka cluster?

I ask this because the diagrams (in the Kafka Mirroring document) allude to a multiple-broker environment. However, since no broker appears to provide redundancy (as of today) for any of the other brokers in a given ZooKeeper service, it seems like a Kafka cluster is nothing more than a grouping of a single ZooKeeper instance with a single Kafka broker. Is this the correct understanding?

Thanks,
Christian

On Jan 10, 2012, at 8:47 AM, Jun Rao wrote:

> With 0.7, you can set up inter-cluster replication (
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+mirroring).
> 
> For the future 0.8 release, we are working on intra-cluster replication
> support and details can be found at
> https://issues.apache.org/jira/browse/KAFKA-50
> 
> Thanks,
> 
> Jun

Re: Kafka/ZK Cluster Example

Posted by Jun Rao <ju...@gmail.com>.

With 0.7, you can set up inter-cluster replication
(https://cwiki.apache.org/confluence/display/KAFKA/Kafka+mirroring).

For the future 0.8 release, we are working on intra-cluster replication
support and details can be found at
https://issues.apache.org/jira/browse/KAFKA-50

Thanks,

Jun

On Mon, Jan 9, 2012 at 9:52 PM, Christian Carollo <cc...@gmail.com>wrote:

> I am looking to implement Kafka in a production environment, however, I
> haven't found in documentation or examples that
> discuss how to build a redundant implementation.  Is there any
> documentation out there (blogs, articles, etc.) that describes
> how we can implement such a system with Kafka 0.6 or 0.7.
>
> Also, is there a timeframe the community is shooting for, to release 0.8 w/
> replication?
>
> Thanks
> Christian
>