Posted to users@kafka.apache.org by Bhavesh Mistry <mi...@gmail.com> on 2014/03/07 20:39:00 UTC

Apache Kafka Use Case at WalmartLabs

We are planning to use Apache Kafka to replace Apache Flume, mostly as a
log transport layer. Please see the attached image, which shows a use case
(and deployment architecture) similar to the one at LinkedIn (according to
http://sites.computer.org/debull/A12june/pipeline.pdf ). I have the
following questions:

1) We will be creating topics dynamically to publish messages from
front-end and back-end server producers. How can we discover new topics so
that consumers can pull the data from the Kafka broker clusters into HDFS?

2) Is there a topic priority available when the system is under heavy
load? For example, during the holidays we might get more traffic, which
will cause more events to be published. Is there any way to give a topic
higher priority so that the throughput of that particular topic does not
suffer?

3) When using Kafka Mirror Maker to replicate messages from a local
datacenter to a centralized Kafka broker cluster, does it also replicate
the offsets consumed by a particular consumer? Basically, from the
centralized Kafka brokers, we want to re-read the messages from the
beginning to feed them into Hadoop.

5) Also, I would like to contribute to Kafka development, so please let me
know which features or bugs we can work on to get started. I have already
joined the Kafka dev group.

Thanks,
Bhavesh

Re: Apache Kafka Use Case at WalmartLabs

Posted by Neha Narkhede <ne...@gmail.com>.
In addition to what Guozhang said -

1) Since you are looking into Camus, this is probably a question for the
Camus mailing list. I believe it does automatically detect new topics.
2) There is no priority mechanism, and we intend to address traffic spikes
through quotas. In most cases, though, if the spike is not prohibitively
high, you can absorb it by allocating enough headroom on the brokers.
5) Take a look at these newbie JIRAs
<https://issues.apache.org/jira/browse/KAFKA-1273?jql=labels%20%3D%20newbie%20and%20project%20%3D%20KAFKA>
or usability JIRAs
<https://issues.apache.org/jira/browse/KAFKA-1273?jql=labels%20%3D%20usability%20and%20project%20%3D%20KAFKA>.
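
As a rough illustration of the headroom point in 2), here is a minimal sketch of the arithmetic. It is not a sizing method from the Kafka project: the function name and every number below are hypothetical, and real capacity depends on hardware, replication and message size.

```python
def peak_utilization(baseline_mb_s, spike_factor, capacity_mb_s):
    """Fraction of total broker capacity used when traffic spikes."""
    return baseline_mb_s * spike_factor / capacity_mb_s


# All figures are hypothetical and for illustration only.
baseline = 40.0    # normal aggregate produce rate, MB/s
capacity = 200.0   # aggregate rate the brokers can sustain, MB/s

for spike in (2.0, 4.0, 6.0):
    used = peak_utilization(baseline, spike, capacity)
    # A spike is absorbed only while utilization stays at or below 100%.
    print(f"spike x{spike}: {used:.0%} of capacity, absorbed={used <= 1.0}")
```

With these made-up figures a 2x or 4x spike fits within the provisioned capacity, while a 6x spike does not, which is the kind of case quotas are meant to handle.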

Thanks,
Neha



Re: Apache Kafka Use Case at WalmartLabs

Posted by Guozhang Wang <wa...@gmail.com>.
Hello Bhavesh,

1) If auto.create.topics.enable is turned on and the consumer subscribes to
a wildcard topic, then producers can simply send to new topics on the fly,
and the consumers will pick them up.

2) For now we do not have a priority mechanism, but we do have some initial
plans for quotas. You can find some details here:

https://cwiki.apache.org/confluence/display/KAFKA/KAFKA-656+-+Quota+Design

3) MM (Mirror Maker) does not currently preserve offsets across clusters.
What you can do is use a separate consumer group in the centralized cluster,
which will not share any messages with the local consumers.

5) Thanks! You can start by searching for JIRAs tagged with "newbie".
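
To make point 1) a bit more concrete, here is a minimal sketch of the matching step behind a wildcard subscription. It does not call any Kafka client API. It only models what happens when a consumer refreshes its view of the topic list and compares it against a regex, assuming the whole topic name has to match. The function name and the topic names are hypothetical.

```python
import re


def newly_matched_topics(whitelist, broker_topics, already_subscribed):
    """Return topics that match the whitelist regex and are not yet consumed."""
    pattern = re.compile(whitelist)
    return sorted(
        topic
        for topic in broker_topics
        if pattern.fullmatch(topic) and topic not in already_subscribed
    )


# Hypothetical topic names, for illustration only.
subscribed = {"logs.frontend"}
topics_on_broker = {"logs.frontend", "logs.backend", "metrics.cpu"}

# "logs.backend" was auto-created by a producer since the last refresh.
new_topics = newly_matched_topics(r"logs\..*", topics_on_broker, subscribed)
subscribed.update(new_topics)
print(new_topics)
```

Running the same check on every refresh is what lets a consumer pick up topics created after it started, without being reconfigured.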

Guozhang

