Posted to users@kafka.apache.org by masoom alam <ma...@imsciences.edu.pk> on 2013/08/05 11:44:22 UTC

Re: WELCOME to users@kafka.apache.org

Hi everyone,

I am new to Kafka and am designing an event processing system. Is it possible
for the Kafka broker to do some event-dependency handling so that, for
example, events of type A only go to Consumer1?

I hope I was able to explain my problem.

Thanks.

Re: WELCOME to users@kafka.apache.org

Posted by masoom alam <ma...@gmail.com>.
Hi Joe,

Many thanks for your response.



> Well, your client would have to share the same ZooKeeper instances
> (assuming 0.8), and that is not going to be great over a WAN. I assume by
> "client" you mean another infrastructure?
>
> If you do have another infrastructure posting to you, my suggestion would be
> to front this with a REST-based service: have them post you some JSON or
> something similar, and then have your public-facing interface tier
> send/produce that into Kafka.
>
>
*Does this mean that we would be overriding the simple protocol that Kafka
uses for communication between producer, broker and consumer?*

Re: WELCOME to users@kafka.apache.org

Posted by Joe Stein <cr...@gmail.com>.
see inline

/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
********************************************/


On Wed, Aug 7, 2013 at 2:41 AM, masoom alam <ma...@gmail.com> wrote:

> Hi Joe,
>
> Many thanks for such a detailed response.
>
>
> > So you would have a topic called "TypeA" and then set up a consumer group,
> > and those consumers (if you really only needed 1 consumer, set the
> > partitions to 1) would get everything from the "TypeA" topic. If you had
> > more event types, then just set up more topics and a consumer group for
> > each of those topics.
> >
> >
> Got it, so we will have multiple topics. Each topic will be logically
> associated with a set of consumers (Consumer Group).


Yup


> I was thinking that this approach would be more efficient, since each
> individual topic is associated with its own consumer or consumer group.
>

Yup


>
>
>
> > Now, depending on what you need to do, you may need topics to be by some
> > other value rather than by type; in that case you can still "pin" data to a
> > consumer by using semantic partitioning:
> > http://kafka.apache.org/design.html
> >
> > *Semantic partitioning*
> >
> > "Consider an application that would like to maintain an aggregation of the
> > number of profile visitors for each member. It would like to send all
> > profile visit events for a member to a particular partition and, hence,
> > have all updates for a member to appear in the same stream for the same
> > consumer thread. The producer has the capability to be able to semantically
> > map messages to the available kafka nodes and partitions. This allows
> > partitioning the stream of messages with some semantic partition function
> > based on some key in the message to spread them over broker machines. The
> > partitioning function can be customized by providing an implementation of
> > the kafka.producer.Partitioner interface, default being the random
> > partitioner. For the example above, the key would be member_id and the
> > partitioning function would be hash(member_id)%num_partitions."
> >
> If I understand you correctly, the responsibility for mapping events
> will be on the shoulders of the producers, right?


Well, not exactly. Under the hood, the event name just gets passed in as
the key; the actual mapping of which partition it goes to is handled by
Kafka.
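To make "handled by Kafka" a bit more concrete: when a message carries a key,
Kafka 0.8's default partitioner, as far as I understand it, takes the key's
hash modulo the number of partitions, which is the
hash(member_id)%num_partitions idea from the design doc. A minimal Python
sketch under two assumptions: the key is a string hashed the way Java's
String.hashCode() does it, and the sign bit is cleared before the modulus.
The helper names are hypothetical, not Kafka API.

```python
def java_string_hash(s: str) -> int:
    """Mimic java.lang.String.hashCode() (exact for BMP characters)."""
    h = 0
    for ch in s:
        # h = 31*h + char, wrapping at 32 bits like a Java int.
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the 32-bit pattern as a signed Java int.
    return h - (1 << 32) if h >= (1 << 31) else h


def partition_for(key: str, num_partitions: int) -> int:
    """Hypothetical helper: choose a partition for a keyed message."""
    # Clear the sign bit so the modulus is never negative (assumed here to
    # match how Kafka 0.8 takes the absolute value of the hash).
    return (java_string_hash(key) & 0x7FFFFFFF) % num_partitions


if __name__ == "__main__":
    for event_name in ["TypeA", "TypeB", "TypeA"]:
        # The same event name always lands in the same partition.
        print(event_name, partition_for(event_name, 4))
```

Python's built-in hash() is salted per process for strings, so it would not
give every producer the same answer; that is why the sketch spells out a
deterministic hash. Also note that adding partitions later changes the
modulus, so existing keys can move to a different partition.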


> What if we want to have a function on the Kafka broker nodes that
> actually performs the mapping? I mean, from the producer side, we want to
> make it transparent


It is transparent. The only change on the producer side is that instead of
KeyedMessage("topicname", message) you use
KeyedMessage("topicname", "eventName", message), and then have the configs
set up right.


> which event will go to which consumer. Actually, in our scenario, we will
> have producers at the client end, and brokers and consumers at our end.
>

The client end has to do some level of integration effort anyway, so it's
not much more than an additional config setting and one more param in the
constructor.


>
> Will this be a feasible approach?


Well, your client would have to share the same ZooKeeper instances
(assuming 0.8), and that is not going to be great over a WAN. I assume by
"client" you mean another infrastructure?

If you do have another infrastructure posting to you, my suggestion would be
to front this with a REST-based service: have them post you some JSON or
something similar, and then have your public-facing interface tier
send/produce that into Kafka.
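A rough sketch of the public-facing tier's job, assuming clients POST JSON
events with a "type" field. The topic table, the field name and the produce
callback are all hypothetical; in a real deployment produce would wrap a
Kafka producer, and the handler would sit behind whatever HTTP framework you
use.

```python
import json
from typing import Callable, Dict

# Hypothetical allowlist: which event types external clients may post, and
# the Kafka topic each one goes to (topic-per-event-type).
TOPIC_FOR_TYPE: Dict[str, str] = {"TypeA": "TypeA", "TypeB": "TypeB"}

# Hypothetical stand-in for the real Kafka producer: (topic, key, value).
ProduceFn = Callable[[str, str, bytes], None]


def handle_event_post(body: bytes, produce: ProduceFn) -> int:
    """Validate one posted JSON event and forward it; return an HTTP status."""
    try:
        event = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return 400  # not valid UTF-8 JSON
    if not isinstance(event, dict):
        return 400  # expected a JSON object
    event_type = event.get("type")
    if not isinstance(event_type, str):
        return 400  # missing or malformed event type
    topic = TOPIC_FOR_TYPE.get(event_type)
    if topic is None:
        return 400  # event type we do not accept
    # The event type doubles as the message key, so the same code also works
    # if you later switch to one topic with semantic partitioning.
    produce(topic, event_type, json.dumps(event).encode("utf-8"))
    return 202  # accepted; delivery is now the producer's job
```

With this layout only the HTTP endpoint faces the client, so ZooKeeper and
the brokers stay off the WAN.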


> I am also wondering whether we can include some sort of load balancing at
> the Kafka broker nodes.


This is now handled for you under the hood in 0.8.
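For intuition about what "under the hood" means here: the 0.8 high-level
consumer rebalances a topic's partitions across the consumers (threads) in a
group whenever members join or leave. Below is a minimal sketch of a
range-style assignment, which is my understanding of the 0.8 rebalancing
scheme; treat the exact rule as an assumption, and the function name as
hypothetical. Note that it balances partition counts, not measured consumer
load.

```python
from typing import Dict, List


def range_assign(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Split one topic's partitions into contiguous ranges, one per consumer."""
    if not consumers:
        return {}
    parts = sorted(partitions)
    members = sorted(consumers)  # every member sorts the same way, so all agree
    per_member, extra = divmod(len(parts), len(members))
    assignment: Dict[str, List[int]] = {}
    start = 0
    for i, member in enumerate(members):
        # The first `extra` members each take one leftover partition.
        count = per_member + (1 if i < extra else 0)
        assignment[member] = parts[start:start + count]
        start += count
    return assignment
```

With one partition and two consumers in the group, the second consumer gets
an empty range and sits idle, which is why a single partition means a single
active consumer per group.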


> That is, depending on the load of the consumers, the brokers would write
> the events to the respective topics set up for each consumer.
>
>
>
> Thanks a lot for your time.
>
> On Mon, Aug 5, 2013 at 4:58 PM, Joe Stein <cr...@gmail.com> wrote:
>
> > Yes, depending on your implementation of the event processing system you
> > may simply be able to have each topic be the type of event.

Re: WELCOME to users@kafka.apache.org

Posted by masoom alam <ma...@gmail.com>.
Hi Joe,

Many thanks for such a detailed response.


> So you would have a topic called "TypeA" and then set up a consumer group,
> and those consumers (if you really only needed 1 consumer, set the
> partitions to 1) would get everything from the "TypeA" topic. If you had
> more event types, then just set up more topics and a consumer group for
> each of those topics.
>
>
Got it, so we will have multiple topics. Each topic will be logically
associated with a set of consumers (Consumer Group). I was thinking that
this approach would be more efficient, since each individual topic is
associated with its own consumer or consumer group.



> Now, depending on what you need to do, you may need topics to be by some
> other value rather than by type; in that case you can still "pin" data to a
> consumer by using semantic partitioning:
> http://kafka.apache.org/design.html
>
> *Semantic partitioning*
>
> "Consider an application that would like to maintain an aggregation of the
> number of profile visitors for each member. It would like to send all
> profile visit events for a member to a particular partition and, hence,
> have all updates for a member to appear in the same stream for the same
> consumer thread. The producer has the capability to be able to semantically
> map messages to the available kafka nodes and partitions. This allows
> partitioning the stream of messages with some semantic partition function
> based on some key in the message to spread them over broker machines. The
> partitioning function can be customized by providing an implementation of
> the kafka.producer.Partitioner interface, default being the random
> partitioner. For the example above, the key would be member_id and the
> partitioning function would be hash(member_id)%num_partitions."
>

If I understand you correctly, the responsibility for mapping events will
be on the shoulders of the producers, right? What if we want to have a
function on the Kafka broker nodes that actually performs the mapping? I
mean, from the producer side, we want to make it transparent which event
will go to which consumer. Actually, in our scenario, we will have
producers at the client end, and brokers and consumers at our end.

Will this be a feasible approach? I am also wondering whether we can include
some sort of load balancing at the Kafka broker nodes. That is, depending on
the load of the consumers, the brokers would write the events to the
respective topics set up for each consumer.



Thanks a lot for your time.



Re: WELCOME to users@kafka.apache.org

Posted by Joe Stein <cr...@gmail.com>.
Yes, depending on your implementation of the event processing system you
may simply be able to have each topic be the type of event.

So you would have a topic called "TypeA" and then set up a consumer group,
and those consumers (if you really only needed 1 consumer, set the
partitions to 1) would get everything from the "TypeA" topic. If you had
more event types, then just set up more topics and a consumer group for
each of those topics.
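A toy, in-memory illustration of the topic-per-event-type layout; this is not
Kafka code, and ToyBroker, publish and the group names are hypothetical. The
point it shows is that the broker never looks inside messages: routing happens
because the producer picks the topic, and each consumer group keeps its own
read position.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


class ToyBroker:
    """In-memory stand-in for a broker: each topic is an append-only list.

    Like Kafka, it never inspects message contents; a message ends up
    wherever the producer sent it.
    """

    def __init__(self) -> None:
        self.logs: Dict[str, List[Any]] = defaultdict(list)
        # (group, topic) -> offset of the next message that group will read
        self.offsets: Dict[Tuple[str, str], int] = defaultdict(int)

    def produce(self, topic: str, message: Any) -> None:
        self.logs[topic].append(message)

    def poll(self, group: str, topic: str) -> List[Any]:
        """Return the messages in `topic` that `group` has not read yet."""
        start = self.offsets[(group, topic)]
        batch = self.logs[topic][start:]
        self.offsets[(group, topic)] = start + len(batch)
        return batch


def publish(broker: ToyBroker, event: Dict[str, Any]) -> None:
    # Producer side: the event type *is* the topic name ("TypeA", "TypeB", ...).
    broker.produce(event["type"], event)
```

A group that only reads the "TypeA" topic therefore only ever sees type A
events, while a second group reading the same topic gets its own full copy.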

Now, depending on what you need to do, you may need topics to be by some
other value rather than by type; in that case you can still "pin" data to a
consumer by using semantic partitioning:
http://kafka.apache.org/design.html

*Semantic partitioning*

"Consider an application that would like to maintain an aggregation of the
number of profile visitors for each member. It would like to send all
profile visit events for a member to a particular partition and, hence,
have all updates for a member to appear in the same stream for the same
consumer thread. The producer has the capability to be able to semantically
map messages to the available kafka nodes and partitions. This allows
partitioning the stream of messages with some semantic partition function
based on some key in the message to spread them over broker machines. The
partitioning function can be customized by providing an implementation of
the kafka.producer.Partitioner interface, default being the random
partitioner. For the example above, the key would be member_id and the
partitioning function would be hash(member_id)%num_partitions."
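The quoted design doc says the partitioning function can be customized
through the kafka.producer.Partitioner interface. The Python sketch below
only imitates that idea and is not the interface itself: a hypothetical
partitioner that pins chosen keys (event types) to fixed partitions and
hashes everything else. Pinning to a partition means one consumer in each
group reads all of those events, but which consumer owns that partition is
still decided by the group's rebalance.

```python
from typing import Callable, Dict


def make_pinning_partitioner(pins: Dict[str, int]) -> Callable[[str, int], int]:
    """Return partition(key, num_partitions): pinned keys go to fixed partitions."""

    def partition(key: str, num_partitions: int) -> int:
        if key in pins:
            pinned = pins[key]
            if not 0 <= pinned < num_partitions:
                raise ValueError(
                    f"key {key!r} pinned to partition {pinned}, "
                    f"but the topic has only {num_partitions} partitions"
                )
            return pinned
        # Unpinned keys: a stable 31-based string hash kept within 31 bits,
        # so every producer process computes the same partition.
        h = 0
        for ch in key:
            h = (31 * h + ord(ch)) & 0x7FFFFFFF
        return h % num_partitions

    return partition
```

Unpinned keys can still hash into a pinned partition, so if type A must be
alone in its partition you would also have to steer the other keys away
from it.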

Take a look at KeyedMessage.scala. The ConsoleProducer.scala example uses
the overloaded constructor that leaves the partitioning random, but you can
instead use the constructor that takes the key (the type), and also set
your key serializer to kafka.serializer.StringEncoder or write your own.

In this case you might need to have a partition for each topic, unless you
can have a consumer read different event types. Again, it all depends on
your implementation of your event processing system.

Hope this helps get you started, thanks!


