Posted to users@kafka.apache.org by Mark <st...@gmail.com> on 2011/11/04 17:27:48 UTC

Event processing use case/examples

I am struggling with some core design concepts and I was hoping someone 
could explain how they use Kafka in production for event 
processing. For example, I've read that LinkedIn has 60+ metrics 
they collect and aggregate, e.g. page views, clicks, etc. I clearly grasp 
the concept of logging a page view event to Kafka, but I'm missing the 
last part. How does one go about aggregating this data and using it in any 
way other than as a simple data sink?
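(For what it's worth, the common pattern seems to be: embed a page key and timestamp in each event, and have the consumer fold the stream into counters. A minimal sketch in plain Python, with the consumed messages simulated as an in-memory list; in a real deployment these would be pulled from the "page_view" topic, and the JSON layout here is my own illustration, not a Kafka convention:

```python
import json
from collections import Counter

def aggregate_page_views(messages):
    """Fold a stream of serialized page_view events into per-page counts."""
    counts = Counter()
    for raw in messages:
        event = json.loads(raw)
        counts[event["page"]] += 1
    return counts

# Simulated messages; in production these would come from a Kafka consumer loop.
messages = [
    json.dumps({"page": "/home", "ts": 1320424800}),
    json.dumps({"page": "/about", "ts": 1320424860}),
    json.dumps({"page": "/home", "ts": 1320424920}),
]
print(aggregate_page_views(messages))  # Counter({'/home': 2, '/about': 1})
```

The counters could then be flushed periodically to a database or served directly, rather than treating Kafka as only a sink.)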

Taking the "page_view" example further, what is the preferred way of 
logging and consuming this event? Would you have a consumer that just 
consumes page views? If so, how do you make sure you don't 
reconsume the same message in the event of a consumer restart? Also, for 
analytical/reporting needs, how do you deal with timeframes? Say my 
consumer is subscribed to the "page_view" topic and I want all messages 
from 8am-9am. Would I read all messages and filter out any that don't 
fall in that window, or would I create a separate topic for 
each hour, e.g. "page_view/08:00"? The same question applies to importing 
all "page_views" for yesterday into Hadoop.
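(On the timeframe question, one answer I've seen is to embed a timestamp in each message and filter in the consumer rather than multiplying topics. A sketch of the window filter, again in plain Python; the "ts" field is my own illustration:

```python
from datetime import datetime, timezone

def in_window(event, start, end):
    """True if the event's embedded Unix timestamp falls in [start, end)."""
    ts = datetime.fromtimestamp(event["ts"], tz=timezone.utc)
    return start <= ts < end

# The 8am-9am window from the example.
start = datetime(2011, 11, 4, 8, 0, tzinfo=timezone.utc)
end = datetime(2011, 11, 4, 9, 0, tzinfo=timezone.utc)

events = [
    {"page": "/home",
     "ts": int(datetime(2011, 11, 4, 8, 30, tzinfo=timezone.utc).timestamp())},
    {"page": "/about",
     "ts": int(datetime(2011, 11, 4, 9, 15, tzinfo=timezone.utc).timestamp())},
]
window = [e for e in events if in_window(e, start, end)]
print([e["page"] for e in window])  # ['/home']
```

The restart problem is separate: the consumer has to persist the offset of the last message it processed and resume from there, instead of relying on timestamps.)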

I know Kafka is a new project and I'm sure everyone's time is constrained, 
but I think it would be helpful if some high-level examples/use cases 
and best practices were added to the wiki. This could help gain adoption 
and hopefully bring in more willing contributors :)

Thanks for your help




Re: Event processing use case/examples

Posted by Jun Rao <ju...@gmail.com>.
Mark,

That's a good suggestion. I created a page off the Kafka wiki (
https://cwiki.apache.org/confluence/display/KAFKA/Index) and we can use it
to document production use cases of Kafka. We will be adding LinkedIn's
usage there soon.

Jun
