Posted to users@kafka.apache.org by Bhavesh Mistry <mi...@gmail.com> on 2015/01/05 20:43:02 UTC

Latency Tracking Across All Kafka Components

Hi Kafka Team/Users,

We are using the LinkedIn Kafka data pipeline end-to-end.

Producer(s) ->Local DC Brokers -> MM -> Central brokers -> Camus Job ->
HDFS

This is working out very well for us, but we need visibility into the
latency at each layer (Local DC Brokers -> MM -> Central brokers -> Camus
Job -> HDFS).  Our events are time-based (stamped with the time the event
was produced).  Is there any feature or audit trail for this, such as the
one mentioned at https://github.com/linkedin/camus/ ?  I would like to know
the in-between latency and the time an event spends in each hop; right now
we do not know where the problem is or what to optimize.

Is any of this covered in 0.9.0 or any other upcoming Kafka release?  How
might we achieve latency tracking across all components?
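For illustration, the per-layer lag can be measured without any broker-side
support by stamping each event with its production time and comparing that
stamp against the wall clock at each tier.  The sketch below is a minimal,
hypothetical model of that idea; the tier names and event shape are
illustrative, not part of Kafka or Camus:

```python
import time

def make_event(payload):
    # The producer embeds its creation time so any downstream tier
    # can compute its own end-to-end lag against the wall clock.
    return {"payload": payload, "produced_at": time.time()}

def record_lag(event, tier, lag_by_tier):
    # Called by each tier (local broker consumer, MirrorMaker,
    # Camus job, ...) when it first sees the event.
    lag = time.time() - event["produced_at"]
    lag_by_tier.setdefault(tier, []).append(lag)
    return lag

lags = {}
evt = make_event("click")
for tier in ["local-dc", "central", "hdfs"]:
    record_lag(evt, tier, lags)

# Lags are cumulative from production time; the time spent between
# two adjacent tiers is the difference of their lags.
hop = lags["hdfs"][0] - lags["central"][0]
```

In a real pipeline each tier would report its lag to a metrics system;
subtracting the lags of adjacent tiers isolates the slow hop.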


Thanks,

Bhavesh

Re: Latency Tracking Across All Kafka Components

Posted by Bhavesh Mistry <mi...@gmail.com>.
Hi Kafka Team,

I completely understand the use of the audit event and the reference
material posted at https://issues.apache.org/jira/browse/KAFKA-260 and the
slides.

Since we are an enterprise customer of the Kafka end-to-end pipeline, it
would be great if Kafka had built-in support for distributed tracing.  Here
is how I envision Kafka distributed tracing:

1) I would like to see the end-to-end journey across each major hop,
including the major components within the same JVM (see
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Internals, e.g. the
network layer, API layer, replication, and log subsystem).

Once an app team produces an audit log message, it would contain a GUID and
the ability to trace its journey through the producer (queue), over the
network to the broker (request channel, API layer, disk commit), and on to
the consumer read.  This gives both Kafka customers (operations) and
developers the ability to trace an event's journey and zoom into the
component that is the bottleneck.  Of course, the use case could be
expanded to an aggregated call graph for the entire pipeline (a far-fetched
vision).


Here are a couple of references on how other companies trace distributed
systems:

http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/36356.pdf
https://blog.twitter.com/2012/distributed-systems-tracing-with-zipkin

eBay  Transactional Logger (Distributed Tree Logging)
http://server.dzone.com/articles/monitoring-ebay-big-data
https://devopsdotcom.files.wordpress.com/2012/11/screen-shot-2012-11-11-at-10-06-39-am.png


UI for tracking the Audit event:
http://4.bp.blogspot.com/-b0r71ZbJdmA/T9DYhbE0uXI/AAAAAAAAABs/bXwyM76Iddc/s1600/web-screenshot.png

This is how I would implement it:
Each Kafka component writes a transactional log of audit events to disk →
an agent (Flume, Logstash, etc.) ships those pre-formatted (structured)
logs → Elasticsearch, where people can search by the GUID and produce a
call graph similar to Zipkin, or a Chrome resource-timeline view of where
an event spent its time.
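The flow above can be sketched in a few lines.  This is a toy, in-process
model of the idea, not a Kafka API: each component appends a structured
record carrying the GUID, and grouping by GUID then diffing consecutive
timestamps yields the per-hop dwell times a Zipkin-style UI would render:

```python
import time
import uuid

# Stand-in for the on-disk log that the agent would ship to Elasticsearch.
trace_log = []

def emit(guid, component):
    # Each component (producer queue, network layer, API layer,
    # log subsystem, consumer) appends one structured record.
    trace_log.append({"guid": guid, "component": component,
                      "ts": time.monotonic()})

def journey(guid):
    # Reconstruct the timeline for one event: search by GUID, order by
    # timestamp, and diff consecutive records to get time spent per hop.
    hops = sorted((r for r in trace_log if r["guid"] == guid),
                  key=lambda r: r["ts"])
    return [(a["component"], b["component"], b["ts"] - a["ts"])
            for a, b in zip(hops, hops[1:])]

g = str(uuid.uuid4())
for comp in ["producer-queue", "network", "api-layer",
             "log-subsystem", "consumer-read"]:
    emit(g, comp)
spans = journey(g)
```

In the envisioned system the `journey` query would run against
Elasticsearch rather than an in-memory list, but the grouping and diffing
are the same.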

This would be a powerful tool both for the Kafka development team and for
customers who have latency issues.  It requires a lot of effort and code
instrumentation, but it would be cool if the Kafka team at least got
started on distributed tracing functionality.

I am sorry I got back to you so late.

Thanks,

Bhavesh


On Thu, Jan 15, 2015 at 4:01 PM, Guozhang Wang <wa...@gmail.com> wrote:

> Hi,
>
> At LinkedIn we used an audit module to track the latency / message counts
> at each "tier" of the pipeline (for your example it will have the producer
> / local / central / HDFS tiers). Some details can be found on our recent
> talk slides (slide 41/42):
>
> http://www.slideshare.net/GuozhangWang/apache-kafka-at-linkedin-43307044
>
> This audit is specific to the usage of Avro as our serialization tool
> though, and we are considering ways to get it generalized hence
> open-sourced.
>
> Guozhang
>
>
> On Mon, Jan 5, 2015 at 3:33 PM, Otis Gospodnetic <
> otis.gospodnetic@gmail.com
> > wrote:
>
> > Hi,
> >
> > That sounds a bit like needing a full, cross-app, cross-network
> > transaction/call tracing, and not something specific or limited to Kafka,
> > doesn't it?
> >
> > Otis
> > --
> > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > Solr & Elasticsearch Support * http://sematext.com/
> >
> >
> > On Mon, Jan 5, 2015 at 2:43 PM, Bhavesh Mistry <
> mistry.p.bhavesh@gmail.com
> > >
> > wrote:
> >
> > > Hi Kafka Team/Users,
> > >
> > > We are using Linked-in Kafka data pipe-line end-to-end.
> > >
> > > Producer(s) ->Local DC Brokers -> MM -> Central brokers -> Camus Job ->
> > > HDFS
> > >
> > > This is working out very well for us, but we need to have visibility of
> > > latency at each layer (Local DC Brokers -> MM -> Central brokers ->
> Camus
> > > Job ->  HDFS).  Our events are time-based (time event was produce).  Is
> > > there any feature or any audit trail  mentioned at (
> > > https://github.com/linkedin/camus/) ?  But, I would like to know
> > > in-between
> > > latency and time event spent in each hope? So, we do not know where is
> > > problem and what t o optimize ?
> > >
> > > Any of this cover in 0.9.0 or any other version of upcoming Kafka
> release
> > > ?  How might we achive this  latency tracking across all components ?
> > >
> > >
> > > Thanks,
> > >
> > > Bhavesh
> > >
> >
>
>
>
> --
> -- Guozhang
>

Re: Latency Tracking Across All Kafka Components

Posted by Guozhang Wang <wa...@gmail.com>.
Hi,

At LinkedIn we used an audit module to track the latency / message counts
at each "tier" of the pipeline (for your example it will have the producer
/ local / central / HDFS tiers). Some details can be found on our recent
talk slides (slide 41/42):

http://www.slideshare.net/GuozhangWang/apache-kafka-at-linkedin-43307044

This audit is specific to our use of Avro as the serialization tool,
though, and we are considering ways to generalize it so that it can be
open-sourced.
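The tier-audit idea can be sketched as follows, under the assumption (from
the slides) that each tier counts messages bucketed by event timestamp and
the buckets are then compared across tiers; the function names and bucket
size here are illustrative, not the actual audit module:

```python
from collections import Counter

BUCKET_SECS = 600  # 10-minute buckets; an illustrative choice

def audit(counts, tier, event_ts):
    # Every tier bumps a counter keyed by (tier, event-time bucket).
    # In the real system the counters themselves are published
    # periodically, rather than one audit record per message.
    counts[(tier, int(event_ts) // BUCKET_SECS)] += 1

def completeness(counts, src_tier, dst_tier, bucket):
    # Equal counts for the same bucket at two tiers means no loss; a
    # deficit that fills in over time indicates latency downstream.
    produced = counts[(src_tier, bucket)]
    arrived = counts[(dst_tier, bucket)]
    return arrived / produced if produced else 1.0

c = Counter()
for ts in (0, 120, 540):   # three events produced in bucket 0
    audit(c, "producer", ts)
for ts in (0, 120):        # only two have reached HDFS so far
    audit(c, "hdfs", ts)
ratio = completeness(c, "producer", "hdfs", 0)  # 2 of 3 arrived
```

A ratio below 1.0 for a closed bucket indicates loss; a ratio that climbs
toward 1.0 as the bucket ages indicates latency at the downstream tier.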

Guozhang


On Mon, Jan 5, 2015 at 3:33 PM, Otis Gospodnetic <otis.gospodnetic@gmail.com
> wrote:

> Hi,
>
> That sounds a bit like needing a full, cross-app, cross-network
> transaction/call tracing, and not something specific or limited to Kafka,
> doesn't it?
>
> Otis
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>
> On Mon, Jan 5, 2015 at 2:43 PM, Bhavesh Mistry <mistry.p.bhavesh@gmail.com
> >
> wrote:
>
> > Hi Kafka Team/Users,
> >
> > We are using Linked-in Kafka data pipe-line end-to-end.
> >
> > Producer(s) ->Local DC Brokers -> MM -> Central brokers -> Camus Job ->
> > HDFS
> >
> > This is working out very well for us, but we need to have visibility of
> > latency at each layer (Local DC Brokers -> MM -> Central brokers -> Camus
> > Job ->  HDFS).  Our events are time-based (time event was produce).  Is
> > there any feature or any audit trail  mentioned at (
> > https://github.com/linkedin/camus/) ?  But, I would like to know
> > in-between
> > latency and time event spent in each hope? So, we do not know where is
> > problem and what t o optimize ?
> >
> > Any of this cover in 0.9.0 or any other version of upcoming Kafka release
> > ?  How might we achive this  latency tracking across all components ?
> >
> >
> > Thanks,
> >
> > Bhavesh
> >
>



-- 
-- Guozhang

Re: Latency Tracking Across All Kafka Components

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi,

That sounds a bit like needing full, cross-app, cross-network
transaction/call tracing, and not something specific or limited to Kafka,
doesn't it?

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Mon, Jan 5, 2015 at 2:43 PM, Bhavesh Mistry <mi...@gmail.com>
wrote:

> Hi Kafka Team/Users,
>
> We are using Linked-in Kafka data pipe-line end-to-end.
>
> Producer(s) ->Local DC Brokers -> MM -> Central brokers -> Camus Job ->
> HDFS
>
> This is working out very well for us, but we need to have visibility of
> latency at each layer (Local DC Brokers -> MM -> Central brokers -> Camus
> Job ->  HDFS).  Our events are time-based (time event was produce).  Is
> there any feature or any audit trail  mentioned at (
> https://github.com/linkedin/camus/) ?  But, I would like to know
> in-between
> latency and time event spent in each hope? So, we do not know where is
> problem and what t o optimize ?
>
> Any of this cover in 0.9.0 or any other version of upcoming Kafka release
> ?  How might we achive this  latency tracking across all components ?
>
>
> Thanks,
>
> Bhavesh
>
