Posted to users@kafka.apache.org by Dave Peterson <ds...@tagged.com> on 2014/09/09 21:24:51 UTC

Introducing Bruce, a Kafka producer daemon

Hello Kafka users,

Meet Bruce, a producer daemon developed at Tagged, Inc.
(http://www.tagged.com).  We are open sourcing Bruce because we have
found it useful at Tagged, and believe others may also benefit from it.
Bruce is available on GitHub (https://github.com/tagged/bruce).

We developed Bruce to function as a single intake point for a Kafka
cluster that serves diverse clients written in a variety of programming
languages.  Clients write messages to Bruce's UNIX domain datagram
socket using a simple binary format.  Once a client writes a message,
Bruce takes full responsibility for reliable delivery to the Kafka
cluster.  Communication between Bruce and clients is purely one-way.
After writing a message to Bruce's socket, there is no need for a client
to wait for an acknowledgement.  The operating system provides the same
reliability guarantee for UNIX domain sockets as for other local
interprocess communication mechanisms such as traditional UNIX pipes.
Example client code for writing messages to Bruce's socket is currently
available in C, C++, Java, Python, and PHP.  Community contributions for
other programming languages are welcome.
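
As an illustration of the one-way write described above, here is a minimal
Python sketch of a client that sends a single datagram to a UNIX domain
socket and returns without waiting for a reply.  The socket path and the
message framing are assumptions made for this example only; they are not
Bruce's actual path or wire format, which are described in Bruce's own
documentation and example clients.

```python
import socket
import struct

# Hypothetical socket path; the real one depends on how Bruce is configured.
BRUCE_SOCKET_PATH = "/tmp/bruce.socket"


def encode_message(topic: str, body: bytes) -> bytes:
    """Pack a topic and body into one datagram.

    Placeholder framing (NOT Bruce's actual wire format):
    2-byte big-endian topic length, topic bytes, then the body.
    """
    topic_bytes = topic.encode("utf-8")
    return struct.pack(">H", len(topic_bytes)) + topic_bytes + body


def send_message(topic: str, body: bytes, path: str = BRUCE_SOCKET_PATH) -> None:
    """Fire-and-forget: write one datagram and return without waiting for a reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(encode_message(topic, body), path)
```

A caller would invoke something like send_message("clicks", b"payload") and
carry on; no response is read back from the socket.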

In addition to providing a simple uniform access point for clients,
Bruce has a web-based status monitoring and data quality reporting
interface.  Bruce deals with transient load spikes and Kafka-related
problems by buffering messages in memory up to a configurable limit,
until they are sent and successfully acknowledged by a Kafka broker.  If
serious enough problems occur that Bruce is forced to discard messages,
it tracks all discards and reports them through its web interface,
giving a breakdown of discards by topic, including counts of discarded
messages and windows of time in which they occurred.  Per-topic
information on messages queued to be sent or waiting for
acknowledgements from Kafka is also available through Bruce's web
interface.
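
The buffering and discard tracking described above can be sketched roughly
as follows.  This is not Bruce's implementation; the class name, method
names, and report layout are all hypothetical, and the sketch only shows the
general idea of a byte-limited in-memory queue that records per-topic
discard counts together with the time window in which they occurred.

```python
import time
from collections import defaultdict, deque


class BoundedBuffer:
    """Queue messages in memory up to max_bytes; record discards per topic."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self.queue = deque()  # (topic, body) pairs awaiting acknowledgement
        # topic -> {"count": int, "first": timestamp, "last": timestamp}
        self.discards = defaultdict(
            lambda: {"count": 0, "first": None, "last": None}
        )

    def offer(self, topic, body, now=None):
        """Queue the message, or record a discard if the buffer is full."""
        if self.used_bytes + len(body) > self.max_bytes:
            now = time.time() if now is None else now
            entry = self.discards[topic]
            entry["count"] += 1
            if entry["first"] is None:
                entry["first"] = now
            entry["last"] = now
            return False
        self.queue.append((topic, body))
        self.used_bytes += len(body)
        return True

    def acknowledge(self):
        """Release the oldest message once the broker has acknowledged it."""
        topic, body = self.queue.popleft()
        self.used_bytes -= len(body)
        return topic, body

    def discard_report(self):
        """Per-topic discard counts and the time window they fell in."""
        return {topic: dict(entry) for topic, entry in self.discards.items()}
```

Memory is released only in acknowledge(), mirroring the point that messages
stay buffered until a broker has acknowledged them.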

Bruce comes with Nagios-based health monitoring and discard reporting
scripts, which are currently in use at Tagged to alert us if problems
occur. The discard monitoring script stores Bruce's discard reports in
an Oracle database so we have a complete, queryable history of data
quality information.  Bruce's web interface provides easy-to-parse JSON
output to facilitate integration with other monitoring infrastructure.
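
To show how JSON output of this kind could feed a Nagios-style check, here
is a small Python sketch.  The report shape used below is an assumption made
for the example and is not Bruce's real schema; the thresholds are likewise
arbitrary.  Only the exit codes (0 for OK, 1 for WARNING, 2 for CRITICAL)
follow the usual Nagios plugin convention.

```python
import json

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def check_discards(report_json, warn=1, crit=100):
    """Return (exit_code, status_line) for a discard report.

    Assumed report shape (hypothetical, not Bruce's real schema):
    {"discards": {"<topic>": <count>, ...}}
    """
    discards = json.loads(report_json).get("discards", {})
    total = sum(discards.values())
    if total >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif total >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    # Name the topic with the most discards, if there were any at all.
    worst = max(discards, key=discards.get) if total else "none"
    return code, f"{label}: {total} discarded messages (worst topic: {worst})"
```

A wrapper script would print the status line and exit with the returned
code so that Nagios can pick up the result.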

Bruce provides batching and compression that are configurable on a
per-topic basis.  Only Snappy compression is currently supported, but Bruce
was designed to support multiple compression types.

For more information, see Bruce's documentation, which is available on
its GitHub site.


Cheers,

Dave Peterson
Tagged, Inc.

Re: Introducing Bruce, a Kafka producer daemon

Posted by Cory Watson <gp...@keen.io>.
This is very cool, Dave. Thanks to you and your team for this work. This
may save us some work in the future.

Special thanks for also paying such attention to monitoring!


-- 
Cory Watson
Principal Infrastructure Engineer // Keen IO

Re: Introducing Bruce, a Kafka producer daemon

Posted by Joe Stein <jo...@stealth.ly>.
Very cool, can you update the Wiki please?
https://cwiki.apache.org/confluence/display/KAFKA/Clients maybe under a
section called Daemon? Or something?

/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
********************************************/

