Posted to users@kafka.apache.org by "Fernando O." <fo...@gmail.com> on 2015/01/28 19:39:41 UTC

Resilient Producer

Hi all,
    I'm evaluating using Kafka.

I liked the approach of Facebook's Scribe: you log to your own machine, and
a separate process forwards the messages to the central logger.

With Kafka it seems that I have to embed the publisher in my app and handle
any communication problems on the producer side.

I googled quite a bit trying to find a project that runs as a daemon, tails
a log file, and sends each line to the Kafka cluster (something like
tail -f file.log, but sending the output to Kafka instead of the console).

Does anyone know of something like that?


Thanks!
Fernando.
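For reference, the tail-and-forward behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not a production shipper: it assumes the kafka-python package, a broker on localhost:9092, and an illustrative topic name "logs", and it does not handle log rotation or restarts.

```python
import time


def follow(f, poll_interval=0.5):
    """Yield lines as they are appended to an open file object,
    like `tail -f`: sleep briefly when no new data is available."""
    while True:
        line = f.readline()
        if line:
            yield line
        else:
            time.sleep(poll_interval)


if __name__ == "__main__":
    # Requires a running Kafka broker and the kafka-python package.
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    with open("file.log") as f:
        for line in follow(f):
            producer.send("logs", line.encode("utf-8"))
```

The real shippers discussed in the replies add what this sketch lacks: rotation handling, backpressure, and durable resume-after-crash.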

Re: Resilient Producer

Posted by Colin <co...@clark.ws>.
Logstash

--
Colin Clark 
+1 612 859 6129
Skype colin.p.clark


Re: Resilient Producer

Posted by Magnus Edenhill <ma...@edenhill.se>.
The big syslog daemons have supported Kafka for a while now.

rsyslog:
http://www.rsyslog.com/doc/master/configuration/modules/omkafka.html

syslog-ng:
https://czanik.blogs.balabit.com/2015/01/syslog-ng-kafka-destination-support/#more-1013

And Bruce might be of interest as well:
https://github.com/tagged/bruce


On the less daemon-y and more tool-y side of things, there are:

https://github.com/fsaintjacques/tail-kafka
https://github.com/mguindin/tail-kafka
https://github.com/edenhill/kafkacat
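For a quick one-off, kafkacat composes directly with tail. A typical invocation might look like the following (the broker address, log path, and topic name are illustrative):

```shell
# Follow the log file (surviving rotation with -F) and produce
# each line to the "logs" topic; -P puts kafkacat in producer mode.
tail -F /var/log/app.log | kafkacat -P -b localhost:9092 -t logs
```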



Re: Resilient Producer

Posted by Gwen Shapira <gs...@cloudera.com>.
It sounds like you are describing Flume, with SpoolingDirectory source
(or exec source running tail) and Kafka channel.
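A minimal agent definition along those lines might look like the following (the agent and component names, log path, broker address, and topic are illustrative, and Kafka channel property names vary between Flume versions):

```
a1.sources = r1
a1.channels = c1

# Exec source running tail -F on the application log
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app.log
a1.sources.r1.channels = c1

# Kafka channel: events land directly in a Kafka topic,
# so no separate sink is needed for this pipeline
a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.brokerList = localhost:9092
a1.channels.c1.topic = logs
```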


Re: Resilient Producer

Posted by Lakshmanan Muthuraman <la...@tokbox.com>.
Thanks David. This looks interesting. We will definitely test it out to see
whether it solves our problem.


Re: Resilient Producer

Posted by David Morales <dm...@stratio.com>.
Existing "tail" source is not the best choice in your scenario, as you have
pointed out.

SpoolDir could be a solution if your log file rotation interval is very
short (5 minutes, for example), but then you have to deal with a huge
number of files in the folder (and slower directory listings).

There is a proposal for a new approach, something that combines the best of
"tail" and "spoolDir". Take a look here:

https://issues.apache.org/jira/browse/FLUME-2498







-- 

David Morales de Frías  ::  +34 607 010 411 :: @dmoralesdf
<https://twitter.com/dmoralesdf>


<http://www.stratio.com/>
Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 828 6473 // www.stratio.com // *@stratiobd
<https://twitter.com/StratioBD>*

Re: Resilient Producer

Posted by Lakshmanan Muthuraman <la...@tokbox.com>.
We have been using Flume to solve a very similar use case. Our servers write
the log files to a local file system, and then we have a Flume agent which
ships the data to Kafka.

With Flume you can use an exec source running tail. Though the exec source
runs well with tail, there are issues if the agent goes down or the file
channel starts building up. If the agent goes down, you can ask the exec
tail source to go back n lines or to read from the beginning of the file.
The challenge is that we roll our log files on a daily basis. What if the
agent goes down in the evening? We would have to go back over the entire
day's worth of data for reprocessing, which slows down the data flow. We
can also go back an arbitrary number of lines, but then we don't know what
the right number to go back is. This is the kind of challenge we face. We
have tried the spooling directory source, which works, but it requires a
different log file rotation policy. We even considered rotating files every
minute, but that would still delay the real-time data flow in our
Kafka -> Storm -> Elasticsearch pipeline by a minute.

We are going to do a PoC on Logstash to see whether it solves the problems
we have with Flume.
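The restart problem described above (not knowing how many lines to go back) is usually solved by checkpointing a byte offset rather than counting lines: persist the position after each read, and seek to it on restart. A minimal sketch in Python (file paths are illustrative, and this does not handle rotation, where the offset must be reset when the file shrinks):

```python
import os


def read_new_lines(path, offset_path):
    """Read lines appended since the last call, persisting the byte
    offset so a restarted agent resumes exactly where it left off
    instead of replaying a whole day's worth of data."""
    offset = 0
    if os.path.exists(offset_path):
        with open(offset_path) as f:
            offset = int(f.read().strip() or 0)
    with open(path) as f:
        f.seek(offset)
        lines = f.readlines()
        offset = f.tell()
    with open(offset_path, "w") as f:
        f.write(str(offset))
    return lines
```

This is essentially what the later "taildir"-style sources do: a position file on disk records how far each log file has been shipped.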


Re: Resilient Producer

Posted by Otis Gospodnetic <ot...@gmail.com>.
Fernando, have a look -
http://blog.sematext.com/2014/10/06/top-5-most-popular-log-shippers/

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: Resilient Producer

Posted by "Fernando O." <fo...@gmail.com>.
Something like Heka but lightweight :D
