Posted to users@kafka.apache.org by Ron Tsoref <ch...@gmail.com> on 2013/01/10 18:59:52 UTC

Access logs aggregation

Hi.

I currently have a couple of servers running reverse proxy software that
creates access logs in the Squid log format (here is a
screenshot <http://www.arnut.com/pics/itpro/LogFile22.jpg> showing the
file's format).

As I understand it, Kafka is a good solution for handling the connection
between the proxies (which actually create the logs) and Storm (in order to
analyze them in real time).

Right now, I'm looking for a way to gather the logs from each server with
Kafka, and for guidance on how to configure these Kafka instances.

I would appreciate any recommendations on how to do this, or pointers to
other resources about this kind of setup.

Is there any production-ready producer that can handle this log-aggregation
task? Ideally, I would generate a message for each line of the logs on each
server; analyzing them with Storm shouldn't be a big problem after that.

Thanks,

Ron

Re: Access logs aggregation

Posted by Neha Narkhede <ne...@gmail.com>.
To verify that messages are properly deserialized, you can look into writing
your own "formatter" for the ConsoleConsumer. Similarly, you might have to
wire in your own "line reader" for the ConsoleProducer.

Thanks,
Neha



Re: Access logs aggregation

Posted by Ron Tsoref <ch...@gmail.com>.
I have been experimenting with Kafka for the last hour or so, and it seems
that piping a custom *tail* command into the producer sends the written
log lines to Kafka.

However, there is no clear separation between the messages
received by the ConsoleConsumer, so I can't be sure whether lines are sent
whole or cut in the middle. (Even if a message contains 10 lines of
logs it should be fine, because they will go through some processing
later; I just need to make sure that a line doesn't split across 2 messages.)
I will be testing this sometime next week.
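One way I could make that check concrete is to frame each line with a checksum before producing it, and verify it on the consumer side. A Python sketch of the idea (nothing Kafka-specific; the delimiter and helper names are made up for illustration):

```python
import zlib

DELIM = "\t#crc="

def frame(line: str) -> str:
    """Producer side: append a CRC32 of the line so the consumer
    can detect lines that were truncated or split across messages."""
    return f"{line}{DELIM}{zlib.crc32(line.encode()):08x}"

def check(message: str):
    """Consumer side: return (line, True) if the CRC matches,
    or (message, False) if the message was cut or corrupted."""
    line, sep, crc = message.rpartition(DELIM)
    if not sep:
        return message, False
    return line, f"{zlib.crc32(line.encode()):08x}" == crc
```

Any message that arrives without a valid trailing CRC was either split or mangled in transit, which is exactly the failure I want to rule out.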

Is there any simple consumer available that shows the messages clearly separated?

Thanks a lot!

Ron


On Fri, Jan 11, 2013 at 7:18 AM, Neha Narkhede <ne...@gmail.com>wrote:

> Ron,
>
> The best way of doing this would be to use the ConsoleProducer. Basically,
> it reads data from the console and parses it using the message "reader"
> which by default is the LineReader. In this case, you can either write your
> own SquidMessageReader that understands the Squid access format [1] and
> sends JSON data to Kafka or use the inbuilt LineReader al though that
> wouldn't attach any structure to your data.
>
> If you do end up writing a squid message reader, would you mind putting it
> up on github. I can see it being useful to the community -
> https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem
>
> Thanks,
> Neha
>
> 1. http://wiki.squid-cache.org/Features/LogFormat
>
>
> On Thu, Jan 10, 2013 at 10:24 AM, Jun Rao <ju...@gmail.com> wrote:
>
> > The following wiki describes the operational part of Kafka.
> > https://cwiki.apache.org/confluence/display/KAFKA/Operations
> >
> > To get your log into Kafka, if this log4j data, you may consider adding a
> > KafkaLog4jAppender. Otherwise, you can probably use ConsoleProducer. You
> > will still need to deal with things like log rolling yourself though.
> >
> > Thanks,
> >
> > Jun
> >
> > On Thu, Jan 10, 2013 at 9:59 AM, Ron Tsoref <ch...@gmail.com> wrote:
> >
> > > Hi.
> > >
> > > I currently have a couple of servers running a reverse proxy software
> > that
> > > creates access logs in the squid log format (Here is a
> > > screenshot<http://www.arnut.com/pics/itpro/LogFile22.jpg>showing the
> > > file's formation).
> > >
> > > As I understand it, Kafka is a good solution for handling the
> connection
> > > between the proxies (that actually create the logs) and Storm ( in
> order
> > to
> > > analyze them in real-time).
> > >
> > > Right now, I'm looking for a way to gather the logs from each server
> with
> > > Kafka and how to configure this Kafka instances.
> > >
> > > I would appreciate any recommendation on how to do this, or any other
> > > source regarding this kind of setup.
> > >
> > > Is there any production-ready producer that can handle this log
> > aggregating
> > > task? Basically, the ideal solution for me would be to generate a
> message
> > > for each one of the lines in the logs for each server, and then
> analyzing
> > > with Storm shouldn't be a big problem.
> > >
> > > Thanks,
> > >
> > > Ron
> > >
> >
>

Re: Access logs aggregation

Posted by Neha Narkhede <ne...@gmail.com>.
Ron,

The best way of doing this would be to use the ConsoleProducer. Basically,
it reads data from the console and parses it using the message "reader",
which by default is the LineReader. In this case, you can either write your
own SquidMessageReader that understands the Squid access log format [1] and
sends JSON data to Kafka, or use the built-in LineReader, although that
wouldn't attach any structure to your data.
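For illustration, the parsing step such a reader would perform could look like the following Python sketch. The field names are taken from the Squid native access.log layout [1], and the function name is made up; in practice this logic would live inside your reader before the string is handed to the producer:

```python
import json

# Field names follow the Squid "native" access.log layout; adjust them
# if your proxy uses a custom logformat directive.
FIELDS = ("time", "elapsed", "client", "code_status", "bytes",
          "method", "url", "ident", "hierarchy", "content_type")

def squid_line_to_json(line: str) -> str:
    """Parse one native-format access log line into a JSON message."""
    parts = line.split(None, len(FIELDS) - 1)
    record = dict(zip(FIELDS, parts))
    # code_status looks like "TCP_MISS/200"; split it for easier querying.
    code, _, status = record.pop("code_status", "").partition("/")
    record["code"], record["status"] = code, status
    return json.dumps(record)
```

Sending JSON like this means your Storm bolts can pick out individual fields instead of re-parsing raw lines downstream.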

If you do end up writing a Squid message reader, would you mind putting it
up on GitHub? I can see it being useful to the community -
https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem

Thanks,
Neha

1. http://wiki.squid-cache.org/Features/LogFormat



Re: Access logs aggregation

Posted by Jun Rao <ju...@gmail.com>.
The following wiki describes the operational part of Kafka.
https://cwiki.apache.org/confluence/display/KAFKA/Operations

To get your logs into Kafka: if this is log4j data, you may consider adding a
KafkaLog4jAppender. Otherwise, you can probably use the ConsoleProducer. You
will still need to deal with things like log rolling yourself, though.
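To illustrate the log rolling issue, here is a minimal Python sketch of a rotation-aware file follower (a hypothetical helper, not part of Kafka). It hands back only complete lines, leaving a partial trailing line for the next call, and starts over when the inode changes after rotation:

```python
import os

def read_new_lines(path, state):
    """Return (complete_lines, new_state) for whatever has been appended
    to `path` since the last call, reopening after log rotation.

    `state` is (inode, offset); pass None on the first call. Partial
    trailing lines are carried over to the next call, so a line is
    never handed out in two pieces.
    """
    inode = os.stat(path).st_ino
    if state is None or state[0] != inode:
        offset = 0                      # new or rotated file: start over
    else:
        offset = state[1]
    with open(path, "rb") as f:
        f.seek(offset)
        buf = f.read()
    # Keep only complete lines; advance the offset to the last newline.
    end = buf.rfind(b"\n") + 1
    lines = buf[:end].decode().splitlines()
    return lines, (inode, offset + end)
```

Something along these lines (or an existing log shipper) would sit between the rolled access logs and the producer.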

Thanks,

Jun
