Posted to user@chukwa.apache.org by Jorge Medina <ce...@gmail.com> on 2009/11/24 23:43:53 UTC

What is Chukwa for?

I took a quick look at Chukwa, but I still can't figure out whether it will
help solve my problem:

I have a web services application that runs in a cluster. The servers are
not configured to use sticky sessions (we don't even have sessions in the
application; everything the application does is stateless, so no session is
required).

Since multiple requests from the same customer are distributed by the load
balancer across the cluster, it is becoming difficult to provide support as
the cluster grows. Our support team has to look at log files located on
multiple servers and try to figure out the sequence of the calls.

I would like to be able to see the log files as a single file.

I have been thinking about several solutions:

a) Use the log4j SocketAppender to send my log records to a log server that
would then collect all log records from all servers. Queries would then be
made against this single log file (matching a date period, a username, a
client identifier, etc.).

b) Write an application to query each member of the cluster for log entries
matching certain criteria (a date period, or a username or client
identifier), then send the records to a server for ordering before
displaying them to the user
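
Option (b) could look roughly like the following Java sketch. It assumes
each server can already return its log entries as parsed objects; the
LogRecord and ClusterLogQuery names, the fields, and the in-memory lists are
hypothetical stand-ins for real log parsing and network calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical type: one parsed log entry from one server.
final class LogRecord {
    final long timestampMillis;
    final String server;
    final String clientId;
    final String message;

    LogRecord(long timestampMillis, String server, String clientId, String message) {
        this.timestampMillis = timestampMillis;
        this.server = server;
        this.clientId = clientId;
        this.message = message;
    }

    @Override
    public String toString() {
        return timestampMillis + " [" + server + "] " + clientId + " " + message;
    }
}

final class ClusterLogQuery {
    // Collect the entries for one client within [fromMillis, toMillis)
    // from every server's log, then order them by timestamp.
    static List<LogRecord> query(List<List<LogRecord>> perServerLogs,
                                 String clientId, long fromMillis, long toMillis) {
        List<LogRecord> matches = new ArrayList<>();
        for (List<LogRecord> serverLog : perServerLogs) {
            for (LogRecord r : serverLog) {
                if (r.clientId.equals(clientId)
                        && r.timestampMillis >= fromMillis
                        && r.timestampMillis < toMillis) {
                    matches.add(r);
                }
            }
        }
        matches.sort(Comparator.comparingLong((LogRecord r) -> r.timestampMillis));
        return matches;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for two servers' log files.
        List<LogRecord> web1 = Arrays.asList(
            new LogRecord(1000L, "web1", "alice", "login"),
            new LogRecord(3000L, "web1", "alice", "checkout"));
        List<LogRecord> web2 = Arrays.asList(
            new LogRecord(2000L, "web2", "alice", "addToCart"),
            new LogRecord(2500L, "web2", "bob", "login"));
        for (LogRecord r : query(Arrays.asList(web1, web2), "alice", 0L, 5000L)) {
            System.out.println(r);
        }
    }
}
```

The ordering is only as good as the clocks on the servers, so this sketch
also assumes their clocks are reasonably synchronized.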

But then I found Chukwa, and from the architecture it seems to do what I am
looking for. Could anybody confirm that Chukwa indeed addresses my problem?
What parts would I need to write? Does Chukwa come with a UI? Does it allow
querying the centralized log?

Thanks a lot

-Jorge

Re: What's Chukwa for?

Posted by Ariel Rabkin <as...@gmail.com>.

We're tied to Hadoop for storage and processing.  You can use Chukwa
to process logs from any source, not just Hadoop. At Berkeley, we use
it to process logs from a whole range of experimental software
systems.

--Ari

On Wed, Dec 2, 2009 at 3:30 PM, MJ Lai <mj...@gmail.com> wrote:
>
> Hi.
>
> It is another ``what for'' question.
>
> I went through the Chukwa web site and am still kind of confused about
> what the software is really for. Can I say the major purpose is to provide
> 1) a generic distributed log processing system, or 2) a log system only
> for Hadoop clusters? In case of 1), why do we want to make it tightly
> bound to Hadoop?
>
> Assume we have a 100-machine cluster (no Hadoop). If I deploy Chukwa to
> process the cluster logs, do I still need to create another Hadoop cluster
> to make it work?
>
> I think some practical use cases could reduce the confusion about this
> project.
>
> Thanks.
> MJ
>
>



-- 
Ari Rabkin asrabkin@gmail.com
UC Berkeley Computer Science Department

Re: What's Chukwa for?

Posted by Eric Yang <ey...@yahoo-inc.com>.

Hi MJ,

When the Chukwa Agent was introduced in the design, it was meant to be both
a configuration management agent and a monitoring agent.  However, the
current agent implementation does not have configuration management
capabilities, nor does it prevent developers from adding them.  The
community is currently focused on the monitoring agent.

Regards,
Eric

On 12/3/09 3:42 PM, "MJ Lai" <mj...@gmail.com> wrote:

> 
> Eric, Ari.
> 
> Thanks for your responses.
> 
> I have some follow-up questions. I'm trying to figure out the best way
> to manage a production Hadoop cluster. The features I need include:
> - deployment: install Hadoop from one console
> - monitoring
> - configuration
> - log/system analytics
> - software upgrades
> - etc.
> 
> It seems Ganglia/Nagios are widely used to monitor Hadoop cluster
> metrics, and Chukwa is for log analytics. But still, it is a pain to
> manage/configure a Hadoop cluster.
> 
> Since Chukwa has an agent installed on each endpoint, do you have any
> plans to build it into a universal platform for Hadoop management?
> 
> Thanks.
> MJ
> 

Re: What's Chukwa for?

Posted by MJ Lai <mj...@gmail.com>.

Eric, Ari.

Thanks for your responses.

I have some follow-up questions. I'm trying to figure out the best way
to manage a production Hadoop cluster. The features I need include:
- deployment: install Hadoop from one console
- monitoring
- configuration
- log/system analytics
- software upgrades
- etc.

It seems Ganglia/Nagios are widely used to monitor Hadoop cluster
metrics, and Chukwa is for log analytics. But still, it is a pain to
manage/configure a Hadoop cluster.

Since Chukwa has an agent installed on each endpoint, do you have any
plans to build it into a universal platform for Hadoop management?

Thanks.
MJ



Re: What's Chukwa for?

Posted by Eric Yang <ey...@yahoo-inc.com>.

Chukwa is a generic distributed log processing system.  Its primary use
case is monitoring Hadoop clusters.  Several analytics are bundled for
displaying system state, Java VM resource usage, and Hadoop DFS and
MapReduce metrics.  However, anyone can add their own analytics to run in
Chukwa.

In general, a monitoring system is independent of the system being
monitored.  The Chukwa documentation might make it look like you need two
clusters for this to work.  However, it is actually possible for Chukwa to
run on the same cluster that it is monitoring.

If Chukwa runs on the same cluster, it is better to think of it as a
reporting system.  If Hadoop crashed in this type of deployment, Chukwa
could not be held responsible for failing to alert, since it depends on
that same cluster.

Regards,
Eric

On 12/2/09 3:30 PM, "MJ Lai" <mj...@gmail.com> wrote:

> 
>  
> Hi.
> 
> It is another ``what for'' question.
> 
> I went through the Chukwa web site and am still kind of confused about
> what the software is really for. Can I say the major purpose is to provide
> 1) a generic distributed log processing system, or 2) a log system only
> for Hadoop clusters? In case of 1), why do we want to make it tightly
> bound to Hadoop?
> 
> Assume we have a 100-machine cluster (no Hadoop). If I deploy Chukwa to
> process the cluster logs, do I still need to create another Hadoop cluster
> to make it work?
> 
> I think some practical use cases could reduce the confusion about this
> project.
> 
> Thanks.
> MJ
>

What's Chukwa for?

Posted by MJ Lai <mj...@gmail.com>.

 
Hi.

It is another ``what for'' question.

I went through the Chukwa web site and am still kind of confused about
what the software is really for. Can I say the major purpose is to provide
1) a generic distributed log processing system, or 2) a log system only
for Hadoop clusters? In case of 1), why do we want to make it tightly
bound to Hadoop?

Assume we have a 100-machine cluster (no Hadoop). If I deploy Chukwa to
process the cluster logs, do I still need to create another Hadoop cluster
to make it work?

I think some practical use cases could reduce the confusion about this
project.

Thanks.
MJ

Re: What is Chukwa for?

Posted by Eric Yang <ey...@yahoo-inc.com>.

On 11/24/09 2:43 PM, "Jorge Medina" <ce...@gmail.com> wrote:

> I would like to be able to see the log files as a single file.

Chukwa combines log files into time-ordered sequence files.  This use case
is in the original plan for Chukwa.
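
As a rough illustration of the idea only (this is not Chukwa's code, and
Chukwa does this work on Hadoop rather than in memory), combining several
per-server logs that are each already in time order can be sketched as a
k-way merge.  The TimedLine and TimeOrderedMerge names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical type: one log line with its parsed timestamp.
final class TimedLine {
    final long timestampMillis;
    final String text;

    TimedLine(long timestampMillis, String text) {
        this.timestampMillis = timestampMillis;
        this.text = text;
    }
}

final class TimeOrderedMerge {
    // One cursor per input log: its current head line plus the remaining lines.
    private static final class Cursor {
        TimedLine head;
        final Iterator<TimedLine> rest;

        Cursor(Iterator<TimedLine> rest) {
            this.rest = rest;
            this.head = rest.next();
        }
    }

    // Merge inputs that are each sorted by timestamp into one sorted list.
    static List<TimedLine> merge(List<List<TimedLine>> sortedInputs) {
        PriorityQueue<Cursor> queue = new PriorityQueue<>(
            Comparator.comparingLong((Cursor c) -> c.head.timestampMillis));
        for (List<TimedLine> input : sortedInputs) {
            if (!input.isEmpty()) {
                queue.add(new Cursor(input.iterator()));
            }
        }
        List<TimedLine> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor c = queue.poll();        // input with the earliest head line
            merged.add(c.head);
            if (c.rest.hasNext()) {
                c.head = c.rest.next();     // advance, then put the cursor back
                queue.add(c);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for two servers' log files.
        List<TimedLine> a = Arrays.asList(
            new TimedLine(1L, "web1 start"), new TimedLine(4L, "web1 done"));
        List<TimedLine> b = Arrays.asList(
            new TimedLine(2L, "web2 start"), new TimedLine(3L, "web2 done"));
        for (TimedLine line : merge(Arrays.asList(a, b))) {
            System.out.println(line.timestampMillis + " " + line.text);
        }
    }
}
```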

> But then I found Chukwa, and from the architecture it seems to do what I am
> looking for. Could anybody confirm that Chukwa indeed addresses my problem?
> What parts would I need to write? Does Chukwa come with a UI? Does it allow
> querying the centralized log?

Chukwa comes with a UI to visualize log data after post-processing.  It does
not support direct search and viewing of log files.  However, this use case
was also part of the original intent in building Chukwa.  The current blocker
for providing log search is the lack of a full-text index engine.  Some
people have suggested building this on top of Solr + Katta.  However, no one
is currently working on a full-text index feature for Chukwa.
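
For anyone unfamiliar with the term, a full-text index maps each word to the
log lines that contain it, so a search does not have to scan every file.
The toy Java sketch below shows that idea only; it is not Chukwa, Solr, or
Katta code, and the LogIndex class and its methods are hypothetical names
made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class LogIndex {
    private final List<String> lines = new ArrayList<>();
    // word -> ids (positions in 'lines') of the log lines containing it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Store a log line and index each of its lower-cased words.
    void add(String line) {
        int id = lines.size();
        lines.add(line);
        for (String token : line.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
            }
        }
    }

    // Return the lines that contain every one of the given words.
    List<String> search(String... terms) {
        Set<Integer> hits = null;
        for (String term : terms) {
            Set<Integer> ids = postings.getOrDefault(
                term.toLowerCase(Locale.ROOT), Collections.<Integer>emptySet());
            if (hits == null) {
                hits = new TreeSet<>(ids);
            } else {
                hits.retainAll(ids);
            }
        }
        List<String> result = new ArrayList<>();
        if (hits != null) {
            for (int id : hits) {
                result.add(lines.get(id));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample log lines.
        LogIndex index = new LogIndex();
        index.add("ERROR user=alice payment timeout");
        index.add("INFO user=bob login ok");
        index.add("ERROR user=bob payment declined");
        for (String line : index.search("error", "payment")) {
            System.out.println(line);
        }
    }
}
```

A real engine would also need to distribute the index across machines and
keep it on disk, which is the part the Solr + Katta suggestion is about.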

Patches are welcome. :)

Regards,
Eric