Posted to user@chukwa.apache.org by MJ Lai <mj...@gmail.com> on 2009/12/03 00:30:49 UTC

What's Chukwa for?

 
Hi.

It is another ``what for'' question.

I went through the Chukwa web site and am still confused about what the
software is really for. Is its main purpose to provide 1) a generic
distributed log processing system, or 2) a log system only for Hadoop
clusters? In case 1), why is it tightly bound to Hadoop?

Assume we have a 100-machine cluster (no Hadoop). If I deploy Chukwa to
process the cluster logs, do I still need to create a separate Hadoop
cluster to make it work?

I think some practical use cases would reduce the confusion about this
project.

Thanks.
MJ

Re: What's Chukwa for?

Posted by Ariel Rabkin <as...@gmail.com>.

We're tied to Hadoop for storage and processing.  You can use Chukwa
to process logs from any source, not just Hadoop. At Berkeley, we use
it to process logs from a whole range of experimental software
systems.

--Ari


-- 
Ari Rabkin asrabkin@gmail.com
UC Berkeley Computer Science Department

Re: What's Chukwa for?

Posted by Eric Yang <ey...@yahoo-inc.com>.

Hi MJ,

When the Chukwa Agent was introduced in the design, it was meant to be both
a configuration management agent and a monitoring agent. However, the
current agent implementation has no configuration management capabilities,
though nothing prevents developers from adding them. For now, the community
is focused on the monitoring agent.

Regards,
Eric

On 12/3/09 3:42 PM, "MJ Lai" <mj...@gmail.com> wrote:

> 
> Eric, Ari.
> 
> Thanks for your responses.
> 
> I have some extended questions. I'm trying to figure out what is the
> best way to manage a production hadoop cluster, features include:
> - deployment: install hadoop from one console;
> - monitoring
> - configure
> - log/system analytics
> - software upgrade
> - etc.
> 
> It seems Ganglia/Nagios are widely used to monitor Hadoop cluster
> metrics, and Chukwa is for log analytics. But it is still a pain to
> manage and configure a Hadoop cluster.
> 
> Since Chukwa has an agent installed in each endpoint, do you have any
> plan to build it as a universal platform for hadoop management?
> 
> Thanks.
> MJ
> 
> 

Re: What's Chukwa for?

Posted by MJ Lai <mj...@gmail.com>.

Eric, Ari.

Thanks for your responses.

I have some follow-up questions. I'm trying to figure out the best way to
manage a production Hadoop cluster. The features I need include:
- deployment: install Hadoop from one console
- monitoring
- configuration
- log/system analytics
- software upgrades
- etc.

It seems Ganglia/Nagios are widely used to monitor Hadoop cluster
metrics, and Chukwa is for log analytics. But it is still a pain to
manage and configure a Hadoop cluster.

Since Chukwa has an agent installed on each endpoint, do you have any
plans to build it into a universal platform for Hadoop management?

Thanks.
MJ



Re: What's Chukwa for?

Posted by Eric Yang <ey...@yahoo-inc.com>.

Chukwa is a generic distributed log processing system. Its primary use
case is monitoring Hadoop clusters. Several analytics are bundled for
displaying system state, Java VM resource usage, and Hadoop DFS and
MapReduce metrics. However, anyone can add their own analytics to run in
Chukwa.

In general, a monitoring system is independent of the system being
monitored. The Chukwa documentation might make it look like you need two
clusters for this to work. However, Chukwa can actually run on the same
cluster that it is monitoring.

If Chukwa runs on the same cluster, it is better to call it a reporting
system: if Hadoop crashed in that type of deployment, Chukwa could not be
expected to send an alert.

Regards,
Eric
