Posted to user@hadoop.apache.org by Sagar Mehta <sa...@gmail.com> on 2013/10/11 07:36:24 UTC

State of Art in Hadoop Log aggregation

Hi Guys,

We have a fairly decent-sized Hadoop cluster of about 200 nodes and I was
wondering what the state of the art is if I want to aggregate and
visualize Hadoop ecosystem logs, particularly

   1. TaskTracker logs
   2. DataNode logs
   3. HBase RegionServer logs

One way is to use something like Flume on each node to aggregate the logs
and then something like Kibana -
http://www.elasticsearch.org/overview/kibana/ - to visualize the logs and
make them searchable.
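
Concretely, what I have in mind per node is roughly the following,
assuming Flume 1.4's exec source and its Elasticsearch sink; the agent
name, log path, and the es-master host are placeholders, not something we
have running:

  # flume-conf.properties - one agent per node, tailing the TaskTracker log
  # (all names and paths below are placeholders)
  agent1.sources = tt-log
  agent1.channels = mem
  agent1.sinks = es

  # exec source behaves like 'tail -F': it keeps following across log rolls
  agent1.sources.tt-log.type = exec
  agent1.sources.tt-log.command = tail -F /var/log/hadoop/hadoop-mapred-tasktracker.log
  agent1.sources.tt-log.channels = mem

  agent1.channels.mem.type = memory
  agent1.channels.mem.capacity = 10000

  # Elasticsearch sink (Flume 1.4+); Kibana then reads straight from the index
  agent1.sinks.es.type = elasticsearch
  agent1.sinks.es.hostNames = es-master:9300
  agent1.sinks.es.indexName = hadoop-logs
  agent1.sinks.es.clusterName = elasticsearch
  agent1.sinks.es.channel = mem

The same agent, pointed at the DataNode and RegionServer logs, would cover
the other two.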

However, I don't want to write another ETL pipeline for the Hadoop/HBase
logs themselves. We currently log in to each machine individually and
'tail -F' the logs when there is a Hadoop problem on a particular node.

We want a better way to look at the Hadoop logs in a centralized way when
there is an issue, without having to log in to 100 different machines, and
I was wondering what the state of the art is in this regard.
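
Something like a pdsh fan-out (sketch below, assuming pdsh is installed
and a node001..node200 naming scheme) would beat logging in box by box,
but it is still grep-over-ssh rather than real aggregation or search:

  # pull the tail of every TaskTracker log across the cluster in one shot
  pdsh -w node[001-200] \
    "tail -n 200 /var/log/hadoop/hadoop-mapred-tasktracker.log | grep -i error" \
    | dshbak -c | less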

Suggestions/Pointers are very welcome!!

Sagar

Re: State of Art in Hadoop Log aggregation

Posted by Sandy Ryza <sa...@cloudera.com>.
Just a clarification: Cloudera Manager is now free for any number of nodes.
Ref:
http://www.cloudera.com/content/cloudera/en/products/cloudera-manager.html

-Sandy


On Fri, Oct 11, 2013 at 7:05 AM, DSuiter RDX <ds...@rdx.com> wrote:

> Sagar,
>
> It sounds like you want a management console. We are using Cloudera
> Manager, but for 200 nodes you would need to license it; it is only free
> up to 50 nodes.
>
> The FOSS version of this is Ambari, IIRC.
> http://incubator.apache.org/ambari/
>
> Flume will provide a Hadoop-integrated pipeline for ingesting data; the
> data will still need to be analyzed and visualized if you use Flume.
> Kafka is a newer option for collecting and aggregating logs, but it is a
> separate project and will need servers of its own to manage.
>
> We also use Splunk, since it is approved by our auditing compliance agency.
>
> Thanks,
> *Devin Suiter*
> Jr. Data Solutions Software Engineer
> 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
> Google Voice: 412-256-8556 | www.rdx.com
>
>
> On Fri, Oct 11, 2013 at 9:54 AM, Alexander Alten-Lorenz <
> wget.null@gmail.com> wrote:
>
>> Hi,
>>
>> http://flume.apache.org
>>
>> - Alex
>>
>> On Oct 11, 2013, at 7:36 AM, Sagar Mehta <sa...@gmail.com> wrote:
>>
>> Hi Guys,
>>
>> We have a fairly decent-sized Hadoop cluster of about 200 nodes and I was
>> wondering what the state of the art is if I want to aggregate and
>> visualize Hadoop ecosystem logs, particularly
>>
>>    1. TaskTracker logs
>>    2. DataNode logs
>>    3. HBase RegionServer logs
>>
>> One way is to use something like Flume on each node to aggregate the
>> logs and then something like Kibana -
>> http://www.elasticsearch.org/overview/kibana/ - to visualize the logs
>> and make them searchable.
>>
>> However, I don't want to write another ETL pipeline for the Hadoop/HBase
>> logs themselves. We currently log in to each machine individually and
>> 'tail -F' the logs when there is a Hadoop problem on a particular node.
>>
>> We want a better way to look at the Hadoop logs in a centralized way
>> when there is an issue, without having to log in to 100 different
>> machines, and I was wondering what the state of the art is in this
>> regard.
>>
>> Suggestions/Pointers are very welcome!!
>>
>> Sagar
>>
>>
>> --
>> Alexander Alten-Lorenz
>> http://mapredit.blogspot.com
>> German Hadoop LinkedIn Group: http://goo.gl/N8pCF
>>
>>
>

Re: State of Art in Hadoop Log aggregation

Posted by DSuiter RDX <ds...@rdx.com>.
Sagar,

It sounds like you want a management console. We are using Cloudera
Manager, but for 200 nodes you would need to license it; it is only free
up to 50 nodes.

The FOSS version of this is Ambari, IIRC.
http://incubator.apache.org/ambari/

Flume will provide a Hadoop-integrated pipeline for ingesting data; the
data will still need to be analyzed and visualized if you use Flume. Kafka
is a newer option for collecting and aggregating logs, but it is a
separate project and will need servers of its own to manage.
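
For a flavor of the Kafka route, a minimal sketch using the console tools
that ship with Kafka 0.8; the kafka01:9092 broker, the zk01:2181 ZooKeeper
host, and the topic name are hypothetical:

  # on each node: ship the log into a Kafka topic
  tail -F /var/log/hadoop/hadoop-mapred-tasktracker.log \
    | kafka-console-producer.sh --broker-list kafka01:9092 --topic hadoop-logs

  # from anywhere: follow the merged stream from all nodes
  kafka-console-consumer.sh --zookeeper zk01:2181 --topic hadoop-logs

In practice you would use a real producer and land the stream somewhere
searchable; this just shows the shape of the pipeline.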

We also use Splunk, since it is approved by our auditing compliance agency.

Thanks,
*Devin Suiter*
Jr. Data Solutions Software Engineer
100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
Google Voice: 412-256-8556 | www.rdx.com


On Fri, Oct 11, 2013 at 9:54 AM, Alexander Alten-Lorenz <wget.null@gmail.com
> wrote:

> Hi,
>
> http://flume.apache.org
>
> - Alex
>
> On Oct 11, 2013, at 7:36 AM, Sagar Mehta <sa...@gmail.com> wrote:
>
> Hi Guys,
>
> We have a fairly decent-sized Hadoop cluster of about 200 nodes and I was
> wondering what the state of the art is if I want to aggregate and
> visualize Hadoop ecosystem logs, particularly
>
>    1. TaskTracker logs
>    2. DataNode logs
>    3. HBase RegionServer logs
>
> One way is to use something like Flume on each node to aggregate the
> logs and then something like Kibana -
> http://www.elasticsearch.org/overview/kibana/ - to visualize the logs
> and make them searchable.
>
> However, I don't want to write another ETL pipeline for the Hadoop/HBase
> logs themselves. We currently log in to each machine individually and
> 'tail -F' the logs when there is a Hadoop problem on a particular node.
>
> We want a better way to look at the Hadoop logs in a centralized way
> when there is an issue, without having to log in to 100 different
> machines, and I was wondering what the state of the art is in this
> regard.
>
> Suggestions/Pointers are very welcome!!
>
> Sagar
>
>
> --
> Alexander Alten-Lorenz
> http://mapredit.blogspot.com
> German Hadoop LinkedIn Group: http://goo.gl/N8pCF
>
>

Re: State of Art in Hadoop Log aggregation

Posted by Alexander Alten-Lorenz <wg...@gmail.com>.
Hi,

http://flume.apache.org

- Alex

On Oct 11, 2013, at 7:36 AM, Sagar Mehta <sa...@gmail.com> wrote:

> Hi Guys,
> 
> We have a fairly decent-sized Hadoop cluster of about 200 nodes and I was wondering what the state of the art is if I want to aggregate and visualize Hadoop ecosystem logs, particularly
>    1. TaskTracker logs
>    2. DataNode logs
>    3. HBase RegionServer logs
> One way is to use something like Flume on each node to aggregate the logs and then something like Kibana - http://www.elasticsearch.org/overview/kibana/ - to visualize the logs and make them searchable.
> 
> However, I don't want to write another ETL pipeline for the Hadoop/HBase logs themselves. We currently log in to each machine individually and 'tail -F' the logs when there is a Hadoop problem on a particular node.
> 
> We want a better way to look at the Hadoop logs in a centralized way when there is an issue, without having to log in to 100 different machines, and I was wondering what the state of the art is in this regard.
> 
> Suggestions/Pointers are very welcome!!
> 
> Sagar

--
Alexander Alten-Lorenz
http://mapredit.blogspot.com
German Hadoop LinkedIn Group: http://goo.gl/N8pCF


Re: State of Art in Hadoop Log aggregation

Posted by Pradeep Gollakota <pr...@gmail.com>.
There are plenty of log aggregation tools, both open source and commercial
off-the-shelf. Here are some:
http://devopsangle.com/2012/04/19/8-splunk-alternatives/

My personal recommendation is LogStash.
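
For example, a bare-bones Logstash config that tails the Hadoop and HBase
logs into Elasticsearch (with Kibana on top) might look roughly like the
following; the paths, the grok pattern, and the es-master host are
placeholders:

  input {
    file {
      path => ["/var/log/hadoop/*.log", "/var/log/hbase/*.log"]
      type => "hadoop"
    }
  }
  filter {
    # pull the timestamp and level out of the standard log4j line
    # so they become searchable fields in Kibana
    grok {
      match => ["message", "%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:body}"]
    }
  }
  output {
    elasticsearch { host => "es-master" }
  }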


On Thu, Oct 10, 2013 at 10:38 PM, Raymond Tay <ra...@gmail.com> wrote:

> You can try Chukwa, which is an Apache Incubator project. I tried it
> before and liked it for aggregating logs.
>
> On 11 Oct, 2013, at 1:36 PM, Sagar Mehta <sa...@gmail.com> wrote:
>
> Hi Guys,
>
> We have a fairly decent-sized Hadoop cluster of about 200 nodes and I was
> wondering what the state of the art is if I want to aggregate and
> visualize Hadoop ecosystem logs, particularly
>
>    1. TaskTracker logs
>    2. DataNode logs
>    3. HBase RegionServer logs
>
> One way is to use something like Flume on each node to aggregate the
> logs and then something like Kibana -
> http://www.elasticsearch.org/overview/kibana/ - to visualize the logs
> and make them searchable.
>
> However, I don't want to write another ETL pipeline for the Hadoop/HBase
> logs themselves. We currently log in to each machine individually and
> 'tail -F' the logs when there is a Hadoop problem on a particular node.
>
> We want a better way to look at the Hadoop logs in a centralized way
> when there is an issue, without having to log in to 100 different
> machines, and I was wondering what the state of the art is in this
> regard.
>
> Suggestions/Pointers are very welcome!!
>
> Sagar
>
>
>

RE: State of Art in Hadoop Log aggregation

Posted by "Smith, Joshua D." <Jo...@gd-ais.com>.
I've used Splunk in the past for log aggregation. It's commercial/proprietary, but I think there's a free version.
http://www.splunk.com/
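
If you go that route, the usual pattern is a universal forwarder on each
node with monitor stanzas in inputs.conf; a sketch, where the index and
sourcetype names are placeholders:

  # inputs.conf on each node (index/sourcetype names are placeholders)
  [monitor:///var/log/hadoop]
  index = hadoop
  sourcetype = hadoop_daemon_log

  [monitor:///var/log/hbase]
  index = hadoop
  sourcetype = hbase_daemon_log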


From: Raymond Tay [mailto:raymondtay1974@gmail.com]
Sent: Friday, October 11, 2013 1:39 AM
To: user@hadoop.apache.org
Subject: Re: State of Art in Hadoop Log aggregation

You can try Chukwa, which is an Apache Incubator project. I tried it before and liked it for aggregating logs.

On 11 Oct, 2013, at 1:36 PM, Sagar Mehta <sa...@gmail.com> wrote:


Hi Guys,

We have a fairly decent-sized Hadoop cluster of about 200 nodes and I was wondering what the state of the art is if I want to aggregate and visualize Hadoop ecosystem logs, particularly

  1.  TaskTracker logs
  2.  DataNode logs
  3.  HBase RegionServer logs
One way is to use something like Flume on each node to aggregate the logs and then something like Kibana - http://www.elasticsearch.org/overview/kibana/ - to visualize the logs and make them searchable.

However, I don't want to write another ETL pipeline for the Hadoop/HBase logs themselves. We currently log in to each machine individually and 'tail -F' the logs when there is a Hadoop problem on a particular node.

We want a better way to look at the Hadoop logs in a centralized way when there is an issue, without having to log in to 100 different machines, and I was wondering what the state of the art is in this regard.

Suggestions/Pointers are very welcome!!

Sagar


Re: State of Art in Hadoop Log aggregation

Posted by Raymond Tay <ra...@gmail.com>.
You can try Chukwa, which is an Apache Incubator project. I tried it before and liked it for aggregating logs.

On 11 Oct, 2013, at 1:36 PM, Sagar Mehta <sa...@gmail.com> wrote:

> Hi Guys,
> 
> We have a fairly decent-sized Hadoop cluster of about 200 nodes and I was wondering what the state of the art is if I want to aggregate and visualize Hadoop ecosystem logs, particularly
>    1. TaskTracker logs
>    2. DataNode logs
>    3. HBase RegionServer logs
> One way is to use something like Flume on each node to aggregate the logs and then something like Kibana - http://www.elasticsearch.org/overview/kibana/ - to visualize the logs and make them searchable.
> 
> However, I don't want to write another ETL pipeline for the Hadoop/HBase logs themselves. We currently log in to each machine individually and 'tail -F' the logs when there is a Hadoop problem on a particular node.
> 
> We want a better way to look at the Hadoop logs in a centralized way when there is an issue, without having to log in to 100 different machines, and I was wondering what the state of the art is in this regard.
> 
> Suggestions/Pointers are very welcome!!
> 
> Sagar

