Posted to common-user@hadoop.apache.org by Mark Kerzner <ma...@shmsoft.com> on 2013/04/19 06:23:02 UTC

Best way to collect Hadoop logs across cluster

Hi,

My clusters are on EC2, and their logs disappear once the cluster's instances
are destroyed. What is the best practice for collecting the logs for later
storage?

Amazon does exactly this with EMR; how do they do it?

Thank you,
Mark

Re: Best way to collect Hadoop logs across cluster

Posted by Marcos Luis Ortiz Valmaseda <ma...@gmail.com>.
Actually, the problem is not simple; at least three companies have built
products around it:
- Loggly:
  http://loggly.com/

  Loggly is part of the AWS Marketplace:
  https://aws.amazon.com/solution-providers/isv/loggly

- Papertrail:
  https://papertrailapp.com/

  How to do it:
  http://help.papertrailapp.com/discussions/questions/1767-logging-to-heroku-papertrail-addon-from-an-app-on-external-server
  http://help.papertrailapp.com/kb/how-it-works/permanent-log-archives
  http://help.papertrailapp.com/kb/analytics/log-analytics-with-hadoop-and-hive


- Splunk, with its cloud platform Splunk Storm:
  https://www.splunkstorm.com/


With any of these services, you can collect all the logs generated on Amazon
EC2 (the base service underneath EMR and the rest) either continuously or on
a schedule.
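
Most of these services ingest standard syslog, so shipping logs can be as
simple as one forwarding rule per node. A hypothetical rsyslog example (the
destination host and port come from your own account, so treat these as
placeholders):

  # /etc/rsyslog.d/90-logservice.conf -- placeholder destination
  *.* @@logs.papertrailapp.com:12345

(@@ forwards over TCP; a single @ would use UDP.) Hadoop's log4j output can
be routed into syslog as well, e.g. via log4j's SyslogAppender.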

Best wishes.



2013/4/19 Mark Kerzner <ma...@shmsoft.com>

> So you are saying the problem is very simple: just before destroying the
> cluster, collect the logs to S3. In any case, I only need them after a
> specific computation completes, so I have no special requirements.
>
> In regular, permanent clusters, is there something that lets you see all
> the logs in one place?
>
> Thank you,
> Mark
>
>
> On Thu, Apr 18, 2013 at 11:51 PM, Marcos Luis Ortiz Valmaseda <
> marcosluis2186@gmail.com> wrote:
>
>> When you destroy an EC2 instance, the expected behavior is that all local
>> data is erased.
>> Why not run a service that ships the logs directly to an S3 bucket, either
>> in real time or in 5-minute batches?
>>
>>
>> 2013/4/18 Mark Kerzner <ma...@shmsoft.com>
>>
>>> Hi,
>>>
>>> My clusters are on EC2, and their logs disappear once the cluster's
>>> instances are destroyed. What is the best practice for collecting the
>>> logs for later storage?
>>>
>>> Amazon does exactly this with EMR; how do they do it?
>>>
>>> Thank you,
>>> Mark
>>>
>>
>>
>>
>> --
>> Marcos Ortiz Valmaseda,
>> *Data-Driven Product Manager* at PDVSA
>> *Blog*: http://dataddict.wordpress.com/
>> *LinkedIn: *http://www.linkedin.com/in/marcosluis2186
>> *Twitter*: @marcosluis2186 <http://twitter.com/marcosluis2186>
>>
>
>


-- 
Marcos Ortiz Valmaseda,
*Data-Driven Product Manager* at PDVSA
*Blog*: http://dataddict.wordpress.com/
*LinkedIn: *http://www.linkedin.com/in/marcosluis2186
*Twitter*: @marcosluis2186 <http://twitter.com/marcosluis2186>

Re: Best way to collect Hadoop logs across cluster

Posted by Mark Kerzner <ma...@shmsoft.com>.
So you are saying the problem is very simple: just before destroying the
cluster, collect the logs to S3. In any case, I only need them after a
specific computation completes, so I have no special requirements.

In regular, permanent clusters, is there something that lets you see all
the logs in one place?

Thank you,
Mark


On Thu, Apr 18, 2013 at 11:51 PM, Marcos Luis Ortiz Valmaseda <
marcosluis2186@gmail.com> wrote:

> When you destroy an EC2 instance, the expected behavior is that all local
> data is erased.
> Why not run a service that ships the logs directly to an S3 bucket, either
> in real time or in 5-minute batches?
>
>
> 2013/4/18 Mark Kerzner <ma...@shmsoft.com>
>
>> Hi,
>>
>> My clusters are on EC2, and their logs disappear once the cluster's
>> instances are destroyed. What is the best practice for collecting the
>> logs for later storage?
>>
>> Amazon does exactly this with EMR; how do they do it?
>>
>> Thank you,
>> Mark
>>
>
>
>
> --
> Marcos Ortiz Valmaseda,
> *Data-Driven Product Manager* at PDVSA
> *Blog*: http://dataddict.wordpress.com/
> *LinkedIn: *http://www.linkedin.com/in/marcosluis2186
> *Twitter*: @marcosluis2186 <http://twitter.com/marcosluis2186>
>

Re: Best way to collect Hadoop logs across cluster

Posted by Marcos Luis Ortiz Valmaseda <ma...@gmail.com>.
When you destroy an EC2 instance, the expected behavior is that all local
data is erased.
Why not run a service that ships the logs directly to an S3 bucket, either
in real time or in 5-minute batches?
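
For illustration, here is a minimal sketch of such a shipper in Python with
boto3. The bucket name, log directory, and interval are placeholders, and it
assumes the instance already has AWS credentials (environment or IAM role):

  #!/usr/bin/env python
  # Minimal periodic log shipper -- a sketch, not production code.
  import time
  from pathlib import Path

  import boto3  # assumes AWS credentials are already configured

  BUCKET = "my-hadoop-logs"          # hypothetical bucket name
  LOG_DIR = Path("/var/log/hadoop")  # adjust to your Hadoop log directory
  INTERVAL = 300                     # seconds: the "5-minute batch"

  s3 = boto3.client("s3")

  while True:
      for log_file in LOG_DIR.glob("*.log*"):
          # Prefix keys with the upload hour so later batches do not
          # clobber earlier ones; a real shipper would track offsets.
          key = time.strftime("%Y/%m/%d/%H/") + log_file.name
          s3.upload_file(str(log_file), BUCKET, key)
      time.sleep(INTERVAL)

Run it (or a cron equivalent) on every node while the cluster is alive, and
once more just before tearing the cluster down.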


2013/4/18 Mark Kerzner <ma...@shmsoft.com>

> Hi,
>
> My clusters are on EC2, and their logs disappear once the cluster's
> instances are destroyed. What is the best practice for collecting the logs
> for later storage?
>
> Amazon does exactly this with EMR; how do they do it?
>
> Thank you,
> Mark
>



-- 
Marcos Ortiz Valmaseda,
*Data-Driven Product Manager* at PDVSA
*Blog*: http://dataddict.wordpress.com/
*LinkedIn: *http://www.linkedin.com/in/marcosluis2186
*Twitter*: @marcosluis2186 <http://twitter.com/marcosluis2186>

Re: Best way to collect Hadoop logs across cluster

Posted by Mark Kerzner <ma...@shmsoft.com>.
Thank you for all the advice; it was indeed very useful.

Mark


On Thu, Apr 18, 2013 at 11:44 PM, Roman Shaposhnik <rv...@apache.org> wrote:

> On Thu, Apr 18, 2013 at 9:23 PM, Mark Kerzner <ma...@shmsoft.com>
> wrote:
> > Hi,
> >
> > My clusters are on EC2, and their logs disappear once the cluster's
> > instances are destroyed. What is the best practice for collecting the
> > logs for later storage?
> >
> > Amazon does exactly this with EMR; how do they do it?
>
> Apache Flume could be extremely useful for this purpose. You can even
> configure it to deposit log data into S3 in real time.
>
> Thanks,
> Roman.
>

Re: Best way to collect Hadoop logs across cluster

Posted by Roman Shaposhnik <rv...@apache.org>.
On Thu, Apr 18, 2013 at 9:23 PM, Mark Kerzner <ma...@shmsoft.com> wrote:
> Hi,
>
> My clusters are on EC2, and their logs disappear once the cluster's
> instances are destroyed. What is the best practice for collecting the logs
> for later storage?
>
> Amazon does exactly this with EMR; how do they do it?

Apache Flume could be extremely useful for this purpose. You can even
configure it to deposit log data into S3 in real time.
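
For illustration, a single-agent Flume configuration along those lines might
look like the sketch below. The agent, channel, and bucket names are
placeholders, and writing to an s3n:// path assumes the Hadoop S3 filesystem
jars and your AWS credentials are available on Flume's classpath:

  # Tail one Hadoop log and ship events to S3 through the HDFS sink.
  agent.sources = taillog
  agent.channels = mem
  agent.sinks = s3sink

  agent.sources.taillog.type = exec
  agent.sources.taillog.command = tail -F /var/log/hadoop/hadoop-datanode.log
  agent.sources.taillog.channels = mem

  agent.channels.mem.type = memory
  agent.channels.mem.capacity = 10000

  agent.sinks.s3sink.type = hdfs
  agent.sinks.s3sink.channel = mem
  agent.sinks.s3sink.hdfs.path = s3n://my-log-bucket/hadoop/%Y-%m-%d
  agent.sinks.s3sink.hdfs.fileType = DataStream
  agent.sinks.s3sink.hdfs.useLocalTimeStamp = true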

Thanks,
Roman.
