Posted to common-user@hadoop.apache.org by shahab mehmandoust <sh...@gmail.com> on 2008/11/04 18:51:31 UTC

Recovery from Failed Jobs

Hello,

I want to parse the lines of an access log, line by line, with MapReduce.  I
want to know: once my access log is in HDFS, am I guaranteed that every
line will be processed and that the results will be in the output directory?
In other words, if a job fails, does Hadoop know where it failed, and can it
recover from that point so that no data is lost?
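
For concreteness, here is the kind of mapper I have in mind: a minimal
sketch against the 0.18-era org.apache.hadoop.mapred API that counts
Common Log Format status codes.  The class name LogLineMapper and the
field position are just for illustration.

  import java.io.IOException;

  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.MapReduceBase;
  import org.apache.hadoop.mapred.Mapper;
  import org.apache.hadoop.mapred.OutputCollector;
  import org.apache.hadoop.mapred.Reporter;

  public class LogLineMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);

    // TextInputFormat hands the mapper one log line per call,
    // keyed by the line's byte offset within the file.
    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
      String[] fields = line.toString().split(" ");
      if (fields.length > 8) {
        // In Common Log Format, field 8 is the HTTP status code;
        // emit (status, 1) so a reducer can sum the counts.
        output.collect(new Text(fields[8]), ONE);
      }
    }
  }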

Thanks,
Shahab

Re: Recovery from Failed Jobs

Posted by Alex Loddengaard <al...@cloudera.com>.
With regard to checkpointing, not yet.  Note that Hadoop does re-execute a
failed task from the beginning of its input split (up to a configurable
number of attempts), so a task failure doesn't lose data; what doesn't exist
yet is resuming a task partway through.  This JIRA is a prerequisite:
<http://issues.apache.org/jira/browse/HADOOP-3245>
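
If it helps, here is a rough driver sketch (same 0.18-era mapred API;
LogLineMapper is the hypothetical mapper from your mail) showing the knobs
that control how many times a failed task is retried before the job as a
whole fails:

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.FileOutputFormat;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;

  public class LogParseJob {
    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(LogParseJob.class);
      conf.setJobName("access-log-parse");
      conf.setMapperClass(LogLineMapper.class);
      conf.setOutputKeyClass(Text.class);
      conf.setOutputValueClass(IntWritable.class);

      // A failed task attempt is re-run from the start of its input
      // split; only after this many failed attempts does the job fail.
      conf.setMaxMapAttempts(4);     // default for mapred.map.max.attempts
      conf.setMaxReduceAttempts(4);  // default for mapred.reduce.max.attempts

      FileInputFormat.setInputPaths(conf, new Path(args[0]));
      FileOutputFormat.setOutputPath(conf, new Path(args[1]));
      JobClient.runJob(conf);
    }
  }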

I'm a little confused about what you're trying to do with log parsing.  You
should consider Scribe or Chukwa for log collection, though Chukwa isn't
ready for use yet.  Learn more here:

Chukwa:
<http://wiki.apache.org/hadoop/Chukwa>
<http://issues.apache.org/jira/browse/HADOOP-3719>

Scribe:
<http://www.cloudera.com/blog/2008/10/28/installing-scribe-for-log-collection/>
<http://www.cloudera.com/blog/2008/11/02/configuring-and-using-scribe-for-hadoop-log-collection/>

Alex

On Tue, Nov 4, 2008 at 11:51 AM, shahab mehmandoust <sh...@gmail.com> wrote:

> Hello,
>
> I want to parse the lines of an access log, line by line, with MapReduce.  I
> want to know: once my access log is in HDFS, am I guaranteed that every
> line will be processed and that the results will be in the output directory?
> In other words, if a job fails, does Hadoop know where it failed, and can it
> recover from that point so that no data is lost?
>
> Thanks,
> Shahab
>