Posted to mapreduce-user@hadoop.apache.org by Mapred Learn <ma...@gmail.com> on 2011/11/15 00:06:20 UTC

how to implement error thresholds in a map-reduce job ?

Hi,

I have a use case where I want to pass a threshold value to a map-reduce
job, e.g. error_records = 10.

I want the map-reduce job to fail if the total count of error_records across
the job, i.e. all mappers combined, reaches this threshold.

How can I implement this, considering that each mapper would be processing
only some part of the input data?

Thanks,
-JJ

RE: how to implement error thresholds in a map-reduce job ?

Posted by Mingxi Wu <Mi...@turn.com>.
JJ,

Two passes are necessary. In the first pass, just count how many lines are wrong; you don't do any real work on the data, you only read it. After this pass, record the file status ("good"/"bad") in a status file.

Before you start the second pass, check the status file: if the input file is marked as good, go ahead; otherwise, halt.
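
For example, the driver of the second pass could check that status file before submitting the job. A minimal sketch (the path and the "good" flag format here are only illustrations, not anything fixed):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// In the driver of the second pass, before submitting the real job:
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path status = new Path("/sanity/input_file.status");   // example path written by the first pass
String flag = null;
if (fs.exists(status)) {
    BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(status)));
    flag = in.readLine();                               // "good" or "bad"
    in.close();
}
if (!"good".equals(flag)) {
    System.err.println("Input did not pass the sanity check; halting the second pass.");
    System.exit(1);
}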

By dynamic counter, I mean a counter group whose member names are determined at run time. For example, below I define a counter group named fileSanity with two members:
one is inputFileName:NORMAL, the other is inputFileName:MALFORMED.

public enum DataSanityType {
       NORMAL, //a good data line
       MALFORMED //a bad data line
}

In your mapper, add:

if (line parsed successfully)
    reporter.incrCounter("fileSanity", inputFileName + ":" + NORMAL, 1);
else
    reporter.incrCounter("fileSanity", inputFileName + ":" + MALFORMED, 1);

In your reducer's close() function:

public void close() {
    // "reporter" is the Reporter saved from a previous reduce() call,
    // "threshold" is the acceptable error ratio read from the job configuration
    long normalCnt    = reporter.getCounter("fileSanity", inputFileName + ":" + NORMAL).getValue();
    long malformedCnt = reporter.getCounter("fileSanity", inputFileName + ":" + MALFORMED).getValue();
    long totalCnt     = normalCnt + malformedCnt;

    if ((double) malformedCnt / totalCnt >= threshold) {
        // mark the file as bad in an HDFS status file (a file you create)
    }
}

Hope this helps.

Mingxi




From: Mapred Learn [mailto:mapred.learn@gmail.com]
Sent: Tuesday, November 15, 2011 11:10 AM
To: mapreduce-user@hadoop.apache.org
Subject: Re: how to implement error thresholds in a map-reduce job ?

Hi Mingxi,
By dynamic counter you mean custom counter or is it a different kind of counter ?

Plus, I cannot do 2 passes, as I get to know about errors in a record only when I parse the line.
Thanks,
-JJ
On Mon, Nov 14, 2011 at 3:38 PM, Mingxi Wu <Mi...@turn.com>> wrote:
You can do two passes of the data.
The first map-reduce pass is sanity checking the data.
The second map-reduce pass is to do the real work, assuming the first pass accepts the file.

You can utilize the dynamic counter and define an enum type for error record categories.
In the mapper, you parse each line, and use the result to update the counter.

-Mingxi

From: Mapred Learn [mailto:mapred.learn@gmail.com<ma...@gmail.com>]
Sent: Monday, November 14, 2011 3:06 PM
To: mapreduce-user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: how to implement error thresholds in a map-reduce job ?

Hi,

I have a use case where I want to pass a threshold value to a map-reduce job, e.g. error_records = 10.

I want the map-reduce job to fail if the total count of error_records across the job, i.e. all mappers combined, reaches this threshold.

How can I implement this, considering that each mapper would be processing only some part of the input data?

Thanks,
-JJ


Re: how to implement error thresholds in a map-reduce job ?

Posted by Mapred Learn <ma...@gmail.com>.
Hi Mingxi,
By dynamic counter you mean custom counter or is it a different kind of
counter ?

Plus, I cannot do 2 passes, as I get to know about errors in a record only when
I parse the line.

Thanks,
-JJ
On Mon, Nov 14, 2011 at 3:38 PM, Mingxi Wu <Mi...@turn.com> wrote:

>  You can do two passes of the data.
>
> The first map-reduce pass is sanity checking the data.
>
> The second map-reduce pass is to do the real work, assuming the first pass
> accepts the file.
>
> You can utilize the dynamic counter and define an enum type for error
> record categories.
>
> In the mapper, you parse each line, and use the result to update the
> counter.
>
> -Mingxi
>
> From: Mapred Learn [mailto:mapred.learn@gmail.com]
> Sent: Monday, November 14, 2011 3:06 PM
> To: mapreduce-user@hadoop.apache.org
> Subject: how to implement error thresholds in a map-reduce job ?
>
> Hi,
>
> I have a use case where I want to pass a threshold value to a map-reduce
> job, e.g. error_records = 10.
>
> I want the map-reduce job to fail if the total count of error_records across
> the job, i.e. all mappers combined, reaches this threshold.
>
> How can I implement this, considering that each mapper would be processing
> only some part of the input data?
>
> Thanks,
>
> -JJ
>

RE: how to implement error thresholds in a map-reduce job ?

Posted by Mingxi Wu <Mi...@turn.com>.
You can do two passes of the data.
The first map-reduce pass is sanity checking the data.
The second map-reduce pass is to do the real work, assuming the first pass accepts the file.

You can utilize the dynamic counter and define an enum type for error record categories.
In the mapper, you parse each line, and use the result to update the counter.

-Mingxi

From: Mapred Learn [mailto:mapred.learn@gmail.com]
Sent: Monday, November 14, 2011 3:06 PM
To: mapreduce-user@hadoop.apache.org
Subject: how to implement error thresholds in a map-reduce job ?

Hi,

I have a use case where I want to pass a threshold value to a map-reduce job, e.g. error_records = 10.

I want the map-reduce job to fail if the total count of error_records across the job, i.e. all mappers combined, reaches this threshold.

How can I implement this, considering that each mapper would be processing only some part of the input data?

Thanks,
-JJ

Re: how to implement error thresholds in a map-reduce job ?

Posted by Sudharsan Sampath <su...@gmail.com>.
Hi,

If you mirror the logic of checking the error condition in both the mapper and
the reducer (from the counters), you have a higher probability that the job
will fail as early as possible. The mappers are not guaranteed to see the
last updated counter value from all the other mappers, but if a case slips
through there, your reducer can handle it.

Also, you are not limited to using a single reducer, as all reducers will
race to fail on this condition. Otherwise you keep the same parallelism in
processing as you had earlier.

Thanks
Sudhan S


On Wed, Nov 16, 2011 at 1:33 PM, Mapred Learn <ma...@gmail.com> wrote:

> Thanks Harsh for a descriptive response.
>
> This means that all mappers would finish before we can find out if there
> were errors, right ? Even though first mapper might have reached this
> threshold.
>
> Thanks,
>
> Sent from my iPhone
>
> On Nov 15, 2011, at 9:21 PM, Harsh J <ha...@cloudera.com> wrote:
>
> Ah so the threshold is job-level, not per task. OK.
>
> One other way I think would be performant, AND still able to use Hadoop
> itself would be to keep one reducer for this job, and have that reducer
> check if the counter of total failed records exceeds the threshold or not.
> A reducer is guaranteed to have gotten the total aggregate of map side
> counters since it begins only after all maps complete. The reducer can then
> go ahead and fail itself to fail the job or pass through. Your maps may
> output their data directly - the reducer is just to decide if the mappers
> were alright (Perhaps send failed counts as KV to the reducer, to avoid
> looking up Hadoop counters from within tasks -- but this would easily apply
> only to Map-only jobs. For MR jobs, it may be a bit more complicated to add
> this in, but surely still doable with some partitioner and comparator
> tweaks).
>
> But, also good to fail if a single map task itself exceeds > 10. The above
> is to ensure the global check, while doing this as well would ensure faster
> failure depending on the situation.
>
> On 16-Nov-2011, at 1:16 AM, Mapred Learn wrote:
>
> Hi Harsh,
>
> My situation is to kill a job when this threshold is reached. If say
> threshold is 10. And 2 mappers combined reached this value, how should I
> achieve this.
>
> With what you are saying, I think job will fail once a single mapper
> reaches that threshold.
>
> Thanks,
>
>
> On Tue, Nov 15, 2011 at 11:22 AM, Harsh J <ha...@cloudera.com> wrote:
>
>> Mapred,
>>
>> If you fail a task permanently upon encountering a bad situation, you
>> basically end up failing the job as well, automatically. By controlling the
>> number of retries (say down to 1 or 2 from 4 default total attempts), you
>> can also have it fail the job faster.
>>
>> Is killing the job immediately a necessity? Why?
>>
>> I s'pose you could call kill from within the mapper, but I've never seen
>> that as necessary in any situation so far. Whats wrong with letting the job
>> auto-die as a result of a failing task?
>>
>>  On 16-Nov-2011, at 12:38 AM, Mapred Learn wrote:
>>
>>  Thanks David for a step-by-step response but this makes error
>> threshold, a per mapper threshold. Is there a way to make it per job so
>> that all mappers share this value and increment it as a shared counter ?
>>
>>
>> On Tue, Nov 15, 2011 at 8:12 AM, David Rosenstrauch <da...@darose.net> wrote:
>>
>>>  On 11/14/2011 06:06 PM, Mapred Learn wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a use  case where I want to pass a threshold value to a
>>>> map-reduce
>>>> job. For eg: error records=10.
>>>>
>>>> I want map-reduce job to fail if total count of error_records in the job
>>>> i.e. all mappers, is reached.
>>>>
>>>> How can I implement this considering that each mapper would be
>>>> processing
>>>> some part of the input data ?
>>>>
>>>> Thanks,
>>>> -JJ
>>>>
>>>
>>> 1) Pass in the threshold value as configuration value of the M/R job.
> (i.e., job.getConfiguration().setInt("error_threshold", 10) )
>>>
>>> 2) Make your mappers implement the Configurable interface.  This will
>>> ensure that every mapper gets passed a copy of the config object.
>>>
>>> 3) When you implement the setConf() method in your mapper (which
>>> Configurable will force you to do), retrieve the threshold value from the
>>> config and save it in an instance variable in the mapper.  (i.e., int
>>> errorThreshold = conf.getInt("error_threshold") )
>>>
>>> 4) In the mapper, when an error record occurs, increment a counter and
>>> then check if the counter value exceeds the threshold.  If so, throw an
>>> exception.  (e.g., if (++numErrors >= errorThreshold) throw new
>>> RuntimeException("Too many errors") )
>>>
>>> The exception will kill the mapper.  Hadoop will attempt to re-run it,
>>> but subsequent attempts will also fail for the same reason, and eventually
>>> the entire job will fail.
>>>
>>> HTH,
>>>
>>> DR
>>>
>>
>>
>>
>
>

Re: how to implement error thresholds in a map-reduce job ?

Posted by Mapred Learn <ma...@gmail.com>.
Thanks Harsh for a descriptive response.

This means that all mappers would finish before we can find out if there were errors, right ? Even though first mapper might have reached this threshold.

Thanks,

Sent from my iPhone

On Nov 15, 2011, at 9:21 PM, Harsh J <ha...@cloudera.com> wrote:

> Ah so the threshold is job-level, not per task. OK.
> 
> One other way I think would be performant, AND still able to use Hadoop itself would be to keep one reducer for this job, and have that reducer check if the counter of total failed records exceeds the threshold or not. A reducer is guaranteed to have gotten the total aggregate of map side counters since it begins only after all maps complete. The reducer can then go ahead and fail itself to fail the job or pass through. Your maps may output their data directly - the reducer is just to decide if the mappers were alright (Perhaps send failed counts as KV to the reducer, to avoid looking up Hadoop counters from within tasks -- but this would easily apply only to Map-only jobs. For MR jobs, it may be a bit more complicated to add this in, but surely still doable with some partitioner and comparator tweaks).
> 
> But, also good to fail if a single map task itself exceeds > 10. The above is to ensure the global check, while doing this as well would ensure faster failure depending on the situation.
> 
> On 16-Nov-2011, at 1:16 AM, Mapred Learn wrote:
> 
>> Hi Harsh,
>>  
>> My situation is to kill a job when this threshold is reached. If say threshold is 10. And 2 mappers combined reached this value, how should I achieve this.
>>  
>> With what you are saying, I think job will fail once a single mapper reaches that threshold.
>>  
>> Thanks,
>> 
>>  
>> On Tue, Nov 15, 2011 at 11:22 AM, Harsh J <ha...@cloudera.com> wrote:
>> Mapred,
>> 
>> If you fail a task permanently upon encountering a bad situation, you basically end up failing the job as well, automatically. By controlling the number of retries (say down to 1 or 2 from 4 default total attempts), you can also have it fail the job faster.
>> 
>> Is killing the job immediately a necessity? Why?
>> 
>> I s'pose you could call kill from within the mapper, but I've never seen that as necessary in any situation so far. Whats wrong with letting the job auto-die as a result of a failing task?
>> 
>> On 16-Nov-2011, at 12:38 AM, Mapred Learn wrote:
>> 
>>> Thanks David for a step-by-step response but this makes error threshold, a per mapper threshold. Is there a way to make it per job so that all mappers share this value and increment it as a shared counter ?
>>> 
>>>  
>>> On Tue, Nov 15, 2011 at 8:12 AM, David Rosenstrauch <da...@darose.net> wrote:
>>> On 11/14/2011 06:06 PM, Mapred Learn wrote:
>>> Hi,
>>> 
>>> I have a use  case where I want to pass a threshold value to a map-reduce
>>> job. For eg: error records=10.
>>> 
>>> I want map-reduce job to fail if total count of error_records in the job
>>> i.e. all mappers, is reached.
>>> 
>>> How can I implement this considering that each mapper would be processing
>>> some part of the input data ?
>>> 
>>> Thanks,
>>> -JJ
>>> 
>>> 1) Pass in the threshold value as configuration value of the M/R job. (i.e., job.getConfiguration().setInt("error_threshold", 10) )
>>> 
>>> 2) Make your mappers implement the Configurable interface.  This will ensure that every mapper gets passed a copy of the config object.
>>> 
>>> 3) When you implement the setConf() method in your mapper (which Configurable will force you to do), retrieve the threshold value from the config and save it in an instance variable in the mapper.  (i.e., int errorThreshold = conf.getInt("error_threshold") )
>>> 
>>> 4) In the mapper, when an error record occurs, increment a counter and then check if the counter value exceeds the threshold.  If so, throw an exception.  (e.g., if (++numErrors >= errorThreshold) throw new RuntimeException("Too many errors") )
>>> 
>>> The exception will kill the mapper.  Hadoop will attempt to re-run it, but subsequent attempts will also fail for the same reason, and eventually the entire job will fail.
>>> 
>>> HTH,
>>> 
>>> DR
>>> 
>> 
>> 
> 

Re: how to implement error thresholds in a map-reduce job ?

Posted by Harsh J <ha...@cloudera.com>.
Ah so the threshold is job-level, not per task. OK.

One other way I think would be performant, AND still able to use Hadoop itself would be to keep one reducer for this job, and have that reducer check if the counter of total failed records exceeds the threshold or not. A reducer is guaranteed to have gotten the total aggregate of map side counters since it begins only after all maps complete. The reducer can then go ahead and fail itself to fail the job or pass through. Your maps may output their data directly - the reducer is just to decide if the mappers were alright (Perhaps send failed counts as KV to the reducer, to avoid looking up Hadoop counters from within tasks -- but this would easily apply only to Map-only jobs. For MR jobs, it may be a bit more complicated to add this in, but surely still doable with some partitioner and comparator tweaks).

But, also good to fail if a single map task itself exceeds > 10. The above is to ensure the global check, while doing this as well would ensure faster failure depending on the situation.
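
A rough sketch of that single-reducer check (names like "__ERRORS__" and "error.records.threshold" are just placeholders; it assumes each mapper emits its error count under one reserved key from its cleanup(), and that the job runs with job.setNumReduceTasks(1)):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class ThresholdCheckReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    private long threshold;

    @Override
    protected void setup(Context context) {
        // job-wide threshold, set by the driver on the Configuration
        threshold = context.getConfiguration().getLong("error.records.threshold", 10L);
    }

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        if ("__ERRORS__".equals(key.toString())) {
            long totalErrors = 0;
            for (LongWritable v : values) {
                totalErrors += v.get();            // aggregate error counts from every mapper
            }
            if (totalErrors >= threshold) {
                // failing this single reduce task on every attempt fails the whole job
                throw new IOException("Error records (" + totalErrors
                        + ") reached threshold " + threshold);
            }
        } else {
            for (LongWritable v : values) {        // pass normal records straight through
                context.write(key, v);
            }
        }
    }
}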

On 16-Nov-2011, at 1:16 AM, Mapred Learn wrote:

> Hi Harsh,
>  
> My situation is to kill a job when this threshold is reached. If say threshold is 10. And 2 mappers combined reached this value, how should I achieve this.
>  
> With what you are saying, I think job will fail once a single mapper reaches that threshold.
>  
> Thanks,
> 
>  
> On Tue, Nov 15, 2011 at 11:22 AM, Harsh J <ha...@cloudera.com> wrote:
> Mapred,
> 
> If you fail a task permanently upon encountering a bad situation, you basically end up failing the job as well, automatically. By controlling the number of retries (say down to 1 or 2 from 4 default total attempts), you can also have it fail the job faster.
> 
> Is killing the job immediately a necessity? Why?
> 
> I s'pose you could call kill from within the mapper, but I've never seen that as necessary in any situation so far. Whats wrong with letting the job auto-die as a result of a failing task?
> 
> On 16-Nov-2011, at 12:38 AM, Mapred Learn wrote:
> 
>> Thanks David for a step-by-step response but this makes error threshold, a per mapper threshold. Is there a way to make it per job so that all mappers share this value and increment it as a shared counter ?
>> 
>>  
>> On Tue, Nov 15, 2011 at 8:12 AM, David Rosenstrauch <da...@darose.net> wrote:
>> On 11/14/2011 06:06 PM, Mapred Learn wrote:
>> Hi,
>> 
>> I have a use  case where I want to pass a threshold value to a map-reduce
>> job. For eg: error records=10.
>> 
>> I want map-reduce job to fail if total count of error_records in the job
>> i.e. all mappers, is reached.
>> 
>> How can I implement this considering that each mapper would be processing
>> some part of the input data ?
>> 
>> Thanks,
>> -JJ
>> 
>> 1) Pass in the threshold value as configuration value of the M/R job. (i.e., job.getConfiguration().setInt("error_threshold", 10) )
>> 
>> 2) Make your mappers implement the Configurable interface.  This will ensure that every mapper gets passed a copy of the config object.
>> 
>> 3) When you implement the setConf() method in your mapper (which Configurable will force you to do), retrieve the threshold value from the config and save it in an instance variable in the mapper.  (i.e., int errorThreshold = conf.getInt("error_threshold") )
>> 
>> 4) In the mapper, when an error record occurs, increment a counter and then check if the counter value exceeds the threshold.  If so, throw an exception.  (e.g., if (++numErrors >= errorThreshold) throw new RuntimeException("Too many errors") )
>> 
>> The exception will kill the mapper.  Hadoop will attempt to re-run it, but subsequent attempts will also fail for the same reason, and eventually the entire job will fail.
>> 
>> HTH,
>> 
>> DR
>> 
> 
> 


Re: how to implement error thresholds in a map-reduce job ?

Posted by David Rosenstrauch <da...@darose.net>.
I can't think of an easy way to do this.  There are a few not-so-easy
approaches:

* Implement numErrors as a Hadoop counter, and then have the application
which submitted the job check the value of that counter once the job is
complete, and have the app throw an error if the counter exceeds the
threshold.  (Not exactly what you're asking, since this wouldn't cause
the job itself to fail, but rather you would monitor the job and cause
your app to fail in error if needed.  See the sketch after this list.)

* Store the numErrors counter externally - say in Apache Zookeeper or
Redis - and have each map task increment the counter and fail the job if
it exceeds the threshold.  Again, though, there are some issues to work
around here: due to speculative execution, Hadoop might launch extra map
tasks, which could throw off the counter.  You'd have to make sure to
only increment the counter when a map task completes successfully.
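
A sketch of the first option; the counter group/name and the threshold variable are made up for illustration:

// driver side (org.apache.hadoop.mapreduce.Job), after the mappers have been
// incrementing a "Sanity"/"ERROR_RECORDS" counter:
boolean ok = job.waitForCompletion(true);
long errorRecords = job.getCounters()
        .findCounter("Sanity", "ERROR_RECORDS")
        .getValue();
if (!ok || errorRecords >= errorThreshold) {
    throw new RuntimeException("Job failed or error threshold exceeded: " + errorRecords);
}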

HTH,

DR

On 11/15/2011 02:46 PM, Mapred Learn wrote:
> Hi Harsh,
>
> My situation is to kill a job when this threshold is reached. If say
> threshold is 10. And 2 mappers combined reached this value, how should I
> achieve this.
>
> With what you are saying, I think job will fail once a single mapper
> reaches that threshold.
>
> Thanks,
>
>
> On Tue, Nov 15, 2011 at 11:22 AM, Harsh J<ha...@cloudera.com>  wrote:
>
>> Mapred,
>>
>> If you fail a task permanently upon encountering a bad situation, you
>> basically end up failing the job as well, automatically. By controlling the
>> number of retries (say down to 1 or 2 from 4 default total attempts), you
>> can also have it fail the job faster.
>>
>> Is killing the job immediately a necessity? Why?
>>
>> I s'pose you could call kill from within the mapper, but I've never seen
>> that as necessary in any situation so far. Whats wrong with letting the job
>> auto-die as a result of a failing task?
>>
>>   On 16-Nov-2011, at 12:38 AM, Mapred Learn wrote:
>>
>>   Thanks David for a step-by-step response but this makes error threshold,
>> a per mapper threshold. Is there a way to make it per job so that all
>> mappers share this value and increment it as a shared counter ?
>>
>>
>> On Tue, Nov 15, 2011 at 8:12 AM, David Rosenstrauch<da...@darose.net>wrote:
>>
>>>   On 11/14/2011 06:06 PM, Mapred Learn wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a use  case where I want to pass a threshold value to a map-reduce
>>>> job. For eg: error records=10.
>>>>
>>>> I want map-reduce job to fail if total count of error_records in the job
>>>> i.e. all mappers, is reached.
>>>>
>>>> How can I implement this considering that each mapper would be processing
>>>> some part of the input data ?
>>>>
>>>> Thanks,
>>>> -JJ
>>>>
>>>
>>> 1) Pass in the threshold value as configuration value of the M/R job.
>>> (i.e., job.getConfiguration().setInt("error_threshold", 10) )
>>>
>>> 2) Make your mappers implement the Configurable interface.  This will
>>> ensure that every mapper gets passed a copy of the config object.
>>>
>>> 3) When you implement the setConf() method in your mapper (which
>>> Configurable will force you to do), retrieve the threshold value from the
>>> config and save it in an instance variable in the mapper.  (i.e., int
>>> errorThreshold = conf.getInt("error_threshold") )
>>>
>>> 4) In the mapper, when an error record occurs, increment a counter and
>>> then check if the counter value exceeds the threshold.  If so, throw an
>>> exception.  (e.g., if (++numErrors>= errorThreshold) throw new
>>> RuntimeException("Too many errors") )
>>>
>>> The exception will kill the mapper.  Hadoop will attempt to re-run it,
>>> but subsequent attempts will also fail for the same reason, and eventually
>>> the entire job will fail.
>>>
>>> HTH,
>>>
>>> DR
>>>
>>
>>
>>
>


Re: how to implement error thresholds in a map-reduce job ?

Posted by Mapred Learn <ma...@gmail.com>.
Hi Harsh,

My situation is to kill a job when this threshold is reached. If say
threshold is 10. And 2 mappers combined reached this value, how should I
achieve this.

With what you are saying, I think job will fail once a single mapper
reaches that threshold.

Thanks,


On Tue, Nov 15, 2011 at 11:22 AM, Harsh J <ha...@cloudera.com> wrote:

> Mapred,
>
> If you fail a task permanently upon encountering a bad situation, you
> basically end up failing the job as well, automatically. By controlling the
> number of retries (say down to 1 or 2 from 4 default total attempts), you
> can also have it fail the job faster.
>
> Is killing the job immediately a necessity? Why?
>
> I s'pose you could call kill from within the mapper, but I've never seen
> that as necessary in any situation so far. Whats wrong with letting the job
> auto-die as a result of a failing task?
>
>  On 16-Nov-2011, at 12:38 AM, Mapred Learn wrote:
>
>  Thanks David for a step-by-step response but this makes error threshold,
> a per mapper threshold. Is there a way to make it per job so that all
> mappers share this value and increment it as a shared counter ?
>
>
> On Tue, Nov 15, 2011 at 8:12 AM, David Rosenstrauch <da...@darose.net>wrote:
>
>>  On 11/14/2011 06:06 PM, Mapred Learn wrote:
>>
>>> Hi,
>>>
>>> I have a use  case where I want to pass a threshold value to a map-reduce
>>> job. For eg: error records=10.
>>>
>>> I want map-reduce job to fail if total count of error_records in the job
>>> i.e. all mappers, is reached.
>>>
>>> How can I implement this considering that each mapper would be processing
>>> some part of the input data ?
>>>
>>> Thanks,
>>> -JJ
>>>
>>
>> 1) Pass in the threshold value as configuration value of the M/R job.
>> (i.e., job.getConfiguration().setInt("error_threshold", 10) )
>>
>> 2) Make your mappers implement the Configurable interface.  This will
>> ensure that every mapper gets passed a copy of the config object.
>>
>> 3) When you implement the setConf() method in your mapper (which
>> Configurable will force you to do), retrieve the threshold value from the
>> config and save it in an instance variable in the mapper.  (i.e., int
>> errorThreshold = conf.getInt("error_threshold") )
>>
>> 4) In the mapper, when an error record occurs, increment a counter and
>> then check if the counter value exceeds the threshold.  If so, throw an
>> exception.  (e.g., if (++numErrors >= errorThreshold) throw new
>> RuntimeException("Too many errors") )
>>
>> The exception will kill the mapper.  Hadoop will attempt to re-run it,
>> but subsequent attempts will also fail for the same reason, and eventually
>> the entire job will fail.
>>
>> HTH,
>>
>> DR
>>
>
>
>

Re: how to implement error thresholds in a map-reduce job ?

Posted by Harsh J <ha...@cloudera.com>.
Mapred,

If you fail a task permanently upon encountering a bad situation, you basically end up failing the job as well, automatically. By controlling the number of retries (say down to 1 or 2 from 4 default total attempts), you can also have it fail the job faster.
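
(For reference, the retry count is a per-job setting; with the 1.x property names that would look something like the snippet below. On newer releases the names are mapreduce.map.maxattempts / mapreduce.reduce.maxattempts.)

// in the driver, on the job's Configuration:
conf.setInt("mapred.map.max.attempts", 1);     // fail the job after one failed map attempt
conf.setInt("mapred.reduce.max.attempts", 1);  // likewise for reduces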

Is killing the job immediately a necessity? Why?

I s'pose you could call kill from within the mapper, but I've never seen that as necessary in any situation so far. What's wrong with letting the job auto-die as a result of a failing task?

On 16-Nov-2011, at 12:38 AM, Mapred Learn wrote:

> Thanks David for a step-by-step response but this makes error threshold, a per mapper threshold. Is there a way to make it per job so that all mappers share this value and increment it as a shared counter ?
> 
>  
> On Tue, Nov 15, 2011 at 8:12 AM, David Rosenstrauch <da...@darose.net> wrote:
> On 11/14/2011 06:06 PM, Mapred Learn wrote:
> Hi,
> 
> I have a use  case where I want to pass a threshold value to a map-reduce
> job. For eg: error records=10.
> 
> I want map-reduce job to fail if total count of error_records in the job
> i.e. all mappers, is reached.
> 
> How can I implement this considering that each mapper would be processing
> some part of the input data ?
> 
> Thanks,
> -JJ
> 
> 1) Pass in the threshold value as configuration value of the M/R job. (i.e., job.getConfiguration().setInt("error_threshold", 10) )
> 
> 2) Make your mappers implement the Configurable interface.  This will ensure that every mapper gets passed a copy of the config object.
> 
> 3) When you implement the setConf() method in your mapper (which Configurable will force you to do), retrieve the threshold value from the config and save it in an instance variable in the mapper.  (i.e., int errorThreshold = conf.getInt("error_threshold") )
> 
> 4) In the mapper, when an error record occurs, increment a counter and then check if the counter value exceeds the threshold.  If so, throw an exception.  (e.g., if (++numErrors >= errorThreshold) throw new RuntimeException("Too many errors") )
> 
> The exception will kill the mapper.  Hadoop will attempt to re-run it, but subsequent attempts will also fail for the same reason, and eventually the entire job will fail.
> 
> HTH,
> 
> DR
> 


Re:Re: how to implement error thresholds in a map-reduce job ?

Posted by rabbit_cheng <ra...@126.com>.
I think David's solution is viable, but don't use a local variable as the counter in step 4; use a COUNTER object to count the error records, since the COUNTER object can work globally.
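
For example (new-API mapper context; the group and counter names are just placeholders):

// inside the mapper, instead of ++numErrors on a local field:
context.getCounter("Sanity", "ERROR_RECORDS").increment(1);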



At 2011-11-16 03:08:45,"Mapred Learn" <ma...@gmail.com> wrote:

Thanks David for a step-by-step response but this makes error threshold, a per mapper threshold. Is there a way to make it per job so that all mappers share this value and increment it as a shared counter ?

 
On Tue, Nov 15, 2011 at 8:12 AM, David Rosenstrauch <da...@darose.net> wrote:

On 11/14/2011 06:06 PM, Mapred Learn wrote:
Hi,

I have a use  case where I want to pass a threshold value to a map-reduce
job. For eg: error records=10.

I want map-reduce job to fail if total count of error_records in the job
i.e. all mappers, is reached.

How can I implement this considering that each mapper would be processing
some part of the input data ?

Thanks,
-JJ


1) Pass in the threshold value as configuration value of the M/R job. (i.e., job.getConfiguration().setInt("error_threshold", 10) )

2) Make your mappers implement the Configurable interface.  This will ensure that every mapper gets passed a copy of the config object.

3) When you implement the setConf() method in your mapper (which Configurable will force you to do), retrieve the threshold value from the config and save it in an instance variable in the mapper.  (i.e., int errorThreshold = conf.getInt("error_threshold") )

4) In the mapper, when an error record occurs, increment a counter and then check if the counter value exceeds the threshold.  If so, throw an exception.  (e.g., if (++numErrors >= errorThreshold) throw new RuntimeException("Too many errors") )

The exception will kill the mapper.  Hadoop will attempt to re-run it, but subsequent attempts will also fail for the same reason, and eventually the entire job will fail.

HTH,

DR



Re: how to implement error thresholds in a map-reduce job ?

Posted by Mapred Learn <ma...@gmail.com>.
Thanks David for a step-by-step response but this makes error threshold, a
per mapper threshold. Is there a way to make it per job so that all mappers
share this value and increment it as a shared counter ?


On Tue, Nov 15, 2011 at 8:12 AM, David Rosenstrauch <da...@darose.net> wrote:

>  On 11/14/2011 06:06 PM, Mapred Learn wrote:
>
>> Hi,
>>
>> I have a use  case where I want to pass a threshold value to a map-reduce
>> job. For eg: error records=10.
>>
>> I want map-reduce job to fail if total count of error_records in the job
>> i.e. all mappers, is reached.
>>
>> How can I implement this considering that each mapper would be processing
>> some part of the input data ?
>>
>> Thanks,
>> -JJ
>>
>
> 1) Pass in the threshold value as configuration value of the M/R job.
> (i.e., job.getConfiguration().setInt("error_threshold", 10) )
>
> 2) Make your mappers implement the Configurable interface.  This will
> ensure that every mapper gets passed a copy of the config object.
>
> 3) When you implement the setConf() method in your mapper (which
> Configurable will force you to do), retrieve the threshold value from the
> config and save it in an instance variable in the mapper.  (i.e., int
> errorThreshold = conf.getInt("error_threshold") )
>
> 4) In the mapper, when an error record occurs, increment a counter and
> then check if the counter value exceeds the threshold.  If so, throw an
> exception.  (e.g., if (++numErrors >= errorThreshold) throw new
> RuntimeException("Too many errors") )
>
> The exception will kill the mapper.  Hadoop will attempt to re-run it, but
> subsequent attempts will also fail for the same reason, and eventually the
> entire job will fail.
>
> HTH,
>
> DR
>

Re: how to implement error thresholds in a map-reduce job ?

Posted by David Rosenstrauch <da...@darose.net>.
On 11/14/2011 06:06 PM, Mapred Learn wrote:
> Hi,
>
> I have a use  case where I want to pass a threshold value to a map-reduce
> job. For eg: error records=10.
>
> I want map-reduce job to fail if total count of error_records in the job
> i.e. all mappers, is reached.
>
> How can I implement this considering that each mapper would be processing
> some part of the input data ?
>
> Thanks,
> -JJ

1) Pass in the threshold value as configuration value of the M/R job. 
(i.e., job.getConfiguration().setInt("error_threshold", 10) )

2) Make your mappers implement the Configurable interface.  This will 
ensure that every mapper gets passed a copy of the config object.

3) When you implement the setConf() method in your mapper (which 
Configurable will force you to do), retrieve the threshold value from 
the config and save it in an instance variable in the mapper.  (i.e., 
int errorThreshold = conf.getInt("error_threshold") )

4) In the mapper, when an error record occurs, increment a counter and 
then check if the counter value exceeds the threshold.  If so, throw an 
exception.  (e.g., if (++numErrors >= errorThreshold) throw new 
RuntimeException("Too many errors") )

The exception will kill the mapper.  Hadoop will attempt to re-run it, 
but subsequent attempts will also fail for the same reason, and 
eventually the entire job will fail.
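
Putting the pieces together, a rough sketch (the property name, field names, and the map output are only examples, not a fixed recipe):

import java.io.IOException;
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ThresholdMapper extends Mapper<LongWritable, Text, Text, Text>
        implements Configurable {

    private Configuration conf;
    private int errorThreshold;
    private int numErrors = 0;

    @Override
    public void setConf(Configuration conf) {
        this.conf = conf;
        // step 3: read the per-job threshold set by the driver in step 1
        this.errorThreshold = conf.getInt("error_threshold", 10);
    }

    @Override
    public Configuration getConf() {
        return conf;
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        try {
            // parse the line and emit whatever the job normally emits
            context.write(new Text("ok"), value);
        } catch (Exception e) {
            // step 4: count the error and give up once the threshold is hit
            if (++numErrors >= errorThreshold) {
                throw new RuntimeException("Too many errors", e);
            }
        }
    }
}

// step 1, in the driver: job.getConfiguration().setInt("error_threshold", 10);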

HTH,

DR