Posted to common-user@hadoop.apache.org by Brian MacKay <Br...@MEDecision.com> on 2008/11/07 21:12:45 UTC

Dynamically terminate a job once Reporter hits a threshold

I'm looking for a way to dynamically terminate a job once the Reporter in a
map task hits a threshold.

Example:

public void map(WritableComparable key, Text values,
        OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {

    if (reporter.getCount() > SomeConfigValue) {
        return;
    }
    ....  map job code
}
        
Obviously, reporter.getCount() doesn't exist. I'm open to other ideas, and
any advice would be appreciated.
Thanks, Brian

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The information transmitted is intended only for the person or entity to 
which it is addressed and may contain confidential and/or privileged 
material. Any review, retransmission, dissemination or other use of, or 
taking of any action in reliance upon, this information by persons or 
entities other than the intended recipient is prohibited. If you received 
this message in error, please contact the sender and delete the material 
from any computer.



Re: reduce more than one way

Posted by Owen O'Malley <om...@apache.org>.
On Nov 7, 2008, at 12:35 PM, Elia Mazzawi wrote:

> I have 2 Hadoop map/reduce programs that have the same map, but
> different reduce methods.
>
> Can I run them in a way so that the map only happens once?

If the input to the reduces is the same, you can put the two reduces  
together and use one of the multiple output libraries. That will let  
your reducer produce two different output directories.

-- Owen
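A minimal sketch of this merged-reducer idea using the old-API MultipleOutputs class (org.apache.hadoop.mapred.lib.MultipleOutputs, available in later 0.x releases). The named outputs "sideA"/"sideB" and the pass-through reduce logic are illustrative assumptions, not from the thread:

```java
// Sketch: one reducer feeding two named side outputs via MultipleOutputs.
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

public class MergedReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {

  private MultipleOutputs mos;

  @Override
  public void configure(JobConf conf) {
    mos = new MultipleOutputs(conf);
  }

  @SuppressWarnings("unchecked")
  public void reduce(Text key, Iterator<Text> values,
      OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    while (values.hasNext()) {
      Text value = values.next();
      // Run both reduce logics over the same input group, sending each
      // result to its own named output instead of the default collector.
      mos.getCollector("sideA", reporter).collect(key, value);
      mos.getCollector("sideB", reporter).collect(key, value);
    }
  }

  @Override
  public void close() throws IOException {
    mos.close();  // flush the side outputs
  }
}
```

Each named output has to be declared in the driver first, e.g. MultipleOutputs.addNamedOutput(conf, "sideA", TextOutputFormat.class, Text.class, Text.class). Note that the old-API MultipleOutputs writes side files (sideA-r-00000, ...) into the single job output directory rather than truly separate directories.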

Re: reduce more than one way

Posted by Amar Kamat <am...@yahoo-inc.com>.
Elia Mazzawi wrote:
> Hello,
>
> I'm writing Hadoop programs in Java.
> I have 2 Hadoop map/reduce programs that have the same map, but
> different reduce methods.
Look at how MultipleOutputFormat is used; it provides the facility to
write to multiple files.
Amar
>
> Can I run them in a way so that the map only happens once?
>
> Maybe store the intermediate result or something?
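A rough sketch of the MultipleOutputFormat route (old API): subclassing MultipleTextOutputFormat lets you choose the output file per record. The "B|" key-tagging convention and the a/ and b/ leaf paths below are illustrative assumptions:

```java
// Sketch: route each reduce output record to one of two leaf paths by
// overriding generateFileNameForKeyValue.
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

public class TwoWayOutputFormat extends MultipleTextOutputFormat<Text, Text> {
  @Override
  protected String generateFileNameForKeyValue(Text key, Text value,
      String name) {
    // Suppose the merged reducer tags keys meant for the second output
    // with a "B|" prefix; everything else goes to the first output.
    return key.toString().startsWith("B|") ? "b/" + name : "a/" + name;
  }
}
```

Install it with conf.setOutputFormat(TwoWayOutputFormat.class); both sets of files then land under subpaths of the one job output directory.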


Re: reduce more than one way

Posted by lohit <lo...@yahoo.com>.
There is a mapper called IdentityMapper (look at IdentityMapper.java), which basically reads its input and writes it back out without doing anything.
Maybe you can run your mapper with no reducers and store the intermediate output, then run your 2 Hadoop programs with the identity mapper and different sets of reducers.
Thanks,
Lohit
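One way this two-stage plan might look with the old JobClient API. All paths and the SharedMapper/ReduceA/ReduceB class names are hypothetical, and note the later stages pay an extra read and shuffle of the intermediate data:

```java
// Sketch: run the shared mapper once with zero reduces, then run two
// reduce jobs over the stored intermediate output via IdentityMapper.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;

public class TwoReduceDriver {
  public static void main(String[] args) throws Exception {
    // Stage 1: shared map, no reduce -- map output goes straight to HDFS.
    JobConf mapOnly = new JobConf(TwoReduceDriver.class);
    mapOnly.setMapperClass(SharedMapper.class);   // your common mapper (hypothetical)
    mapOnly.setNumReduceTasks(0);
    FileInputFormat.setInputPaths(mapOnly, new Path("/data/input"));
    FileOutputFormat.setOutputPath(mapOnly, new Path("/data/intermediate"));
    JobClient.runJob(mapOnly);

    // Stage 2: identity map + first reducer.
    JobConf jobA = new JobConf(TwoReduceDriver.class);
    jobA.setMapperClass(IdentityMapper.class);
    jobA.setReducerClass(ReduceA.class);          // hypothetical first reducer
    FileInputFormat.setInputPaths(jobA, new Path("/data/intermediate"));
    FileOutputFormat.setOutputPath(jobA, new Path("/data/outputA"));
    JobClient.runJob(jobA);

    // Stage 3: identity map + second reducer.
    JobConf jobB = new JobConf(TwoReduceDriver.class);
    jobB.setMapperClass(IdentityMapper.class);
    jobB.setReducerClass(ReduceB.class);          // hypothetical second reducer
    FileInputFormat.setInputPaths(jobB, new Path("/data/intermediate"));
    FileOutputFormat.setOutputPath(jobB, new Path("/data/outputB"));
    JobClient.runJob(jobB);
  }
}
```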



----- Original Message ----
From: Elia Mazzawi <el...@casalemedia.com>
To: core-user@hadoop.apache.org
Sent: Friday, November 7, 2008 12:35:44 PM
Subject: reduce more than one way

Hello,

I'm writing Hadoop programs in Java.
I have 2 Hadoop map/reduce programs that have the same map, but different reduce methods.

Can I run them in a way so that the map only happens once?

Maybe store the intermediate result or something?


Re: reduce more than one way

Posted by Miles Osborne <mi...@inf.ed.ac.uk>.
Why not just merge the two reducers?

2008/11/7 Elia Mazzawi <el...@casalemedia.com>:
> Hello,
>
> I'm writing Hadoop programs in Java.
> I have 2 Hadoop map/reduce programs that have the same map, but
> different reduce methods.
>
> Can I run them in a way so that the map only happens once?
>
> Maybe store the intermediate result or something?
>



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

reduce more than one way

Posted by Elia Mazzawi <el...@casalemedia.com>.
Hello,

I'm writing Hadoop programs in Java.
I have 2 Hadoop map/reduce programs that have the same map, but
different reduce methods.

Can I run them in a way so that the map only happens once?

Maybe store the intermediate result or something?

RE: Dynamically terminate a job once Reporter hits a threshold

Posted by Brian MacKay <Br...@MEDecision.com>.


Thanks, Arun, for your tip.

This morning I changed to submitJob and polled. It worked very well,
and you saved me some trial and error.




-----Original Message-----
From: Aaron Kimball [mailto:aaron@cloudera.com] 
Sent: Monday, November 10, 2008 4:35 AM
To: core-user@hadoop.apache.org
Subject: Re: Dynamically terminate a job once Reporter hits a threshold

Out of curiosity, how reliable are the counters from the perspective of the
JobClient while the job is in progress? While hitting 'refresh' on the
status web page for a job, I notice that my counters bounce all over the
place, showing wildly different figures second-to-second. Is that using a
different (less well-synchronized?) mechanism to access the counters than
the user has available in the JobClient? (If so, is this something we can
easily patch to make more consistent?)

- Aaron

On Fri, Nov 7, 2008 at 12:21 PM, Arun C Murthy <ac...@yahoo-inc.com>
wrote:

>
> On Nov 7, 2008, at 12:12 PM, Brian MacKay wrote:
>
>
>> Looking for a way to dynamically terminate a job once Reporter in a Map
>> job hits a threshold,
>>
>> Example:
>>
>> public void map(WritableComparable key, Text values,
>>         OutputCollector<Text, Text> output, Reporter reporter)
>>         throws IOException {
>>
>>     if (reporter.getCount() > SomeConfigValue) {
>>         return;
>>     }
>>     ....  map job code
>> }
>>
>> Obviously, reporter.getCount() doesn't exist. Open to other ideas, and
>> any advice would be appreciated.
>>
>
> If you _really_ need this, you could do this from your JobClient... use
> JobClient.submitJob (rather than runJob:
> http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#Job+Submission+and+Monitoring),
> manually fetch the Counters you need and terminate the Job via
> JobClient.getJob(jobId).killJob().
>
> Arun
>
>



Re: Dynamically terminate a job once Reporter hits a threshold

Posted by Aaron Kimball <aa...@cloudera.com>.
Out of curiosity, how reliable are the counters from the perspective of the
JobClient while the job is in progress? While hitting 'refresh' on the
status web page for a job, I notice that my counters bounce all over the
place, showing wildly different figures second-to-second. Is that using a
different (less well-synchronized?) mechanism to access the counters than
the user has available in the JobClient? (If so, is this something we can
easily patch to make more consistent?)

- Aaron

On Fri, Nov 7, 2008 at 12:21 PM, Arun C Murthy <ac...@yahoo-inc.com> wrote:

>
> On Nov 7, 2008, at 12:12 PM, Brian MacKay wrote:
>
>
>> Looking for a way to dynamically terminate a job once Reporter in a Map
>> job hits a threshold,
>>
>> Example:
>>
>> public void map(WritableComparable key, Text values,
>>         OutputCollector<Text, Text> output, Reporter reporter)
>>         throws IOException {
>>
>>     if (reporter.getCount() > SomeConfigValue) {
>>         return;
>>     }
>>     ....  map job code
>> }
>>
>> Obviously,  reporter.getCount() doesn't exist.  Open to other ideas, and
>> any advice would be appreciated.
>>
>
> If you _really_ need this, you could do this from your JobClient... use
> JobClient.submitJob (rather than runJob:
> http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#Job+Submission+and+Monitoring),
> manually fetch the Counters you need and terminate the Job via
> JobClient.getJob(jobId).killJob().
>
> Arun
>
>

Re: Dynamically terminate a job once Reporter hits a threshold

Posted by Arun C Murthy <ac...@yahoo-inc.com>.
On Nov 7, 2008, at 12:12 PM, Brian MacKay wrote:

>
> Looking for a way to dynamically terminate a job once Reporter in a Map
> job hits a threshold,
>
> Example:
>
> public void map(WritableComparable key, Text values,
>         OutputCollector<Text, Text> output, Reporter reporter)
>         throws IOException {
>
>     if (reporter.getCount() > SomeConfigValue) {
>         return;
>     }
>     ....  map job code
> }
>
> Obviously, reporter.getCount() doesn't exist. Open to other ideas, and
> any advice would be appreciated.

If you _really_ need this, you could do this from your JobClient... use
JobClient.submitJob (rather than runJob:
http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#Job+Submission+and+Monitoring),
manually fetch the Counters you need and terminate the Job via
JobClient.getJob(jobId).killJob().

Arun
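Put together, the submitJob/poll/killJob pattern might look roughly like this. The counter enum, threshold, and poll interval are illustrative assumptions; the map tasks would increment the counter via reporter.incrCounter(...):

```java
// Sketch: submit the job asynchronously, watch a job-level counter, and
// kill the job once the counter crosses a threshold.
import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class ThresholdWatcher {
  // Hypothetical counter, incremented in map() with
  // reporter.incrCounter(WatchedCounter.RECORDS, 1).
  public enum WatchedCounter { RECORDS }

  public static void runWithThreshold(JobConf conf, long threshold)
      throws Exception {
    JobClient client = new JobClient(conf);
    RunningJob job = client.submitJob(conf);  // returns immediately, unlike runJob

    while (!job.isComplete()) {
      Counters counters = job.getCounters();
      if (counters.getCounter(WatchedCounter.RECORDS) > threshold) {
        job.killJob();                        // terminate the whole job
        break;
      }
      Thread.sleep(5000);                     // poll every few seconds
    }
  }
}
```

As Aaron's question suggests, counter values reported while the job is running may lag or fluctuate, so the job can overshoot the threshold before the kill takes effect.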