Posted to mapreduce-user@hadoop.apache.org by felix gao <gr...@gmail.com> on 2011/01/29 00:51:02 UTC

streaming job in python that reports progress

mighty user group,

I am trying to write a streaming job that does a lot of I/O in a Python
program.  I know that if I don't report back every x minutes the job will be
terminated.  How do I report back to the TaskTracker from my streaming Python
job while it is in the middle of a gzip operation, for example?

Thanks,

Felix

Re: streaming job in python that reports progress

Posted by Harsh J <qw...@gmail.com>.
Already answered in the Streaming docs:
http://hadoop.apache.org/mapreduce/docs/current/streaming.html#How+do+I+update+status+in+streaming+applications%3F
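
For reference, what that page describes boils down to writing specially formatted lines to stderr: "reporter:status:<message>" updates the task's status string, and "reporter:counter:<group>,<counter>,<amount>" increments a counter. A rough pass-through mapper sketch (the 10000-line interval and the message text are just illustrative):

#!/usr/bin/env python
import sys

def report_status(message):
    # Lines of the form "reporter:status:<msg>" on stderr are picked up
    # by the streaming task and shown as the task's status.
    sys.stderr.write("reporter:status:%s\n" % message)
    sys.stderr.flush()

for lineno, line in enumerate(sys.stdin, 1):
    # ... real per-record work would go here ...
    if lineno % 10000 == 0:
        report_status("processed %d lines" % lineno)
    sys.stdout.write(line)  # identity mapper, purely for illustration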

On Sat, Jan 29, 2011 at 5:21 AM, felix gao <gr...@gmail.com> wrote:
> mighty user group,
> I am trying to write a streaming job that does a lot of I/O in a Python
> program.  I know that if I don't report back every x minutes the job will be
> terminated.  How do I report back to the TaskTracker from my streaming Python
> job while it is in the middle of a gzip operation, for example?
> Thanks,
> Felix



-- 
Harsh J
www.harshj.com

Re: streaming job in python that reports progress

Posted by Koji Noguchi <kn...@yahoo-inc.com>.
Hi Felix,

Two options I can think of:

1) Set a longer timeout:   -Dmapred.task.timeout=_____  (in milliseconds).
or
2) Have a separate thread that reports status back to the TaskTracker by writing to stderr (see the sketch below).
     https://issues.apache.org/jira/browse/HADOOP-1328
     Format:   "reporter:status:____"
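
Here is a rough Python sketch of option 2 (the file path, status message, and 60-second interval are made up for illustration; only the "reporter:status:" stderr format comes from the docs and JIRA above):

#!/usr/bin/env python
import gzip
import sys
import threading
import time

def heartbeat(interval=60):
    # Emit a status line every `interval` seconds so the TaskTracker
    # keeps seeing progress even while the main thread is blocked in I/O.
    while True:
        sys.stderr.write("reporter:status:still working\n")
        sys.stderr.flush()
        time.sleep(interval)

reporter = threading.Thread(target=heartbeat)
reporter.daemon = True   # don't keep the task alive after the main work finishes
reporter.start()

# The long-running, otherwise silent work, e.g. chewing through a big
# gzipped file (the path is hypothetical):
f = gzip.open("/path/to/big/input.gz", "rb")
try:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        pass  # real processing goes here
finally:
    f.close()

If the extra thread is more machinery than you want, option 1 is just an extra -D on the job submission, e.g. -Dmapred.task.timeout=1800000 for a 30-minute timeout (that value is only an example; 0 disables the timeout entirely).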

Hope it works.

Koji


On 1/28/11 3:51 PM, "felix gao" <gr...@gmail.com> wrote:

mighty user group,

I am trying to write a streaming job that does a lot of I/O in a Python program.  I know that if I don't report back every x minutes the job will be terminated.  How do I report back to the TaskTracker from my streaming Python job while it is in the middle of a gzip operation, for example?

Thanks,

Felix