Posted to mapreduce-user@hadoop.apache.org by felix gao <gr...@gmail.com> on 2011/01/29 00:51:02 UTC
streaming job in python that reports progress
mighty user group,
I am trying to write a streaming job that does a lot of I/O in a Python
program. I know that if I don't report progress every x minutes, the job will
be terminated. How do I report back to the TaskTracker from my streaming
Python job while it is, for example, in the middle of gzipping a file?
Thanks,
Felix
Re: streaming job in python that reports progress
Posted by Harsh J <qw...@gmail.com>.
Already answered in the Streaming docs:
http://hadoop.apache.org/mapreduce/docs/current/streaming.html#How+do+I+update+status+in+streaming+applications%3F
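The mechanism those docs describe is writing specially formatted lines to stderr. A minimal sketch of what that might look like in a Python mapper (the identity-mapper logic and the report interval are placeholders of mine, not from the thread):

```python
import sys

def status_line(message):
    # Hadoop Streaming treats stderr lines of exactly this form as
    # status updates; receiving one also resets the task timeout.
    return "reporter:status:%s\n" % message

def report_status(message):
    sys.stderr.write(status_line(message))
    sys.stderr.flush()

def run_mapper(instream=sys.stdin, outstream=sys.stdout, every=10000):
    # Identity mapper (stand-in for the real work) that reports
    # status every `every` input records.
    count = 0
    for count, line in enumerate(instream, 1):
        outstream.write(line)
        if count % every == 0:
            report_status("processed %d records" % count)
    return count
```

The same stderr channel also carries counter updates ("reporter:counter:group,counter,amount"), per the streaming docs.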
--
Harsh J
www.harshj.com
Re: streaming job in python that reports progress
Posted by Koji Noguchi <kn...@yahoo-inc.com>.
Hi Felix,
Two options I can think of:
1) Set a longer timeout with -Dmapred.task.timeout=_____ (in milliseconds), or
2) Have a separate thread report status back to the TaskTracker by writing to stderr:
https://issues.apache.org/jira/browse/HADOOP-1328
Format: "reporter:status:____"
Hope it works.
Koji
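Koji's second option could be sketched as a daemon thread; this is my illustration rather than code from the thread, and the heartbeat() helper, its message text, and the 60-second interval are all assumptions:

```python
import sys
import threading

def heartbeat_message(beats):
    # The "reporter:status:" prefix is the stderr protocol Hadoop
    # Streaming recognizes; the message body is arbitrary.
    return "reporter:status:still working (heartbeat %d)\n" % beats

def heartbeat(interval_seconds=60):
    # Start a daemon thread that writes a status line to stderr every
    # `interval_seconds`, so long-running work (e.g. gzipping a large
    # file) does not trip mapred.task.timeout. Returns an Event;
    # call set() on it to stop the heartbeat.
    stop = threading.Event()

    def loop():
        beats = 0
        # Event.wait returns False on timeout, True once set.
        while not stop.wait(interval_seconds):
            beats += 1
            sys.stderr.write(heartbeat_message(beats))
            sys.stderr.flush()

    t = threading.Thread(target=loop)
    t.daemon = True
    t.start()
    return stop

# Usage sketch:
#   stop = heartbeat(60)
#   do_long_gzip_work()   # hypothetical long-running step
#   stop.set()
```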