Posted to common-dev@hadoop.apache.org by "arkady borkovsky (JIRA)" <ji...@apache.org> on 2007/06/08 03:08:26 UTC

[jira] Updated: (HADOOP-1477) Streaming should allow to re-start the command if it failed in the middle of input

     [ https://issues.apache.org/jira/browse/HADOOP-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

arkady borkovsky updated HADOOP-1477:
-------------------------------------

    Description: 
Sometimes, we need to use imperfect programs to process data.

Recently, I used a public-domain program that did what I needed but crashed after processing a few million records (in my case, more than half of the mappers succeeded, with the rest failing at different percentages of completion).

It would be nice to be able to tell the Streaming framework:

     if the streaming command fails at some input record (and you get "pipe broken" from it), 
     restart the command and continue feeding it the data.
     Please log the failing record.

In text mining, quite often, losing a few records of the input makes no difference at all.
Of course, this feature should be disabled by default and should have some "are you really sure?" provision (an expert feature).


  was:
Sometime, we need to use imperfect programs to process data.

Recently, I used a public domain program that does what I need, but crashes after processing few million records (in my cases, more than half of the mappers would succeed, with the rest failing at different %%).

It would be nice if it was possible to tell the Streaming Framework :

     if the streaming command fails at some input record (and you get "pipe broken" from it), 
     restart the command and continue feeding it the data.
     Please log the failing record.

In datamining, quite often, loosing few record of the input makes no  difference at all.
Of course this feature should be disabled by default, and should some "are really sure" provision.  (an expert feature).


        Summary: Streaming should allow to re-start the command if it failed in the middle of input  (was: Stream should allow to re-start the command if it failed in the middle of input)

> Streaming should allow to re-start the command if it failed in the middle of input
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-1477
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1477
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/streaming
>            Reporter: arkady borkovsky
>
> Sometimes, we need to use imperfect programs to process data.
> Recently, I used a public-domain program that did what I needed but crashed after processing a few million records (in my case, more than half of the mappers succeeded, with the rest failing at different percentages of completion).
> It would be nice to be able to tell the Streaming framework:
>      if the streaming command fails at some input record (and you get "pipe broken" from it), 
>      restart the command and continue feeding it the data.
>      Please log the failing record.
> In text mining, quite often, losing a few records of the input makes no difference at all.
> Of course, this feature should be disabled by default and should have some "are you really sure?" provision (an expert feature).
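
The requested behaviour (restart the command, keep feeding it data, log the failing record) could look roughly like the standalone Python sketch below. This is not Hadoop Streaming code and does not describe any existing Streaming option. The function name feed_with_restarts and the max_skipped parameter are hypothetical. The sketch also assumes the command writes exactly one output line per input line, which lets the wrapper tell which record killed it; real streaming commands need not behave that way.

```python
import subprocess
import sys


def _discard(proc):
    """Kill a (possibly already dead) child and release its pipes."""
    proc.kill()
    for pipe in (proc.stdin, proc.stdout):
        try:
            pipe.close()
        except OSError:
            # Closing stdin may try to flush into a broken pipe; ignore that.
            pass
    proc.wait()


def feed_with_restarts(cmd, records, max_skipped=0):
    """Feed records to cmd one line at a time, restarting cmd when it dies.

    Assumption: cmd emits exactly one output line per input line.
    max_skipped=0 keeps the restart behaviour disabled by default, so the
    first failure raises instead of being skipped.
    """
    outputs, skipped = [], []
    proc = None
    for record in records:
        if proc is None:
            proc = subprocess.Popen(
                cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
            )
        try:
            proc.stdin.write(record + "\n")
            proc.stdin.flush()
            reply = proc.stdout.readline()
        except BrokenPipeError:
            # The child was already gone when we tried to write to it.
            reply = ""
        if reply:
            outputs.append(reply.rstrip("\n"))
            continue
        # An empty read means the child closed its output without answering,
        # so this record is treated as the failing one.
        _discard(proc)
        proc = None
        if len(skipped) >= max_skipped:
            raise RuntimeError(f"command failed on record {record!r}")
        print(f"skipping failing record: {record!r}", file=sys.stderr)
        skipped.append(record)
    if proc is not None:
        # Closes stdin, drains any remaining output, and waits for exit.
        proc.communicate()
    return outputs, skipped
```

Writing one record and waiting for one reply is slow, but it makes the failing record unambiguous. With buffered, free-running pipes, a "pipe broken" error may only show up several records after the one that actually caused the crash, so the logged record would be approximate.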
