Posted to common-dev@hadoop.apache.org by "Milind Bhandarkar (JIRA)" <ji...@apache.org> on 2006/06/22 22:15:29 UTC

[jira] Created: (HADOOP-318) Progress in writing a DFS file does not count towards Job progress and can make the task timeout

Progress in writing a DFS file does not count towards Job progress and can make the task timeout
------------------------------------------------------------------------------------------------

         Key: HADOOP-318
         URL: http://issues.apache.org/jira/browse/HADOOP-318
     Project: Hadoop
        Type: Bug

  Components: mapred  
    Versions: 0.3.2    
 Environment: all, but especially on big busy clusters
    Reporter: Milind Bhandarkar
 Assigned to: Milind Bhandarkar 
     Fix For: 0.4.0


When a task writes to a DFS file, it can time out after 10 minutes (the default), depending on how busy the cluster is, because progress in writing the DFS file does not count as progress of the task. The solution (patch is forthcoming) is to provide a way for DFSOutputStream to call back the reporter and report task progress.
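
A minimal sketch of the idea in the description above, not the forthcoming patch itself: an output stream that invokes a caller-supplied callback each time it hands a chunk to the underlying stream, so that slow writes still register as task progress. The names WriteProgressSketch, ProgressCallback, and ChunkedProgressStream, as well as the chunk size, are illustrative assumptions; the real DFSOutputStream and reporter are not shown.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class WriteProgressSketch {
    // Hypothetical callback the task would pass in so that I/O counts as progress.
    interface ProgressCallback {
        void progress();
    }

    // Hypothetical stand-in for DFSOutputStream: buffers bytes and signals
    // progress every time a chunk is handed to the underlying stream.
    static class ChunkedProgressStream extends OutputStream {
        private final OutputStream out;
        private final ProgressCallback callback;
        private final byte[] buffer;
        private int count = 0;

        ChunkedProgressStream(OutputStream out, ProgressCallback callback, int chunkSize) {
            this.out = out;
            this.callback = callback;
            this.buffer = new byte[chunkSize];
        }

        @Override
        public void write(int b) throws IOException {
            buffer[count++] = (byte) b;
            if (count == buffer.length) {
                flushChunk();
            }
        }

        private void flushChunk() throws IOException {
            out.write(buffer, 0, count);
            count = 0;
            if (callback != null) {
                callback.progress(); // tell the task framework we are still alive
            }
        }

        @Override
        public void flush() throws IOException {
            if (count > 0) {
                flushChunk();
            }
            out.flush();
        }

        @Override
        public void close() throws IOException {
            flush();
            out.close();
        }
    }

    public static void main(String[] args) throws IOException {
        int[] calls = {0};
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream os = new ChunkedProgressStream(sink, () -> calls[0]++, 4)) {
            os.write(new byte[10]); // 10 bytes written through 4-byte chunks
        }
        System.out.println("bytes=" + sink.size() + " progressCalls=" + calls[0]);
    }
}
```

With 4-byte chunks, writing 10 bytes triggers three callbacks: two full chunks plus the final partial chunk flushed on close.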

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Commented: (HADOOP-318) Progress in writing a DFS file does not count towards Job progress and can make the task timeout

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-318?page=comments#action_12417527 ] 

Owen O'Malley commented on HADOOP-318:
--------------------------------------

The only way around it that I can see is if we had:

FSDataOutputStream:
    setProgressable(Progressable prog)

RecordWriter:
   setProgressable(Progressable prog)

which just pushes the problem down a level.
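
A rough sketch of what the two setters above might look like, assuming simplified stand-ins for FSDataOutputStream and RecordWriter (SetterSketch, DataOut, and LineWriter are hypothetical names, and the per-byte callback is only for brevity). It illustrates one reading of the concern: the callback only takes effect once some caller remembers to register it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class SetterSketch {
    interface Progressable {
        void progress();
    }

    // Stand-in for FSDataOutputStream with the proposed setter.
    static class DataOut extends OutputStream {
        private final OutputStream out;
        private Progressable prog; // stays null until someone calls the setter

        DataOut(OutputStream out) {
            this.out = out;
        }

        void setProgressable(Progressable prog) {
            this.prog = prog;
        }

        @Override
        public void write(int b) throws IOException {
            out.write(b);
            if (prog != null) {
                prog.progress();
            }
        }
    }

    // Stand-in for RecordWriter: its setter only forwards to the stream.
    static class LineWriter {
        private final DataOut out;

        LineWriter(DataOut out) {
            this.out = out;
        }

        void setProgressable(Progressable prog) {
            out.setProgressable(prog);
        }

        void write(String record) throws IOException {
            out.write((record + "\n").getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        int[] ticks = {0};
        LineWriter writer = new LineWriter(new DataOut(new ByteArrayOutputStream()));
        writer.write("before");                    // no callback registered yet
        writer.setProgressable(() -> ticks[0]++);
        writer.write("after");                     // 6 bytes, one callback per byte
        System.out.println("ticks=" + ticks[0]);
    }
}
```

Writes made before setProgressable is called report nothing, so the framework would still have to know about every writer in order to wire the callback in.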

> Progress in writing a DFS file does not count towards Job progress and can make the task timeout
> ------------------------------------------------------------------------------------------------
>
>          Key: HADOOP-318
>          URL: http://issues.apache.org/jira/browse/HADOOP-318
>      Project: Hadoop
>         Type: Bug

>   Components: mapred
>     Versions: 0.3.2
>  Environment: all, but especially on big busy clusters
>     Reporter: Milind Bhandarkar
>     Assignee: Milind Bhandarkar
>      Fix For: 0.4.0
>  Attachments: hadoop-latency-new.patch, hadoop-latency.patch
>
> When a task writes to DFS file, depending on how busy the cluster is, it can timeout after 10 minutes by default, because the progress towards writing a DFS file does not count as progress of the task. The solution (patch is forthcoming) is to provide a way to callback reporter to report task progress from DFSOutputStream.



[jira] Commented: (HADOOP-318) Progress in writing a DFS file does not count towards Job progress and can make the task timeout

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-318?page=comments#action_12417370 ] 

Doug Cutting commented on HADOOP-318:
-------------------------------------

This looks good, except it is not back-compatible.  Any user code that implements an OutputFormat will no longer compile after this change is made.  Sigh.  I don't see an easy way around this...

> Progress in writing a DFS file does not count towards Job progress and can make the task timeout
> ------------------------------------------------------------------------------------------------
>
>          Key: HADOOP-318
>          URL: http://issues.apache.org/jira/browse/HADOOP-318
>      Project: Hadoop
>         Type: Bug

>   Components: mapred
>     Versions: 0.3.2
>  Environment: all, but especially on big busy clusters
>     Reporter: Milind Bhandarkar
>     Assignee: Milind Bhandarkar
>      Fix For: 0.4.0
>  Attachments: hadoop-latency-new.patch, hadoop-latency.patch
>
> When a task writes to DFS file, depending on how busy the cluster is, it can timeout after 10 minutes by default, because the progress towards writing a DFS file does not count as progress of the task. The solution (patch is forthcoming) is to provide a way to callback reporter to report task progress from DFSOutputStream.



[jira] Commented: (HADOOP-318) Progress in writing a DFS file does not count towards Job progress and can make the task timeout

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-318?page=comments#action_12418277 ] 

Milind Bhandarkar commented on HADOOP-318:
------------------------------------------

I will submit the correct patch again.

> Progress in writing a DFS file does not count towards Job progress and can make the task timeout
> ------------------------------------------------------------------------------------------------
>
>          Key: HADOOP-318
>          URL: http://issues.apache.org/jira/browse/HADOOP-318
>      Project: Hadoop
>         Type: Bug

>   Components: mapred
>     Versions: 0.3.2
>  Environment: all, but especially on big busy clusters
>     Reporter: Milind Bhandarkar
>     Assignee: Milind Bhandarkar
>      Fix For: 0.4.0
>  Attachments: hadoop-latency-latest.patch
>
> When a task writes to DFS file, depending on how busy the cluster is, it can timeout after 10 minutes by default, because the progress towards writing a DFS file does not count as progress of the task. The solution (patch is forthcoming) is to provide a way to callback reporter to report task progress from DFSOutputStream.



[jira] Commented: (HADOOP-318) Progress in writing a DFS file does not count towards Job progress and can make the task timeout

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-318?page=comments#action_12418097 ] 

Milind Bhandarkar commented on HADOOP-318:
------------------------------------------

Sure. I will provide a patch for Nutch tomorrow.

> Progress in writing a DFS file does not count towards Job progress and can make the task timeout
> ------------------------------------------------------------------------------------------------
>
>          Key: HADOOP-318
>          URL: http://issues.apache.org/jira/browse/HADOOP-318
>      Project: Hadoop
>         Type: Bug

>   Components: mapred
>     Versions: 0.3.2
>  Environment: all, but especially on big busy clusters
>     Reporter: Milind Bhandarkar
>     Assignee: Milind Bhandarkar
>      Fix For: 0.4.0
>  Attachments: hadoop-datanode-allocation.patch, hadoop-latency-new.patch, hadoop-latency.patch
>
> When a task writes to DFS file, depending on how busy the cluster is, it can timeout after 10 minutes by default, because the progress towards writing a DFS file does not count as progress of the task. The solution (patch is forthcoming) is to provide a way to callback reporter to report task progress from DFSOutputStream.



[jira] Commented: (HADOOP-318) Progress in writing a DFS file does not count towards Job progress and can make the task timeout

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-318?page=comments#action_12417539 ] 

Milind Bhandarkar commented on HADOOP-318:
------------------------------------------

I don't see how this can be done without breaking backward compatibility. Therefore, I have made the changes so that any other output format can be ported with minimal effort: it needs to implement an additional getRecordWriter method that takes an extra parameter. This parameter can be passed to fs.create (or simply ignored, as in the case of the local filesystem).
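
As an illustration of the porting step described above, here is a simplified sketch, not the patch itself. The interfaces are cut-down stand-ins (the real getRecordWriter and fs.create signatures take more arguments), and RecordWriterSketch, FileSystemStub, and TextFormat are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class RecordWriterSketch {
    interface Progressable {
        void progress();
    }

    interface RecordWriter {
        void write(String key, String value) throws IOException;
    }

    // Simplified, assumed shape of the interface during porting: the old
    // method stays and a second one takes the extra callback parameter.
    interface OutputFormat {
        RecordWriter getRecordWriter(FileSystemStub fs, String name) throws IOException;

        RecordWriter getRecordWriter(FileSystemStub fs, String name, Progressable progress)
                throws IOException;
    }

    // Stand-in for a file system whose create() accepts the callback.
    static class FileSystemStub {
        final ByteArrayOutputStream data = new ByteArrayOutputStream();

        OutputStream create(String name, Progressable progress) {
            return new OutputStream() {
                @Override
                public void write(int b) {
                    data.write(b);
                }

                @Override
                public void write(byte[] b, int off, int len) {
                    data.write(b, off, len);
                    if (progress != null) {
                        progress.progress(); // report once per bulk write
                    }
                }
            };
        }
    }

    static class TextFormat implements OutputFormat {
        @Override
        public RecordWriter getRecordWriter(FileSystemStub fs, String name) throws IOException {
            return getRecordWriter(fs, name, null); // old entry point: no reporting
        }

        @Override
        public RecordWriter getRecordWriter(FileSystemStub fs, String name, Progressable progress)
                throws IOException {
            OutputStream out = fs.create(name, progress); // hand the callback down
            return (key, value) ->
                    out.write((key + "\t" + value + "\n").getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        int[] ticks = {0};
        FileSystemStub fs = new FileSystemStub();
        RecordWriter writer = new TextFormat().getRecordWriter(fs, "part-00000", () -> ticks[0]++);
        writer.write("k1", "v1");
        writer.write("k2", "v2");
        System.out.println("bytes=" + fs.data.size() + " ticks=" + ticks[0]);
    }
}
```

An output format for a file system that cannot report progress could implement the new method and simply drop the parameter, which matches the "or simply ignored" case.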


> Progress in writing a DFS file does not count towards Job progress and can make the task timeout
> ------------------------------------------------------------------------------------------------
>
>          Key: HADOOP-318
>          URL: http://issues.apache.org/jira/browse/HADOOP-318
>      Project: Hadoop
>         Type: Bug

>   Components: mapred
>     Versions: 0.3.2
>  Environment: all, but especially on big busy clusters
>     Reporter: Milind Bhandarkar
>     Assignee: Milind Bhandarkar
>      Fix For: 0.4.0
>  Attachments: hadoop-latency-new.patch, hadoop-latency.patch
>
> When a task writes to DFS file, depending on how busy the cluster is, it can timeout after 10 minutes by default, because the progress towards writing a DFS file does not count as progress of the task. The solution (patch is forthcoming) is to provide a way to callback reporter to report task progress from DFSOutputStream.



[jira] Updated: (HADOOP-318) Progress in writing a DFS file does not count towards Job progress and can make the task timeout

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-318?page=all ]

Milind Bhandarkar updated HADOOP-318:
-------------------------------------

    Attachment: hadoop-latency-latest.patch

Doug, I have removed the extra getRecordWriter method and added Progressable documentation and param docs; here is the patch. I have also uploaded the corresponding patch to NUTCH-312.

> Progress in writing a DFS file does not count towards Job progress and can make the task timeout
> ------------------------------------------------------------------------------------------------
>
>          Key: HADOOP-318
>          URL: http://issues.apache.org/jira/browse/HADOOP-318
>      Project: Hadoop
>         Type: Bug

>   Components: mapred
>     Versions: 0.3.2
>  Environment: all, but especially on big busy clusters
>     Reporter: Milind Bhandarkar
>     Assignee: Milind Bhandarkar
>      Fix For: 0.4.0
>  Attachments: hadoop-latency-latest.patch
>
> When a task writes to DFS file, depending on how busy the cluster is, it can timeout after 10 minutes by default, because the progress towards writing a DFS file does not count as progress of the task. The solution (patch is forthcoming) is to provide a way to callback reporter to report task progress from DFSOutputStream.



[jira] Resolved: (HADOOP-318) Progress in writing a DFS file does not count towards Job progress and can make the task timeout

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-318?page=all ]
     
Doug Cutting resolved HADOOP-318:
---------------------------------

    Resolution: Fixed

I just committed this.  Thanks, Milind.

> Progress in writing a DFS file does not count towards Job progress and can make the task timeout
> ------------------------------------------------------------------------------------------------
>
>          Key: HADOOP-318
>          URL: http://issues.apache.org/jira/browse/HADOOP-318
>      Project: Hadoop
>         Type: Bug

>   Components: mapred
>     Versions: 0.3.2
>  Environment: all, but especially on big busy clusters
>     Reporter: Milind Bhandarkar
>     Assignee: Milind Bhandarkar
>      Fix For: 0.4.0
>  Attachments: hadoop-latency-latest.patch
>
> When a task writes to DFS file, depending on how busy the cluster is, it can timeout after 10 minutes by default, because the progress towards writing a DFS file does not count as progress of the task. The solution (patch is forthcoming) is to provide a way to callback reporter to report task progress from DFSOutputStream.



[jira] Updated: (HADOOP-318) Progress in writing a DFS file does not count towards Job progress and can make the task timeout

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-318?page=all ]

Milind Bhandarkar updated HADOOP-318:
-------------------------------------

    Attachment:     (was: hadoop-latency-latest.patch)

> Progress in writing a DFS file does not count towards Job progress and can make the task timeout
> ------------------------------------------------------------------------------------------------
>
>          Key: HADOOP-318
>          URL: http://issues.apache.org/jira/browse/HADOOP-318
>      Project: Hadoop
>         Type: Bug

>   Components: mapred
>     Versions: 0.3.2
>  Environment: all, but especially on big busy clusters
>     Reporter: Milind Bhandarkar
>     Assignee: Milind Bhandarkar
>      Fix For: 0.4.0

>
> When a task writes to DFS file, depending on how busy the cluster is, it can timeout after 10 minutes by default, because the progress towards writing a DFS file does not count as progress of the task. The solution (patch is forthcoming) is to provide a way to callback reporter to report task progress from DFSOutputStream.



[jira] Updated: (HADOOP-318) Progress in writing a DFS file does not count towards Job progress and can make the task timeout

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-318?page=all ]

Milind Bhandarkar updated HADOOP-318:
-------------------------------------

    Attachment:     (was: hadoop-datanode-allocation.patch)

> Progress in writing a DFS file does not count towards Job progress and can make the task timeout
> ------------------------------------------------------------------------------------------------
>
>          Key: HADOOP-318
>          URL: http://issues.apache.org/jira/browse/HADOOP-318
>      Project: Hadoop
>         Type: Bug

>   Components: mapred
>     Versions: 0.3.2
>  Environment: all, but especially on big busy clusters
>     Reporter: Milind Bhandarkar
>     Assignee: Milind Bhandarkar
>      Fix For: 0.4.0

>
> When a task writes to DFS file, depending on how busy the cluster is, it can timeout after 10 minutes by default, because the progress towards writing a DFS file does not count as progress of the task. The solution (patch is forthcoming) is to provide a way to callback reporter to report task progress from DFSOutputStream.



[jira] Updated: (HADOOP-318) Progress in writing a DFS file does not count towards Job progress and can make the task timeout

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-318?page=all ]

Milind Bhandarkar updated HADOOP-318:
-------------------------------------

    Attachment: hadoop-datanode-allocation.patch

This is an updated patch for this issue that does not produce any "task reported no progress for 600 seconds" errors even when there is progress. In fact, it is a datanode allocation patch. Each datanode sends additional load data to the namenode, indicating how many blocks it is currently writing or reading. When choosing datanodes for a new block, the namenode takes this load into consideration and discards datanodes whose load is more than twice the average.

This is in addition to the requirement that the datanode has enough space to store min_num_blocks.

With this patch, I never see the "no progress for 600 seconds, killing task" error. Therefore, on my 240 node cluster, the randomwriter times went down from 3997 seconds to 2404 seconds.

This patch includes the file-writing progress patch as well, so please discard the two patches I submitted earlier.
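
The datanode selection rule described in this message can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in hadoop-datanode-allocation.patch: LoadAwarePlacementSketch, NodeInfo, eligibleTargets, and the blockSize parameter are hypothetical, and only the "more than twice the average load" cutoff and the min_num_blocks space requirement come from the message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LoadAwarePlacementSketch {
    // Hypothetical view of what the namenode knows about one datanode.
    static class NodeInfo {
        final String name;
        final int activeBlocks;    // blocks currently being read or written
        final long remainingBytes; // free space reported by the node

        NodeInfo(String name, int activeBlocks, long remainingBytes) {
            this.name = name;
            this.activeBlocks = activeBlocks;
            this.remainingBytes = remainingBytes;
        }
    }

    // Keep nodes that have room for minNumBlocks blocks and whose load
    // is not more than twice the average load across all nodes.
    static List<NodeInfo> eligibleTargets(List<NodeInfo> nodes, long blockSize, int minNumBlocks) {
        List<NodeInfo> eligible = new ArrayList<>();
        if (nodes.isEmpty()) {
            return eligible;
        }
        double total = 0;
        for (NodeInfo n : nodes) {
            total += n.activeBlocks;
        }
        double avgLoad = total / nodes.size();

        for (NodeInfo n : nodes) {
            boolean hasSpace = n.remainingBytes >= blockSize * minNumBlocks;
            boolean notOverloaded = n.activeBlocks <= 2.0 * avgLoad;
            if (hasSpace && notOverloaded) {
                eligible.add(n);
            }
        }
        return eligible;
    }

    public static void main(String[] args) {
        List<NodeInfo> nodes = Arrays.asList(
                new NodeInfo("a", 1, 1000),
                new NodeInfo("b", 2, 1000),
                new NodeInfo("c", 9, 1000),  // busy: 9 > 2 * average (3)
                new NodeInfo("d", 0, 100));  // idle, but too little space
        for (NodeInfo n : eligibleTargets(nodes, 64, 2)) {
            System.out.println(n.name);
        }
    }
}
```

In this example the average load is 3, so node c (load 9) is discarded and node d fails the space check, leaving a and b. For scale, the reported drop from 3997 to 2404 seconds is roughly a 40% reduction in run time.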

> Progress in writing a DFS file does not count towards Job progress and can make the task timeout
> ------------------------------------------------------------------------------------------------
>
>          Key: HADOOP-318
>          URL: http://issues.apache.org/jira/browse/HADOOP-318
>      Project: Hadoop
>         Type: Bug

>   Components: mapred
>     Versions: 0.3.2
>  Environment: all, but especially on big busy clusters
>     Reporter: Milind Bhandarkar
>     Assignee: Milind Bhandarkar
>      Fix For: 0.4.0
>  Attachments: hadoop-datanode-allocation.patch, hadoop-latency-new.patch, hadoop-latency.patch
>
> When a task writes to DFS file, depending on how busy the cluster is, it can timeout after 10 minutes by default, because the progress towards writing a DFS file does not count as progress of the task. The solution (patch is forthcoming) is to provide a way to callback reporter to report task progress from DFSOutputStream.



[jira] Commented: (HADOOP-318) Progress in writing a DFS file does not count towards Job progress and can make the task timeout

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-318?page=comments#action_12418307 ] 

Milind Bhandarkar commented on HADOOP-318:
------------------------------------------

Did not even think about it earlier. But if it compiles, go ahead.

> Progress in writing a DFS file does not count towards Job progress and can make the task timeout
> ------------------------------------------------------------------------------------------------
>
>          Key: HADOOP-318
>          URL: http://issues.apache.org/jira/browse/HADOOP-318
>      Project: Hadoop
>         Type: Bug

>   Components: mapred
>     Versions: 0.3.2
>  Environment: all, but especially on big busy clusters
>     Reporter: Milind Bhandarkar
>     Assignee: Milind Bhandarkar
>      Fix For: 0.4.0
>  Attachments: hadoop-latency-latest.patch
>
> When a task writes to DFS file, depending on how busy the cluster is, it can timeout after 10 minutes by default, because the progress towards writing a DFS file does not count as progress of the task. The solution (patch is forthcoming) is to provide a way to callback reporter to report task progress from DFSOutputStream.



[jira] Commented: (HADOOP-318) Progress in writing a DFS file does not count towards Job progress and can make the task timeout

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-318?page=comments#action_12418086 ] 

Doug Cutting commented on HADOOP-318:
-------------------------------------

Sigh, I also don't see a compatible way to make this change.  So we'll have to upgrade some Nutch OutputFormat implementations to define the new method.  Could you please construct a patch for Nutch too?  That would make my life easier.  Thanks.

> Progress in writing a DFS file does not count towards Job progress and can make the task timeout
> ------------------------------------------------------------------------------------------------
>
>          Key: HADOOP-318
>          URL: http://issues.apache.org/jira/browse/HADOOP-318
>      Project: Hadoop
>         Type: Bug

>   Components: mapred
>     Versions: 0.3.2
>  Environment: all, but especially on big busy clusters
>     Reporter: Milind Bhandarkar
>     Assignee: Milind Bhandarkar
>      Fix For: 0.4.0
>  Attachments: hadoop-datanode-allocation.patch, hadoop-latency-new.patch, hadoop-latency.patch
>
> When a task writes to DFS file, depending on how busy the cluster is, it can timeout after 10 minutes by default, because the progress towards writing a DFS file does not count as progress of the task. The solution (patch is forthcoming) is to provide a way to callback reporter to report task progress from DFSOutputStream.



[jira] Commented: (HADOOP-318) Progress in writing a DFS file does not count towards Job progress and can make the task timeout

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-318?page=comments#action_12418249 ] 

Doug Cutting commented on HADOOP-318:
-------------------------------------

Sigh.  Now, with HADOOP-321 reverted, this no longer applies.

> Progress in writing a DFS file does not count towards Job progress and can make the task timeout
> ------------------------------------------------------------------------------------------------
>
>          Key: HADOOP-318
>          URL: http://issues.apache.org/jira/browse/HADOOP-318
>      Project: Hadoop
>         Type: Bug

>   Components: mapred
>     Versions: 0.3.2
>  Environment: all, but especially on big busy clusters
>     Reporter: Milind Bhandarkar
>     Assignee: Milind Bhandarkar
>      Fix For: 0.4.0
>  Attachments: hadoop-latency-latest.patch
>
> When a task writes to DFS file, depending on how busy the cluster is, it can timeout after 10 minutes by default, because the progress towards writing a DFS file does not count as progress of the task. The solution (patch is forthcoming) is to provide a way to callback reporter to report task progress from DFSOutputStream.



[jira] Updated: (HADOOP-318) Progress in writing a DFS file does not count towards Job progress and can make the task timeout

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-318?page=all ]

Milind Bhandarkar updated HADOOP-318:
-------------------------------------

    Attachment: hadoop-latency.patch

Patch for the dfs file-writing progress reporting.

> Progress in writing a DFS file does not count towards Job progress and can make the task timeout
> ------------------------------------------------------------------------------------------------
>
>          Key: HADOOP-318
>          URL: http://issues.apache.org/jira/browse/HADOOP-318
>      Project: Hadoop
>         Type: Bug

>   Components: mapred
>     Versions: 0.3.2
>  Environment: all, but especially on big busy clusters
>     Reporter: Milind Bhandarkar
>     Assignee: Milind Bhandarkar
>      Fix For: 0.4.0
>  Attachments: hadoop-latency.patch
>
> When a task writes to DFS file, depending on how busy the cluster is, it can timeout after 10 minutes by default, because the progress towards writing a DFS file does not count as progress of the task. The solution (patch is forthcoming) is to provide a way to callback reporter to report task progress from DFSOutputStream.



[jira] Commented: (HADOOP-318) Progress in writing a DFS file does not count towards Job progress and can make the task timeout

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-318?page=comments#action_12418126 ] 

Milind Bhandarkar commented on HADOOP-318:
------------------------------------------

I have attached the patch for Nutch to NUTCH-312.
After the recent commits, there are conflicts (especially in datanodeInfo etc.). I am resolving those issues and will provide a new patch for this issue soon. In the meantime, I am deleting the currently attached patches.

> Progress in writing a DFS file does not count towards Job progress and can make the task timeout
> ------------------------------------------------------------------------------------------------
>
>          Key: HADOOP-318
>          URL: http://issues.apache.org/jira/browse/HADOOP-318
>      Project: Hadoop
>         Type: Bug

>   Components: mapred
>     Versions: 0.3.2
>  Environment: all, but especially on big busy clusters
>     Reporter: Milind Bhandarkar
>     Assignee: Milind Bhandarkar
>      Fix For: 0.4.0

>
> When a task writes to DFS file, depending on how busy the cluster is, it can timeout after 10 minutes by default, because the progress towards writing a DFS file does not count as progress of the task. The solution (patch is forthcoming) is to provide a way to callback reporter to report task progress from DFSOutputStream.



[jira] Updated: (HADOOP-318) Progress in writing a DFS file does not count towards Job progress and can make the task timeout

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-318?page=all ]

Milind Bhandarkar updated HADOOP-318:
-------------------------------------

    Attachment: hadoop-latency-latest.patch

This is the correct patch, after resolving the conflicts caused by recent commits.

> Progress in writing a DFS file does not count towards Job progress and can make the task timeout
> ------------------------------------------------------------------------------------------------
>
>          Key: HADOOP-318
>          URL: http://issues.apache.org/jira/browse/HADOOP-318
>      Project: Hadoop
>         Type: Bug

>   Components: mapred
>     Versions: 0.3.2
>  Environment: all, but especially on big busy clusters
>     Reporter: Milind Bhandarkar
>     Assignee: Milind Bhandarkar
>      Fix For: 0.4.0
>  Attachments: hadoop-latency-latest.patch
>
> When a task writes to DFS file, depending on how busy the cluster is, it can timeout after 10 minutes by default, because the progress towards writing a DFS file does not count as progress of the task. The solution (patch is forthcoming) is to provide a way to callback reporter to report task progress from DFSOutputStream.



[jira] Updated: (HADOOP-318) Progress in writing a DFS file does not count towards Job progress and can make the task timeout

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-318?page=all ]

Milind Bhandarkar updated HADOOP-318:
-------------------------------------

    Attachment:     (was: hadoop-latency.patch)

> Progress in writing a DFS file does not count towards Job progress and can make the task timeout
> ------------------------------------------------------------------------------------------------
>
>          Key: HADOOP-318
>          URL: http://issues.apache.org/jira/browse/HADOOP-318
>      Project: Hadoop
>         Type: Bug

>   Components: mapred
>     Versions: 0.3.2
>  Environment: all, but especially on big busy clusters
>     Reporter: Milind Bhandarkar
>     Assignee: Milind Bhandarkar
>      Fix For: 0.4.0

>
> When a task writes to DFS file, depending on how busy the cluster is, it can timeout after 10 minutes by default, because the progress towards writing a DFS file does not count as progress of the task. The solution (patch is forthcoming) is to provide a way to callback reporter to report task progress from DFSOutputStream.



[jira] Updated: (HADOOP-318) Progress in writing a DFS file does not count towards Job progress and can make the task timeout

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-318?page=all ]

Milind Bhandarkar updated HADOOP-318:
-------------------------------------

    Attachment: hadoop-latency-new.patch

After svn update, the earlier patch failed to compile.
I have attached a new patch now.

> Progress in writing a DFS file does not count towards Job progress and can make the task timeout
> ------------------------------------------------------------------------------------------------
>
>          Key: HADOOP-318
>          URL: http://issues.apache.org/jira/browse/HADOOP-318
>      Project: Hadoop
>         Type: Bug

>   Components: mapred
>     Versions: 0.3.2
>  Environment: all, but especially on big busy clusters
>     Reporter: Milind Bhandarkar
>     Assignee: Milind Bhandarkar
>      Fix For: 0.4.0
>  Attachments: hadoop-latency-new.patch, hadoop-latency.patch
>
> When a task writes to DFS file, depending on how busy the cluster is, it can timeout after 10 minutes by default, because the progress towards writing a DFS file does not count as progress of the task. The solution (patch is forthcoming) is to provide a way to callback reporter to report task progress from DFSOutputStream.



[jira] Commented: (HADOOP-318) Progress in writing a DFS file does not count towards Job progress and can make the task timeout

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-318?page=comments#action_12418304 ] 

Doug Cutting commented on HADOOP-318:
-------------------------------------

Any reason not to delete the old getRecordWriter() method?

> Progress in writing a DFS file does not count towards Job progress and can make the task timeout
> ------------------------------------------------------------------------------------------------
>
>          Key: HADOOP-318
>          URL: http://issues.apache.org/jira/browse/HADOOP-318
>      Project: Hadoop
>         Type: Bug

>   Components: mapred
>     Versions: 0.3.2
>  Environment: all, but especially on big busy clusters
>     Reporter: Milind Bhandarkar
>     Assignee: Milind Bhandarkar
>      Fix For: 0.4.0
>  Attachments: hadoop-latency-latest.patch
>
> When a task writes to DFS file, depending on how busy the cluster is, it can timeout after 10 minutes by default, because the progress towards writing a DFS file does not count as progress of the task. The solution (patch is forthcoming) is to provide a way to callback reporter to report task progress from DFSOutputStream.



[jira] Commented: (HADOOP-318) Progress in writing a DFS file does not count towards Job progress and can make the task timeout

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-318?page=comments#action_12418302 ] 

Milind Bhandarkar commented on HADOOP-318:
------------------------------------------

I meant reverting hadoop-321, not hadoop-320.

> Progress in writing a DFS file does not count towards Job progress and can make the task timeout
> ------------------------------------------------------------------------------------------------
>
>          Key: HADOOP-318
>          URL: http://issues.apache.org/jira/browse/HADOOP-318
>      Project: Hadoop
>         Type: Bug

>   Components: mapred
>     Versions: 0.3.2
>  Environment: all, but especially on big busy clusters
>     Reporter: Milind Bhandarkar
>     Assignee: Milind Bhandarkar
>      Fix For: 0.4.0
>  Attachments: hadoop-latency-latest.patch
>
> When a task writes to DFS file, depending on how busy the cluster is, it can timeout after 10 minutes by default, because the progress towards writing a DFS file does not count as progress of the task. The solution (patch is forthcoming) is to provide a way to callback reporter to report task progress from DFSOutputStream.



[jira] Updated: (HADOOP-318) Progress in writing a DFS file does not count towards Job progress and can make the task timeout

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-318?page=all ]

Milind Bhandarkar updated HADOOP-318:
-------------------------------------

    Attachment:     (was: hadoop-latency-new.patch)

> Progress in writing a DFS file does not count towards Job progress and can make the task timeout
> ------------------------------------------------------------------------------------------------
>
>          Key: HADOOP-318
>          URL: http://issues.apache.org/jira/browse/HADOOP-318
>      Project: Hadoop
>         Type: Bug

>   Components: mapred
>     Versions: 0.3.2
>  Environment: all, but especially on big busy clusters
>     Reporter: Milind Bhandarkar
>     Assignee: Milind Bhandarkar
>      Fix For: 0.4.0

>
> When a task writes to DFS file, depending on how busy the cluster is, it can timeout after 10 minutes by default, because the progress towards writing a DFS file does not count as progress of the task. The solution (patch is forthcoming) is to provide a way to callback reporter to report task progress from DFSOutputStream.



[jira] Updated: (HADOOP-318) Progress in writing a DFS file does not count towards Job progress and can make the task timeout

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-318?page=all ]

Milind Bhandarkar updated HADOOP-318:
-------------------------------------

    Attachment: hadoop-latency-latest.patch

After factoring out the changes introduced by recalling the patch for hadoop-320, I have now attached a new patch for this bug. Hopefully this is the last patch.
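
The attached patch itself is not reproduced in this archive. As a rough illustration only of the idea in the issue description (an output stream that calls back to report progress while it writes), here is a minimal sketch. It is not the patch: ProgressSketch, ProgressingOutputStream and the local Progressable interface are hypothetical stand-ins defined just for this example, and a ByteArrayOutputStream takes the place of the DFS stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch, not code from the HADOOP-318 patch.
final class ProgressSketch {

    /** Stand-in callback; a task would forward this to its reporter. */
    interface Progressable {
        void progress();
    }

    /** Wraps any stream and signals progress after each successful write. */
    static final class ProgressingOutputStream extends FilterOutputStream {
        private final Progressable progress;

        ProgressingOutputStream(OutputStream out, Progressable progress) {
            super(out);
            this.progress = progress;
        }

        @Override
        public void write(int b) throws IOException {
            out.write(b);
            progress.progress();
        }

        @Override
        public void write(byte[] buf, int off, int len) throws IOException {
            // Delegate the bulk write directly so progress is reported
            // once per call rather than once per byte.
            out.write(buf, off, len);
            progress.progress();
        }
    }

    public static void main(String[] args) throws IOException {
        int[] calls = {0};
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ProgressingOutputStream out =
                 new ProgressingOutputStream(sink, () -> calls[0]++)) {
            out.write(new byte[] {1, 2, 3}, 0, 3);
            out.write(7);
        }
        System.out.println(calls[0] + " progress calls, " + sink.size() + " bytes written");
    }
}
```

In this sketch the writer never needs to know about the task tracker; whoever creates the stream decides what a progress call means, which is roughly the layering discussed earlier in the thread.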

> Progress in writing a DFS file does not count towards Job progress and can make the task timeout
> ------------------------------------------------------------------------------------------------
>
>          Key: HADOOP-318
>          URL: http://issues.apache.org/jira/browse/HADOOP-318
>      Project: Hadoop
>         Type: Bug

>   Components: mapred
>     Versions: 0.3.2
>  Environment: all, but especially on big busy clusters
>     Reporter: Milind Bhandarkar
>     Assignee: Milind Bhandarkar
>      Fix For: 0.4.0
>  Attachments: hadoop-latency-latest.patch
>
> When a task writes to DFS file, depending on how busy the cluster is, it can timeout after 10 minutes by default, because the progress towards writing a DFS file does not count as progress of the task. The solution (patch is forthcoming) is to provide a way to callback reporter to report task progress from DFSOutputStream.



[jira] Updated: (HADOOP-318) Progress in writing a DFS file does not count towards Job progress and can make the task timeout

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-318?page=all ]

Milind Bhandarkar updated HADOOP-318:
-------------------------------------

    Attachment:     (was: hadoop-latency-latest.patch)

> Progress in writing a DFS file does not count towards Job progress and can make the task timeout
> ------------------------------------------------------------------------------------------------
>
>          Key: HADOOP-318
>          URL: http://issues.apache.org/jira/browse/HADOOP-318
>      Project: Hadoop
>         Type: Bug

>   Components: mapred
>     Versions: 0.3.2
>  Environment: all, but especially on big busy clusters
>     Reporter: Milind Bhandarkar
>     Assignee: Milind Bhandarkar
>      Fix For: 0.4.0

>
> When a task writes to DFS file, depending on how busy the cluster is, it can timeout after 10 minutes by default, because the progress towards writing a DFS file does not count as progress of the task. The solution (patch is forthcoming) is to provide a way to callback reporter to report task progress from DFSOutputStream.
