Posted to mapreduce-issues@hadoop.apache.org by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org> on 2009/10/07 10:28:31 UTC

[jira] Created: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.

Progress reported for pipes tasks is incorrect.
-----------------------------------------------

                 Key: MAPREDUCE-1073
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: pipes
            Reporter: Sreekanth Ramakrishnan


Currently in pipes, {{org.apache.hadoop.mapred.pipes.PipesMapRunner.run(RecordReader<K1, V1>, OutputCollector<K2, V2>, Reporter)}} we do the following:
{code}
        while (input.next(key, value)) {
          downlink.mapItem(key, value);
          if(skipping) {
            downlink.flush();
          }
        }
{code}

This loop consumes all the records for the current task as fast as the reader supplies them, which takes the reported task progress to 100% while the actual pipes application may still be trailing behind.
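To make the gap concrete, here is a minimal, self-contained sketch of the effect. It does not use any Hadoop classes: the class name PipesProgressSketch, the in-memory queue standing in for the downlink, and the record counts are all assumptions made for illustration. It only shows that progress derived from the reader position reaches 1.0 as soon as the input is drained, regardless of how far the child process has actually got.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical illustration only; none of these names come from Hadoop.
final class PipesProgressSketch {

    /** Progress as seen from the reader side versus the application side. */
    static final class Snapshot {
        final float readerProgress;      // fraction of input records read
        final float applicationProgress; // fraction actually processed by the child

        Snapshot(float readerProgress, float applicationProgress) {
            this.readerProgress = readerProgress;
            this.applicationProgress = applicationProgress;
        }
    }

    static Snapshot run(int totalRecords, int recordsProcessedByChild) {
        if (totalRecords <= 0) {
            throw new IllegalArgumentException("totalRecords must be positive");
        }
        // Stand-in for the downlink: records are buffered until the child reads them.
        Queue<Integer> downlinkBuffer = new ArrayDeque<>();

        // Mirrors the shape of the loop in the report: every record is handed
        // to the downlink immediately, without waiting for the child.
        int read = 0;
        while (read < totalRecords) {
            downlinkBuffer.add(read);
            read++;
        }

        // The child drains the buffer at its own pace; here it has handled
        // only some of the records at the moment we take the snapshot.
        int processed = 0;
        while (processed < recordsProcessedByChild && !downlinkBuffer.isEmpty()) {
            downlinkBuffer.remove();
            processed++;
        }

        return new Snapshot((float) read / totalRecords,
                            (float) processed / totalRecords);
    }

    public static void main(String[] args) {
        Snapshot s = run(1000, 250);
        System.out.println("reader-based progress: " + s.readerProgress);
        System.out.println("application progress:  " + s.applicationProgress);
    }
}
```

With 1000 records read and only 250 processed by the child, the reader-based figure is already 1.0 while the application is a quarter of the way through.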

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854287#action_12854287 ] 

Hadoop QA commented on MAPREDUCE-1073:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12440953/mapreduce-1073--2010-04-06.patch
  against trunk revision 931274.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 8 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/96/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/96/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/96/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/96/console

This message is automatically generated.

> Progress reported for pipes tasks is incorrect.
> -----------------------------------------------
>
>                 Key: MAPREDUCE-1073
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: pipes
>    Affects Versions: 0.20.1
>            Reporter: Sreekanth Ramakrishnan
>         Attachments: mapreduce-1073--2010-03-31.patch, mapreduce-1073--2010-04-06.patch, MAPREDUCE-1073_yhadoop20.patch


[jira] Commented: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.

Posted by "Dick King (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891328#action_12891328 ] 

Dick King commented on MAPREDUCE-1073:
--------------------------------------

In my previous comment I should have said that this patch addresses BOTH points, and is complete modulo a forward port.

> Progress reported for pipes tasks is incorrect.
> -----------------------------------------------
>
>                 Key: MAPREDUCE-1073
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: pipes
>    Affects Versions: 0.20.1
>            Reporter: Sreekanth Ramakrishnan
>            Assignee: Dick King
>         Attachments: mapreduce-1073--2010-03-31.patch, mapreduce-1073--2010-04-06.patch, MAPREDUCE-1073--yhadoop20--2010-07-22.patch, MAPREDUCE-1073_yhadoop20.patch


[jira] Updated: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated MAPREDUCE-1073:
-------------------------------------

    Status: Patch Available  (was: Open)

> Progress reported for pipes tasks is incorrect.
> -----------------------------------------------
>
>                 Key: MAPREDUCE-1073
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: pipes
>    Affects Versions: 0.20.1
>            Reporter: Sreekanth Ramakrishnan
>         Attachments: mapreduce-1073--2010-03-31.patch, mapreduce-1073--2010-04-06.patch, MAPREDUCE-1073_yhadoop20.patch


[jira] Commented: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852295#action_12852295 ] 

Hadoop QA commented on MAPREDUCE-1073:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12440406/mapreduce-1073--2010-03-31.patch
  against trunk revision 929712.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/83/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/83/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/83/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/83/console

This message is automatically generated.

> Progress reported for pipes tasks is incorrect.
> -----------------------------------------------
>
>                 Key: MAPREDUCE-1073
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: pipes
>    Affects Versions: 0.20.1
>            Reporter: Sreekanth Ramakrishnan
>         Attachments: mapreduce-1073--2010-03-31.patch, MAPREDUCE-1073_yhadoop20.patch


[jira] Updated: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.

Posted by "Dick King (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1073:
---------------------------------

    Status: Patch Available  (was: Open)

> Progress reported for pipes tasks is incorrect.
> -----------------------------------------------
>
>                 Key: MAPREDUCE-1073
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: pipes
>    Affects Versions: 0.20.1
>            Reporter: Sreekanth Ramakrishnan
>         Attachments: mapreduce-1073--2010-03-31.patch, mapreduce-1073--2010-04-06.patch, MAPREDUCE-1073_yhadoop20.patch


[jira] Updated: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated MAPREDUCE-1073:
-------------------------------------

    Status: Patch Available  (was: Open)

> Progress reported for pipes tasks is incorrect.
> -----------------------------------------------
>
>                 Key: MAPREDUCE-1073
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: pipes
>    Affects Versions: 0.20.1
>            Reporter: Sreekanth Ramakrishnan
>         Attachments: mapreduce-1073--2010-03-31.patch, mapreduce-1073--2010-04-06.patch, MAPREDUCE-1073_yhadoop20.patch


[jira] Commented: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837962#action_12837962 ] 

Arun C Murthy commented on MAPREDUCE-1073:
------------------------------------------

Forgot to thank Christian for the patch!

> Progress reported for pipes tasks is incorrect.
> -----------------------------------------------
>
>                 Key: MAPREDUCE-1073
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: pipes
>    Affects Versions: 0.20.1
>            Reporter: Sreekanth Ramakrishnan
>         Attachments: MAPREDUCE-1073_yhadoop20.patch


[jira] Commented: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762934#action_12762934 ] 

Sreekanth Ramakrishnan commented on MAPREDUCE-1073:
---------------------------------------------------

The incorrect progress affects the scheduling of speculative tasks for pipes jobs, because the progress reported for every pipes task would be 100%.

> Progress reported for pipes tasks is incorrect.
> -----------------------------------------------
>
>                 Key: MAPREDUCE-1073
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: pipes
>    Affects Versions: 0.20.1
>            Reporter: Sreekanth Ramakrishnan


[jira] Updated: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.

Posted by "Dick King (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1073:
---------------------------------

    Status: Open  (was: Patch Available)

Removed this patch to replace it with another patch that tests its functionality.

> Progress reported for pipes tasks is incorrect.
> -----------------------------------------------
>
>                 Key: MAPREDUCE-1073
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: pipes
>    Affects Versions: 0.20.1
>            Reporter: Sreekanth Ramakrishnan
>         Attachments: mapreduce-1073--2010-03-31.patch, MAPREDUCE-1073_yhadoop20.patch


[jira] Commented: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869959#action_12869959 ] 

Hadoop QA commented on MAPREDUCE-1073:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12440953/mapreduce-1073--2010-04-06.patch
  against trunk revision 946955.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 8 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/199/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/199/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/199/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/199/console

This message is automatically generated.

> Progress reported for pipes tasks is incorrect.
> -----------------------------------------------
>
>                 Key: MAPREDUCE-1073
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: pipes
>    Affects Versions: 0.20.1
>            Reporter: Sreekanth Ramakrishnan
>         Attachments: mapreduce-1073--2010-03-31.patch, mapreduce-1073--2010-04-06.patch, MAPREDUCE-1073_yhadoop20.patch


[jira] Updated: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-1073:
-------------------------------------

    Attachment: MAPREDUCE-1073_yhadoop20.patch

Adding a 'setProgress' api for pipes applications.
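As a rough sketch of the idea (not the contents of the attached patch), an application-driven progress hook could look like the following. The names AppProgressSketch, ProgressSink and AppDrivenProgress are hypothetical, and the clamping to [0, 1] and the rule that progress never moves backwards are assumptions made for this illustration, not behaviour confirmed by the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not the API added by the patch.
final class AppProgressSketch {

    /** Hypothetical receiver of progress updates, e.g. the framework side. */
    interface ProgressSink {
        void progress(float fraction);
    }

    /** Lets the application, rather than the record reader, drive progress. */
    static final class AppDrivenProgress {
        private final ProgressSink sink;
        private float last = 0f;

        AppDrivenProgress(ProgressSink sink) {
            this.sink = sink;
        }

        /**
         * Reports the fraction of work the application believes it has done.
         * Values are clamped to [0, 1]; values that do not exceed the last
         * reported value (including NaN) are ignored. Both rules are
         * assumptions of this sketch.
         */
        void setProgress(float fraction) {
            float clamped = Math.max(0f, Math.min(1f, fraction));
            if (clamped > last) {
                last = clamped;
                sink.progress(clamped);
            }
        }

        float lastReported() {
            return last;
        }
    }

    public static void main(String[] args) {
        List<Float> seen = new ArrayList<>();
        AppDrivenProgress progress = new AppDrivenProgress(f -> seen.add(f));
        progress.setProgress(0.25f); // forwarded
        progress.setProgress(0.10f); // ignored: lower than the last report
        progress.setProgress(1.5f);  // clamped to 1.0 and forwarded
        System.out.println(seen);
    }
}
```

The point of the design is that the reported figure comes from the application's own notion of how much work is done, so it cannot run ahead of the child the way the reader position can.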

> Progress reported for pipes tasks is incorrect.
> -----------------------------------------------
>
>                 Key: MAPREDUCE-1073
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: pipes
>    Affects Versions: 0.20.1
>            Reporter: Sreekanth Ramakrishnan
>         Attachments: MAPREDUCE-1073_yhadoop20.patch


[jira] Updated: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-1073:
-------------------------------------

    Status: Patch Available  (was: Open)

> Progress reported for pipes tasks is incorrect.
> -----------------------------------------------
>
>                 Key: MAPREDUCE-1073
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: pipes
>    Affects Versions: 0.20.1
>            Reporter: Sreekanth Ramakrishnan
>            Assignee: Dick King
>         Attachments: mapreduce-1073--2010-03-31.patch, mapreduce-1073--2010-04-06.patch, MAPREDUCE-1073_yhadoop20.patch


[jira] Commented: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879245#action_12879245 ] 

Hadoop QA commented on MAPREDUCE-1073:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12440953/mapreduce-1073--2010-04-06.patch
  against trunk revision 955068.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 8 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/572/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/572/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/572/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/572/console

This message is automatically generated.

> Progress reported for pipes tasks is incorrect.
> -----------------------------------------------
>
>                 Key: MAPREDUCE-1073
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: pipes
>    Affects Versions: 0.20.1
>            Reporter: Sreekanth Ramakrishnan
>            Assignee: Dick King
>         Attachments: mapreduce-1073--2010-03-31.patch, mapreduce-1073--2010-04-06.patch, MAPREDUCE-1073_yhadoop20.patch


[jira] Updated: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated MAPREDUCE-1073:
-------------------------------------

    Status: Open  (was: Patch Available)

> Progress reported for pipes tasks is incorrect.
> -----------------------------------------------
>
>                 Key: MAPREDUCE-1073
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: pipes
>    Affects Versions: 0.20.1
>            Reporter: Sreekanth Ramakrishnan
>         Attachments: mapreduce-1073--2010-03-31.patch, mapreduce-1073--2010-04-06.patch, MAPREDUCE-1073_yhadoop20.patch


[jira] Updated: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.

Posted by "Dick King (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1073:
---------------------------------

    Attachment: mapreduce-1073--2010-04-06.patch

This patch is as large as it is because it includes the removal of {{src/examples/pipes/aclocal.m4}}. That file is a derived file that should not be included in the code base.

> Progress reported for pipes tasks is incorrect.
> -----------------------------------------------
>
>                 Key: MAPREDUCE-1073
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: pipes
>    Affects Versions: 0.20.1
>            Reporter: Sreekanth Ramakrishnan
>         Attachments: mapreduce-1073--2010-03-31.patch, mapreduce-1073--2010-04-06.patch, MAPREDUCE-1073_yhadoop20.patch


[jira] Updated: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated MAPREDUCE-1073:
-------------------------------------

    Status: Open  (was: Patch Available)

> Progress reported for pipes tasks is incorrect.
> -----------------------------------------------
>
>                 Key: MAPREDUCE-1073
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: pipes
>    Affects Versions: 0.20.1
>            Reporter: Sreekanth Ramakrishnan
>         Attachments: mapreduce-1073--2010-03-31.patch, mapreduce-1073--2010-04-06.patch, MAPREDUCE-1073_yhadoop20.patch


[jira] Updated: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.

Posted by "Dick King (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1073:
---------------------------------

    Attachment: mapreduce-1073--2010-03-31.patch

I've verified that the patch reports progress continuously, provided your code uses it and has a way to determine its own progress [which is not always available].

> Progress reported for pipes tasks is incorrect.
> -----------------------------------------------
>
>                 Key: MAPREDUCE-1073
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: pipes
>    Affects Versions: 0.20.1
>            Reporter: Sreekanth Ramakrishnan
>         Attachments: mapreduce-1073--2010-03-31.patch, MAPREDUCE-1073_yhadoop20.patch
>
>
> Currently in pipes, {{org.apache.hadoop.mapred.pipes.PipesMapRunner.run(RecordReader<K1, V1>, OutputCollector<K2, V2>, Reporter)}} we do the following:
> {code}
>         while (input.next(key, value)) {
>           downlink.mapItem(key, value);
>           if(skipping) {
>             downlink.flush();
>           }
>         }
> {code}
> This would result in consumption of all the records for current task and taking task progress to 100% whereas the actual pipes application would be trailing behind. 



[jira] Updated: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.

Posted by "Dick King (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1073:
---------------------------------

    Attachment: MAPREDUCE-1073--yhadoop20--2010-07-22--1530.patch

I revised the patch so that it no longer adds an API to read and set the property that tells MapTask.TrackedRecordReader not to record progress as it reads the input; the code now just reads and sets the property "by hand".  Since this is a pipes-specific feature, it should be handled only by a focused attribute, which I then renamed to mapred.pipes.disable.record.reader.progress .

Relative to the approach in https://issues.apache.org/jira/browse/MAPREDUCE-1073?focusedCommentId=12891327&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12891327 , a job whose mappers will report their own progress is now marked simply by setting the property, which I have renamed from {{mapred.job.disable.record.reader.progress}} to {{mapred.pipes.disable.record.reader.progress}} , because this is a pipes-only concept.
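
To illustrate what reading the property "by hand" might look like, here is a minimal sketch. It is not taken from the patch: a plain {{java.util.Properties}} stands in for the job configuration, the class name {{RecordReaderProgressFlag}} is hypothetical, and the default of {{false}} when the property is unset is an assumption.

```java
import java.util.Properties;

/** Hypothetical helper; java.util.Properties stands in for the job configuration. */
final class RecordReaderProgressFlag {
    /** Property name as given in the comment above. */
    static final String KEY = "mapred.pipes.disable.record.reader.progress";

    /** Returns true when the job asks that input-split reads not drive task progress. */
    static boolean isDisabled(Properties jobProps) {
        // Assumption: an unset property means reader-based progress stays enabled.
        return Boolean.parseBoolean(jobProps.getProperty(KEY, "false"));
    }

    public static void main(String[] args) {
        Properties jobProps = new Properties();
        System.out.println(isDisabled(jobProps));
        jobProps.setProperty(KEY, "true");
        System.out.println(isDisabled(jobProps));
    }
}
```

With a real Hadoop job configuration the same check would presumably go through the configuration object's own boolean getter rather than {{Properties}}.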

> Progress reported for pipes tasks is incorrect.
> -----------------------------------------------
>
>                 Key: MAPREDUCE-1073
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: pipes
>    Affects Versions: 0.20.1
>            Reporter: Sreekanth Ramakrishnan
>            Assignee: Dick King
>         Attachments: mapreduce-1073--2010-03-31.patch, mapreduce-1073--2010-04-06.patch, MAPREDUCE-1073--yhadoop20--2010-07-22--1530.patch, MAPREDUCE-1073--yhadoop20--2010-07-22.patch, MAPREDUCE-1073_yhadoop20.patch
>
>
> Currently in pipes, {{org.apache.hadoop.mapred.pipes.PipesMapRunner.run(RecordReader<K1, V1>, OutputCollector<K2, V2>, Reporter)}} we do the following:
> {code}
>         while (input.next(key, value)) {
>           downlink.mapItem(key, value);
>           if(skipping) {
>             downlink.flush();
>           }
>         }
> {code}
> This would result in consumption of all the records for current task and taking task progress to 100% whereas the actual pipes application would be trailing behind. 



[jira] Updated: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.

Posted by "Dick King (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1073:
---------------------------------

    Status: Patch Available  (was: Open)

> Progress reported for pipes tasks is incorrect.
> -----------------------------------------------
>
>                 Key: MAPREDUCE-1073
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: pipes
>    Affects Versions: 0.20.1
>            Reporter: Sreekanth Ramakrishnan
>         Attachments: mapreduce-1073--2010-03-31.patch, MAPREDUCE-1073_yhadoop20.patch
>
>
> Currently in pipes, {{org.apache.hadoop.mapred.pipes.PipesMapRunner.run(RecordReader<K1, V1>, OutputCollector<K2, V2>, Reporter)}} we do the following:
> {code}
>         while (input.next(key, value)) {
>           downlink.mapItem(key, value);
>           if(skipping) {
>             downlink.flush();
>           }
>         }
> {code}
> This would result in consumption of all the records for current task and taking task progress to 100% whereas the actual pipes application would be trailing behind. 



[jira] Updated: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.

Posted by "Dick King (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1073:
---------------------------------

    Attachment: MAPREDUCE-1073--yhadoop20--2010-07-22.patch

The previous versions of this attachment missed one point.

The basic problem is that, with the existing code base, progress is based on the records read from the input split, but there is buffering in the way pipes works.  In jobs where the input splits are small, this makes tasks appear to have made more progress than they actually have.

To make speculation work under pipes with small input splits, two conditions have to be met:

1: The pipes code has to have an API to report progress, and has to use it.  The old patch met this goal.  You call {{(&context)->serProgress(float)}} within {{HadoopPipes::Mapper.map(HadoopPipes::MapContext& context)}} .  This does require that you have a way of measuring progress, which I consider likely because this is only needed when the input splits are small, which implies that the "input data" is really a signal to get the real data somewhere else [or to generate it].

2: The job has to be able to say that the progress that would otherwise be inferred from input split reads has to be ignored.  This newest version of the patch does that; you can either call {{JobConf.setRecordReaderProgressDisabled(true)}}, or set the attribute {{mapred.job.disable.record.reader.progress}} to {{true}} .

This patch addresses the second point.  I did not mark it available because it needs a forward port.  I attached it to this issue for comments, and for the record.
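
The two conditions above can be illustrated with a toy model. This is only a sketch under stated assumptions, not code from the patch: the class {{PipesProgressSketch}} and all of its members are hypothetical, the "buffer" is an in-memory queue standing in for the pipes downlink, and progress is simply records divided by total records.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Toy model of a map task whose records are buffered before the application sees them. */
final class PipesProgressSketch {
    private final int totalRecords;
    private final boolean readerProgressDisabled; // models the job-level switch (condition 2)
    private final Queue<Integer> buffer = new ArrayDeque<>();
    private int recordsRead = 0;
    private float applicationProgress = 0.0f;     // what the application reports (condition 1)

    PipesProgressSketch(int totalRecords, boolean readerProgressDisabled) {
        if (totalRecords <= 0) {
            throw new IllegalArgumentException("totalRecords must be positive");
        }
        this.totalRecords = totalRecords;
        this.readerProgressDisabled = readerProgressDisabled;
    }

    /** Framework side: drain the whole input split into the buffer. */
    void readAll() {
        while (recordsRead < totalRecords) {
            buffer.add(recordsRead++);
        }
    }

    /** Application side: process up to n buffered records, then report its own progress. */
    void process(int n) {
        for (int i = 0; i < n && !buffer.isEmpty(); i++) {
            buffer.remove();
        }
        int processed = recordsRead - buffer.size();
        applicationProgress = (float) processed / totalRecords;
    }

    /** Progress the task would show: reader-based unless the switch is set. */
    float reportedProgress() {
        if (readerProgressDisabled) {
            return applicationProgress;
        }
        return (float) recordsRead / totalRecords;
    }

    public static void main(String[] args) {
        PipesProgressSketch readerBased = new PipesProgressSketch(4, false);
        readerBased.readAll();
        readerBased.process(1);
        System.out.println(readerBased.reportedProgress());

        PipesProgressSketch applicationBased = new PipesProgressSketch(4, true);
        applicationBased.readAll();
        applicationBased.process(1);
        System.out.println(applicationBased.reportedProgress());
    }
}
```

In this model, a task that has read all four records but processed only one shows full progress when reader-based progress is left on, and one quarter when it is disabled and the application's own report is used instead.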

> Progress reported for pipes tasks is incorrect.
> -----------------------------------------------
>
>                 Key: MAPREDUCE-1073
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: pipes
>    Affects Versions: 0.20.1
>            Reporter: Sreekanth Ramakrishnan
>            Assignee: Dick King
>         Attachments: mapreduce-1073--2010-03-31.patch, mapreduce-1073--2010-04-06.patch, MAPREDUCE-1073--yhadoop20--2010-07-22.patch, MAPREDUCE-1073_yhadoop20.patch
>
>
> Currently in pipes, {{org.apache.hadoop.mapred.pipes.PipesMapRunner.run(RecordReader<K1, V1>, OutputCollector<K2, V2>, Reporter)}} we do the following:
> {code}
>         while (input.next(key, value)) {
>           downlink.mapItem(key, value);
>           if(skipping) {
>             downlink.flush();
>           }
>         }
> {code}
> This would result in consumption of all the records for current task and taking task progress to 100% whereas the actual pipes application would be trailing behind. 



[jira] Updated: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-1073:
-------------------------------------

      Status: Open  (was: Patch Available)
    Assignee: Dick King

> Progress reported for pipes tasks is incorrect.
> -----------------------------------------------
>
>                 Key: MAPREDUCE-1073
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: pipes
>    Affects Versions: 0.20.1
>            Reporter: Sreekanth Ramakrishnan
>            Assignee: Dick King
>         Attachments: mapreduce-1073--2010-03-31.patch, mapreduce-1073--2010-04-06.patch, MAPREDUCE-1073_yhadoop20.patch
>
>
> Currently in pipes, {{org.apache.hadoop.mapred.pipes.PipesMapRunner.run(RecordReader<K1, V1>, OutputCollector<K2, V2>, Reporter)}} we do the following:
> {code}
>         while (input.next(key, value)) {
>           downlink.mapItem(key, value);
>           if(skipping) {
>             downlink.flush();
>           }
>         }
> {code}
> This would result in consumption of all the records for current task and taking task progress to 100% whereas the actual pipes application would be trailing behind. 



[jira] Updated: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated MAPREDUCE-1073:
----------------------------------------------

    Affects Version/s: 0.20.1

> Progress reported for pipes tasks is incorrect.
> -----------------------------------------------
>
>                 Key: MAPREDUCE-1073
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: pipes
>    Affects Versions: 0.20.1
>            Reporter: Sreekanth Ramakrishnan
>
> Currently in pipes, {{org.apache.hadoop.mapred.pipes.PipesMapRunner.run(RecordReader<K1, V1>, OutputCollector<K2, V2>, Reporter)}} we do the following:
> {code}
>         while (input.next(key, value)) {
>           downlink.mapItem(key, value);
>           if(skipping) {
>             downlink.flush();
>           }
>         }
> {code}
> This would result in consumption of all the records for current task and taking task progress to 100% whereas the actual pipes application would be trailing behind. 



[jira] Commented: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.

Posted by "Dick King (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892543#action_12892543 ] 

Dick King commented on MAPREDUCE-1073:
--------------------------------------

I would like to invite community comment on the approach of https://issues.apache.org/jira/secure/attachment/12450229/MAPREDUCE-1073--yhadoop20--2010-07-22--1530.patch which is described in https://issues.apache.org/jira/browse/MAPREDUCE-1073?focusedCommentId=12891371&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12891371 before I do any forward port.

> Progress reported for pipes tasks is incorrect.
> -----------------------------------------------
>
>                 Key: MAPREDUCE-1073
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: pipes
>    Affects Versions: 0.20.1
>            Reporter: Sreekanth Ramakrishnan
>            Assignee: Dick King
>         Attachments: mapreduce-1073--2010-03-31.patch, mapreduce-1073--2010-04-06.patch, MAPREDUCE-1073--yhadoop20--2010-07-22--1530.patch, MAPREDUCE-1073--yhadoop20--2010-07-22.patch, MAPREDUCE-1073_yhadoop20.patch
>
>
> Currently in pipes, {{org.apache.hadoop.mapred.pipes.PipesMapRunner.run(RecordReader<K1, V1>, OutputCollector<K2, V2>, Reporter)}} we do the following:
> {code}
>         while (input.next(key, value)) {
>           downlink.mapItem(key, value);
>           if(skipping) {
>             downlink.flush();
>           }
>         }
> {code}
> This would result in consumption of all the records for current task and taking task progress to 100% whereas the actual pipes application would be trailing behind. 
