Posted to common-dev@hadoop.apache.org by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2008/11/24 22:57:44 UTC

[jira] Created: (HADOOP-4718) incrementing counters should not be used for triggering record skipping

incrementing counters should not be used for triggering record skipping
-----------------------------------------------------------------------

                 Key: HADOOP-4718
                 URL: https://issues.apache.org/jira/browse/HADOOP-4718
             Project: Hadoop Core
          Issue Type: Bug
          Components: mapred
            Reporter: Owen O'Malley


The following code is really problematic:

{code}
public void incrCounter(String group, String counter, long amount) {
  if (counters != null) {
    counters.incrCounter(group, counter, amount);
  }
  if (skipping && SkipBadRecords.COUNTER_GROUP.equals(group) && (
      SkipBadRecords.COUNTER_MAP_PROCESSED_RECORDS.equals(counter) ||
      SkipBadRecords.COUNTER_REDUCE_PROCESSED_GROUPS.equals(counter))) {
    //if application reports the processed records, move the
    //currentRecStartIndex to the next.
    //currentRecStartIndex is the start index which has not yet been
    //finished and is still in task's stomach.
    for (int i = 0; i < amount; i++) {
      currentRecStartIndex = currentRecIndexIterator.next();
    }
    ...
}
{code}

In particular, if the user updates a counter with the wrong name, bad things will presumably happen...
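
To make the concern concrete, here is a minimal, self-contained sketch of the same pattern outside Hadoop. The class name, the string values of the constants, and the list of record indexes are all stand-ins chosen for illustration (the real constants live in SkipBadRecords and may differ); only the shape of the logic follows the snippet above.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the task reporter; not the real Hadoop class.
final class CounterSideEffectDemo {
  // Assumed values, for illustration only.
  static final String COUNTER_GROUP = "SkippingTaskCounters";
  static final String COUNTER_MAP_PROCESSED_RECORDS = "MapProcessedRecords";

  private final Iterator<Long> currentRecIndexIterator;
  long currentRecStartIndex = -1;

  CounterSideEffectDemo(List<Long> recordIndexes) {
    this.currentRecIndexIterator = recordIndexes.iterator();
  }

  // Mirrors the quoted logic: the counter name alone decides whether
  // the record index advances.
  void incrCounter(String group, String counter, long amount) {
    if (COUNTER_GROUP.equals(group)
        && COUNTER_MAP_PROCESSED_RECORDS.equals(counter)) {
      for (long i = 0; i < amount; i++) {
        currentRecStartIndex = currentRecIndexIterator.next();
      }
    }
  }

  public static void main(String[] args) {
    CounterSideEffectDemo reporter =
        new CounterSideEffectDemo(Arrays.asList(0L, 1L, 2L, 3L));

    // An ordinary application counter leaves the index alone.
    reporter.incrCounter("MyApp", "RowsSeen", 2);
    System.out.println("after app counter: " + reporter.currentRecStartIndex);

    // A counter that happens to use the framework's names moves it.
    reporter.incrCounter(COUNTER_GROUP, COUNTER_MAP_PROCESSED_RECORDS, 2);
    System.out.println("after framework-named counter: "
        + reporter.currentRecStartIndex);
  }
}
```

In this sketch, incrementing the framework-named counter by more than the number of remaining indexes makes the iterator throw NoSuchElementException, which is one way "bad things" could show up.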

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4718) incrementing counters should not be used for triggering record skipping

Posted by "Sharad Agarwal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652767#action_12652767 ] 

Sharad Agarwal commented on HADOOP-4718:
----------------------------------------

The skipping bad records feature needs a way to get a callback for the number of processed records from the streaming process. To support this, counters were chosen because they are supported by both pipes and streaming; see https://issues.apache.org/jira/browse/HADOOP-153?focusedCommentId=12610897#action_12610897 (last point).

bq. In particular, if the user updates a counter with the wrong name, bad things will presumably happen...
As far as I can see, this can only happen if the user defines their own counter with the same name. Or is there some other problem that can occur? Would it be OK for now to document that the framework reserves these counter names, and perhaps log in the above loop that a framework counter is being updated?
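
A rough sketch of what the "reserve and log" idea could look like, written as standalone Java. The class name, method names, and constant values here are hypothetical and exist only for illustration; the real check would sit next to the loop quoted above and use the SkipBadRecords constants.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical helper; not part of Hadoop.
final class ReservedCounterNames {
  private static final Logger LOG =
      Logger.getLogger(ReservedCounterNames.class.getName());

  // Assumed values standing in for the SkipBadRecords constants.
  static final String FRAMEWORK_GROUP = "SkippingTaskCounters";
  static final Set<String> FRAMEWORK_COUNTERS = new HashSet<>(
      Arrays.asList("MapProcessedRecords", "ReduceProcessedGroups"));

  // True when the (group, counter) pair is one the framework reserves.
  static boolean isReserved(String group, String counter) {
    return FRAMEWORK_GROUP.equals(group)
        && FRAMEWORK_COUNTERS.contains(counter);
  }

  // Logs every update to a reserved counter so that an accidental
  // name collision is at least visible in the task log.
  static boolean logIfReserved(String group, String counter, long amount) {
    boolean reserved = isReserved(group, counter);
    if (reserved) {
      LOG.info("Framework counter " + group + ":" + counter
          + " updated by " + amount);
    }
    return reserved;
  }

  public static void main(String[] args) {
    logIfReserved(FRAMEWORK_GROUP, "MapProcessedRecords", 1);
    logIfReserved("MyApp", "RowsSeen", 1);
  }
}
```

This only makes a collision visible; it does not prevent a user counter with a reserved name from moving the record index.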

The other alternative, if we don't want to use counters for this at all, would be to add a mechanism to the streaming and pipes protocols. Streaming could write something like processedRecords to stderr, which the framework would parse. Something similar would need to be added to the Pipes protocol as well.
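
For the stderr idea, the framework-side parsing might look roughly like the following. The line format "processedRecords:<n>" and the class name are invented here for illustration; no such protocol is defined in this thread.

```java
import java.util.OptionalLong;

// Hypothetical parser for a made-up stderr line format.
final class ProcessedRecordsLine {
  // Assumed prefix; the actual protocol would have to be agreed on.
  static final String PREFIX = "processedRecords:";

  // Returns the reported record count, or empty if the line is ordinary
  // stderr output or carries a malformed or negative count.
  static OptionalLong parse(String stderrLine) {
    if (stderrLine == null) {
      return OptionalLong.empty();
    }
    String line = stderrLine.trim();
    if (!line.startsWith(PREFIX)) {
      return OptionalLong.empty();
    }
    try {
      long n = Long.parseLong(line.substring(PREFIX.length()).trim());
      return n >= 0 ? OptionalLong.of(n) : OptionalLong.empty();
    } catch (NumberFormatException e) {
      return OptionalLong.empty();
    }
  }

  public static void main(String[] args) {
    System.out.println(parse("processedRecords:3"));
    System.out.println(parse("some ordinary log output"));
    System.out.println(parse("processedRecords:abc"));
  }
}
```

Keeping this signal separate from counters would mean a user counter of any name could no longer move the record index, at the cost of one more convention for streaming scripts to follow.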





