You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "Amar Kamat (JIRA)" <ji...@apache.org> on 2008/08/19 11:45:44 UTC

[jira] Created: (HADOOP-3970) Counters written to the job history cannot be recovered back

Counters written to the job history cannot be recovered back
------------------------------------------------------------

                 Key: HADOOP-3970
                 URL: https://issues.apache.org/jira/browse/HADOOP-3970
             Project: Hadoop Core
          Issue Type: Bug
          Components: mapred
            Reporter: Amar Kamat


Counters that are written to the history are stringified using {{Counters.getCompactString()}}. The format in which this api converts the counter into a string is _groupname.countername:value_. The problem is that the groupname and the countername can contain '.' and hence recovering the counter back becomes difficult. Since jobhistory can be used for various purposes, reconstructing the counter object back might be useful. One such usecase is HADOOP-3245. There should be some way to recover the counter object back from the string representation and also to keep the stringified version readable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3970) Counters written to the job history cannot be recovered back

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630940#action_12630940 ] 

Amar Kamat commented on HADOOP-3970:
------------------------------------

Suhas,
Here we have introduced 2 new api's namely {{Counters.makeEscapedCompactString() and Counters.fromEscapedCompactString()}}. What history uses is {{Counters.makeCompactString()}} and there is no mechanism to recover it into an object. If you look closely, the counter string in the job history is ambiguous and hence recovery becomes problematic. {{Counters.makeEscapedCompactString()}} represents the counters in an unambiguous way and will be used by HADOOP-3245. 

> Counters written to the job history cannot be recovered back
> ------------------------------------------------------------
>
>                 Key: HADOOP-3970
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3970
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>            Assignee: Amar Kamat
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-3970-v1.patch, HADOOP-3970-v2.patch, HADOOP-3970-v3.patch, HADOOP-3970-v4.1.patch, HADOOP-3970-v4.patch
>
>
> Counters that are written to the JobHistory are stringified using {{Counters.makeCompactString()}}. The format in which this api converts the counter into a string is _groupname.countername:value_. The problem is that _groupname_ and _countername_ can contain a '.' and hence recovering the counter becomes difficult. Since JobHistory can be used for various purposes, reconstructing the counter object back might be useful. One such usecase is HADOOP-3245. There should be some way to recover the counter object back from its string representation and also to keep the string version readable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-3970) Counters written to the job history cannot be recovered back

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-3970:
--------------------------------

       Resolution: Fixed
    Fix Version/s: 0.19.0
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

I just committed this. Thanks, Amar!

> Counters written to the job history cannot be recovered back
> ------------------------------------------------------------
>
>                 Key: HADOOP-3970
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3970
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>            Assignee: Amar Kamat
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-3970-v1.patch, HADOOP-3970-v2.patch, HADOOP-3970-v3.patch, HADOOP-3970-v4.1.patch, HADOOP-3970-v4.patch
>
>
> Counters that are written to the JobHistory are stringified using {{Counters.makeCompactString()}}. The format in which this api converts the counter into a string is _groupname.countername:value_. The problem is that _groupname_ and _countername_ can contain a '.' and hence recovering the counter becomes difficult. Since JobHistory can be used for various purposes, reconstructing the counter object back might be useful. One such usecase is HADOOP-3245. There should be some way to recover the counter object back from its string representation and also to keep the string version readable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-3970) Counters written to the job history cannot be recovered back

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated HADOOP-3970:
-------------------------------

    Attachment: HADOOP-3970-v1.patch

Attaching a patch the implements the above mentioned technique. Tested it with HADOOP-3245 and it works fine. I am yet to test it separately. 

> Counters written to the job history cannot be recovered back
> ------------------------------------------------------------
>
>                 Key: HADOOP-3970
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3970
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>         Attachments: HADOOP-3970-v1.patch
>
>
> Counters that are written to the JobHistory are stringified using {{Counters.makeCompactString()}}. The format in which this api converts the counter into a string is _groupname.countername:value_. The problem is that _groupname_ and _countername_ can contain a '.' and hence recovering the counter becomes difficult. Since JobHistory can be used for various purposes, reconstructing the counter object back might be useful. One such usecase is HADOOP-3245. There should be some way to recover the counter object back from its string representation and also to keep the string version readable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3970) Counters written to the job history cannot be recovered back

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629397#action_12629397 ] 

Amar Kamat commented on HADOOP-3970:
------------------------------------

Core test failed on TestDatanodeDeath. Not sure if its related to this patch or Counters. Ran the same test on my machine and it passed.

> Counters written to the job history cannot be recovered back
> ------------------------------------------------------------
>
>                 Key: HADOOP-3970
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3970
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>            Assignee: Amar Kamat
>         Attachments: HADOOP-3970-v1.patch, HADOOP-3970-v2.patch, HADOOP-3970-v3.patch, HADOOP-3970-v4.1.patch, HADOOP-3970-v4.patch
>
>
> Counters that are written to the JobHistory are stringified using {{Counters.makeCompactString()}}. The format in which this api converts the counter into a string is _groupname.countername:value_. The problem is that _groupname_ and _countername_ can contain a '.' and hence recovering the counter becomes difficult. Since JobHistory can be used for various purposes, reconstructing the counter object back might be useful. One such usecase is HADOOP-3245. There should be some way to recover the counter object back from its string representation and also to keep the string version readable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-3970) Counters written to the job history cannot be recovered back

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated HADOOP-3970:
-------------------------------

    Attachment: HADOOP-3970-v2.patch

Attaching a patch that adds the facility to recover the counter from (encoded)string. The format in which the counters are represented is 
{code}
{(gn1) (gd1) [(cn1) (cd1) (cv1)] [] [] []} {}
Where {} are used for groups, [] are used for counter and () for basic values
{code}
Th change w.r.t Hemanth's comment is in the encapsulation for counter from '{ }' to '[ ]' which makes parsing and the recovery code simple. Added testcases that test counters. Comments? Thoughts?

> Counters written to the job history cannot be recovered back
> ------------------------------------------------------------
>
>                 Key: HADOOP-3970
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3970
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>         Attachments: HADOOP-3970-v1.patch, HADOOP-3970-v2.patch
>
>
> Counters that are written to the JobHistory are stringified using {{Counters.makeCompactString()}}. The format in which this api converts the counter into a string is _groupname.countername:value_. The problem is that _groupname_ and _countername_ can contain a '.' and hence recovering the counter becomes difficult. Since JobHistory can be used for various purposes, reconstructing the counter object back might be useful. One such usecase is HADOOP-3245. There should be some way to recover the counter object back from its string representation and also to keep the string version readable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3970) Counters written to the job history cannot be recovered back

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629119#action_12629119 ] 

Amareshwari Sriramadasu commented on HADOOP-3970:
-------------------------------------------------

bq. I think the escapeString(String str, char escapeChar, char charToEscape) API is still useful by itself, and is easier to use in cases where the user has only one char to escape.
+1 I agree with Hemant to not deprecate the api escapeString(String str, char escapeChar, char charToEscape).

> Counters written to the job history cannot be recovered back
> ------------------------------------------------------------
>
>                 Key: HADOOP-3970
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3970
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>         Attachments: HADOOP-3970-v1.patch, HADOOP-3970-v2.patch, HADOOP-3970-v3.patch, HADOOP-3970-v4.patch
>
>
> Counters that are written to the JobHistory are stringified using {{Counters.makeCompactString()}}. The format in which this api converts the counter into a string is _groupname.countername:value_. The problem is that _groupname_ and _countername_ can contain a '.' and hence recovering the counter becomes difficult. Since JobHistory can be used for various purposes, reconstructing the counter object back might be useful. One such usecase is HADOOP-3245. There should be some way to recover the counter object back from its string representation and also to keep the string version readable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3970) Counters written to the job history cannot be recovered back

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624635#action_12624635 ] 

Amar Kamat commented on HADOOP-3970:
------------------------------------

It seems that the jobhistory uses the display names instead of actual names in the stringified representation.

One such example of actual name and display name is 
_actual-name_ = {{HDFS_WRITE}} and _display-name_ = {{HDFS bytes written}}. 

Note that the mapping from the actual name to the display name is mentioned in the _.properties_ file.

Looks like its difficult to get the (actual) name from the display name because of the following reasons
1)  Display names can be same across counters. 
2) There are two ways to add/increment a counter 
 - pass the _group_, _counter_ and the _value_
 - pass the (enum) _key_ and the _value_. For the (enum) key, the name is the classname for that (enum) _key_.
The display name is obtained from the _property_ file. In case there is no property file corresponding to the enum, the actual name is used. In either case, figuring out the actual name seems difficult.

So, I suggest we also add the actual name to jobhistory along with the display name and provide api to set the display-name of the counter/group.

> Counters written to the job history cannot be recovered back
> ------------------------------------------------------------
>
>                 Key: HADOOP-3970
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3970
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>         Attachments: HADOOP-3970-v1.patch
>
>
> Counters that are written to the JobHistory are stringified using {{Counters.makeCompactString()}}. The format in which this api converts the counter into a string is _groupname.countername:value_. The problem is that _groupname_ and _countername_ can contain a '.' and hence recovering the counter becomes difficult. Since JobHistory can be used for various purposes, reconstructing the counter object back might be useful. One such usecase is HADOOP-3245. There should be some way to recover the counter object back from its string representation and also to keep the string version readable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-3970) Counters written to the job history cannot be recovered back

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated HADOOP-3970:
-------------------------------

    Description: Counters that are written to the JobHistory are stringified using {{Counters.makeCompactString()}}. The format in which this api converts the counter into a string is _groupname.countername:value_. The problem is that _groupname_ and _countername_ can contain a '.' and hence recovering the counter becomes difficult. Since JobHistory can be used for various purposes, reconstructing the counter object back might be useful. One such usecase is HADOOP-3245. There should be some way to recover the counter object back from its string representation and also to keep the string version readable.  (was: Counters that are written to the history are stringified using {{Counters.getCompactString()}}. The format in which this api converts the counter into a string is _groupname.countername:value_. The problem is that the groupname and the countername can contain '.' and hence recovering the counter back becomes difficult. Since jobhistory can be used for various purposes, reconstructing the counter object back might be useful. One such usecase is HADOOP-3245. There should be some way to recover the counter object back from the string representation and also to keep the stringified version readable.)

> Counters written to the job history cannot be recovered back
> ------------------------------------------------------------
>
>                 Key: HADOOP-3970
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3970
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>
> Counters that are written to the JobHistory are stringified using {{Counters.makeCompactString()}}. The format in which this api converts the counter into a string is _groupname.countername:value_. The problem is that _groupname_ and _countername_ can contain a '.' and hence recovering the counter becomes difficult. Since JobHistory can be used for various purposes, reconstructing the counter object back might be useful. One such usecase is HADOOP-3245. There should be some way to recover the counter object back from its string representation and also to keep the string version readable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3970) Counters written to the job history cannot be recovered back

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629395#action_12629395 ] 

Hadoop QA commented on HADOOP-3970:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12389722/HADOOP-3970-v4.1.patch
  against trunk revision 693048.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3213/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3213/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3213/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3213/console

This message is automatically generated.

> Counters written to the job history cannot be recovered back
> ------------------------------------------------------------
>
>                 Key: HADOOP-3970
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3970
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>            Assignee: Amar Kamat
>         Attachments: HADOOP-3970-v1.patch, HADOOP-3970-v2.patch, HADOOP-3970-v3.patch, HADOOP-3970-v4.1.patch, HADOOP-3970-v4.patch
>
>
> Counters that are written to the JobHistory are stringified using {{Counters.makeCompactString()}}. The format in which this api converts the counter into a string is _groupname.countername:value_. The problem is that _groupname_ and _countername_ can contain a '.' and hence recovering the counter becomes difficult. Since JobHistory can be used for various purposes, reconstructing the counter object back might be useful. One such usecase is HADOOP-3245. There should be some way to recover the counter object back from its string representation and also to keep the string version readable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3970) Counters written to the job history cannot be recovered back

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626865#action_12626865 ] 

Hemanth Yamijala commented on HADOOP-3970:
------------------------------------------

Amar and I thought about this a bit. Groups also have internal and display names, apparently. Since space is a character that may be more common in display names (we have examples in the current list of counters), it may be better to do something like:
{code}
{(gi1)(gd1){(ci1)(cd1)(v1)}}
{code}. 

Real example: 
{code}
{(org.apache.hadoop.mapred.Task$FileSystemCounter)(File Systems){(HDFS_READ)(HDFS bytes read)(1020634)}}
{code} 
as opposed to 
{code}
{org.apache.hadoop.mapred.Task$FileSystemCounter File\ Systems {HDFS_READ HDFS\ bytes\ read 1020634}}
{code}

IMHO, the first is a bit more readable, and clear where the tokens are ending. Comments ?

> Counters written to the job history cannot be recovered back
> ------------------------------------------------------------
>
>                 Key: HADOOP-3970
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3970
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>         Attachments: HADOOP-3970-v1.patch
>
>
> Counters that are written to the JobHistory are stringified using {{Counters.makeCompactString()}}. The format in which this api converts the counter into a string is _groupname.countername:value_. The problem is that _groupname_ and _countername_ can contain a '.' and hence recovering the counter becomes difficult. Since JobHistory can be used for various purposes, reconstructing the counter object back might be useful. One such usecase is HADOOP-3245. There should be some way to recover the counter object back from its string representation and also to keep the string version readable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-3970) Counters written to the job history cannot be recovered back

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated HADOOP-3970:
-------------------------------

    Attachment: HADOOP-3970-v3.patch

Uploading a new patch the fixes the findbugs warnings.

> Counters written to the job history cannot be recovered back
> ------------------------------------------------------------
>
>                 Key: HADOOP-3970
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3970
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>         Attachments: HADOOP-3970-v1.patch, HADOOP-3970-v2.patch, HADOOP-3970-v3.patch
>
>
> Counters that are written to the JobHistory are stringified using {{Counters.makeCompactString()}}. The format in which this api converts the counter into a string is _groupname.countername:value_. The problem is that _groupname_ and _countername_ can contain a '.' and hence recovering the counter becomes difficult. Since JobHistory can be used for various purposes, reconstructing the counter object back might be useful. One such usecase is HADOOP-3245. There should be some way to recover the counter object back from its string representation and also to keep the string version readable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3970) Counters written to the job history cannot be recovered back

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628557#action_12628557 ] 

Amar Kamat commented on HADOOP-3970:
------------------------------------

bq. equals and hashCode implementation: Since the counter's value can change over time, I am thinking that these methods should not depend on the value of the counter. ....
The code to check content equality of counters is moved to the test case as only the test case requires it for now.
bq. getBlock: We expect a start and end. If there's no start or no end, should we fail fast ?
Now {{getBlock()}} throws ParseException if the string is malformed and returns null is no block found



> Counters written to the job history cannot be recovered back
> ------------------------------------------------------------
>
>                 Key: HADOOP-3970
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3970
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>         Attachments: HADOOP-3970-v1.patch, HADOOP-3970-v2.patch, HADOOP-3970-v3.patch, HADOOP-3970-v4.patch
>
>
> Counters that are written to the JobHistory are stringified using {{Counters.makeCompactString()}}. The format in which this api converts the counter into a string is _groupname.countername:value_. The problem is that _groupname_ and _countername_ can contain a '.' and hence recovering the counter becomes difficult. Since JobHistory can be used for various purposes, reconstructing the counter object back might be useful. One such usecase is HADOOP-3245. There should be some way to recover the counter object back from its string representation and also to keep the string version readable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-3970) Counters written to the job history cannot be recovered back

Posted by "Robert Chansler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Chansler updated HADOOP-3970:
------------------------------------

    Release Note: Added getEscapedCompactString() and fromEscapedCompactString() to Counters.java to represent counters as Strings and to reconstruct the counters from the Strings.  (was: Adds getEscapedCompactString() and fromEscapedCompactString() apis to Counters.java . These apis provide the facility to represent counters as string and reconstruct the counter back from the string.)

> Counters written to the job history cannot be recovered back
> ------------------------------------------------------------
>
>                 Key: HADOOP-3970
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3970
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>            Assignee: Amar Kamat
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-3970-v1.patch, HADOOP-3970-v2.patch, HADOOP-3970-v3.patch, HADOOP-3970-v4.1.patch, HADOOP-3970-v4.patch
>
>
> Counters that are written to the JobHistory are stringified using {{Counters.makeCompactString()}}. The format in which this api converts the counter into a string is _groupname.countername:value_. The problem is that _groupname_ and _countername_ can contain a '.' and hence recovering the counter becomes difficult. Since JobHistory can be used for various purposes, reconstructing the counter object back might be useful. One such usecase is HADOOP-3245. There should be some way to recover the counter object back from its string representation and also to keep the string version readable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3970) Counters written to the job history cannot be recovered back

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628439#action_12628439 ] 

Hemanth Yamijala commented on HADOOP-3970:
------------------------------------------

StringUtils

- I think the {{escapeString(String str, char escapeChar, char charToEscape)}} API is still useful by itself, and is easier to use in cases where the user has only one char to escape. So, do we really need to deprecate this ?
- charHasPreEscape is not needed. It is always going to be the escape char. This is a bug in the previous code too.

Counters

- The token delimiters can be declared final.

Counter

- equals and hashCode implementation: Since the counter's value can change over time, I am thinking that these methods should not depend on the value of the counter. Needs to be thought out a bit more.
- makeEscapedCompactString - use StringBuffer, instead of appending spaces.
- makeEscapedCompactString - do we need the spaces ? I think the expression is compact without them. If we need them, then a space is missing when the counter closes.

Group
- equals: Code is incorrectly returning true if classes don't match. 
- hashCode: Are there standard ways in which we can combine hashCodes of various implementations. Also, the hashCode should not use the toString method to get the hashCode.
makeCompactEscapedString - names are different between similar API in Counter. Can we have them uniform ? Also, you can use StringBuffer without any String concatanation at all. Same comment on blank spaces.

Counters:
- getBlock: We expect a start and end. If there's no start or no end, should we fail fast ? I guess today if the end does not come as expected, it returns the remaining String as the value.
- Same comments here as with the Group's implementation for overriding equals and hashcode.

TestCounters:

- testCounter need not check the mainCounter and copyCounter's equality before getting them from the strings. In fact, why do we need copyCounter at all ?


> Counters written to the job history cannot be recovered back
> ------------------------------------------------------------
>
>                 Key: HADOOP-3970
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3970
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>         Attachments: HADOOP-3970-v1.patch, HADOOP-3970-v2.patch, HADOOP-3970-v3.patch
>
>
> Counters that are written to the JobHistory are stringified using {{Counters.makeCompactString()}}. The format in which this api converts the counter into a string is _groupname.countername:value_. The problem is that _groupname_ and _countername_ can contain a '.' and hence recovering the counter becomes difficult. Since JobHistory can be used for various purposes, reconstructing the counter object back might be useful. One such usecase is HADOOP-3245. There should be some way to recover the counter object back from its string representation and also to keep the string version readable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-3970) Counters written to the job history cannot be recovered back

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated HADOOP-3970:
-------------------------------

    Release Note: Adds getEscapedCompactString() and fromEscapedCompactString() apis to Counters.java . These apis provide the facility to represent counters as string and reconstruct the counter back from the string.

> Counters written to the job history cannot be recovered back
> ------------------------------------------------------------
>
>                 Key: HADOOP-3970
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3970
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>            Assignee: Amar Kamat
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-3970-v1.patch, HADOOP-3970-v2.patch, HADOOP-3970-v3.patch, HADOOP-3970-v4.1.patch, HADOOP-3970-v4.patch
>
>
> Counters that are written to the JobHistory are stringified using {{Counters.makeCompactString()}}. The format in which this api converts the counter into a string is _groupname.countername:value_. The problem is that _groupname_ and _countername_ can contain a '.' and hence recovering the counter becomes difficult. Since JobHistory can be used for various purposes, reconstructing the counter object back might be useful. One such usecase is HADOOP-3245. There should be some way to recover the counter object back from its string representation and also to keep the string version readable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3970) Counters written to the job history cannot be recovered back

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626852#action_12626852 ] 

Owen O'Malley commented on HADOOP-3970:
---------------------------------------

After an out-of-band discussion with Amar, I agree that we need both the internal and display names to be saved. If we are breaking the format, we should probably pull out the group names and put in both. Maybe we should use a lisp like format:

{code}
{g1 {ci1 cd1 v1} {ci2 cd2 v2} {ci3 cd3 v3}} {g2 {ci4 cd4 v4}}
{code}

Thoughts?

> Counters written to the job history cannot be recovered back
> ------------------------------------------------------------
>
>                 Key: HADOOP-3970
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3970
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>         Attachments: HADOOP-3970-v1.patch
>
>
> Counters that are written to the JobHistory are stringified using {{Counters.makeCompactString()}}. The format in which this api converts the counter into a string is _groupname.countername:value_. The problem is that _groupname_ and _countername_ can contain a '.' and hence recovering the counter becomes difficult. Since JobHistory can be used for various purposes, reconstructing the counter object back might be useful. One such usecase is HADOOP-3245. There should be some way to recover the counter object back from its string representation and also to keep the string version readable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3970) Counters written to the job history cannot be recovered back

Posted by "Suhas Gogate (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630706#action_12630706 ] 

Suhas Gogate commented on HADOOP-3970:
--------------------------------------

Amar, patch applied through JIRA provides the function "Counters.fromEscapedCompactString(String compactString)".  I was planing to use it to parse the counters printed in job history log file,  but looks like job history logfile does NOT have the counters string representation in a escaped compact string format. Is it going to be changed? Which JIRA would address this issue (HADOOP-3200 ?).  Pl. comment.    

> Counters written to the job history cannot be recovered back
> ------------------------------------------------------------
>
>                 Key: HADOOP-3970
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3970
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>            Assignee: Amar Kamat
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-3970-v1.patch, HADOOP-3970-v2.patch, HADOOP-3970-v3.patch, HADOOP-3970-v4.1.patch, HADOOP-3970-v4.patch
>
>
> Counters that are written to the JobHistory are stringified using {{Counters.makeCompactString()}}. The format in which this api converts the counter into a string is _groupname.countername:value_. The problem is that _groupname_ and _countername_ can contain a '.' and hence recovering the counter becomes difficult. Since JobHistory can be used for various purposes, reconstructing the counter object back might be useful. One such usecase is HADOOP-3245. There should be some way to recover the counter object back from its string representation and also to keep the string version readable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3970) Counters written to the job history cannot be recovered back

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623776#action_12623776 ] 

Amar Kamat commented on HADOOP-3970:
------------------------------------

Currently the way counters are converted to string (in jobhistory) is as follows
_groupname.countername:value_
So for a counter which has the following contents
||groupname||countername||value||
|g1|c1|v1|
|g1|c2|v2|
|g1|c3|v3|
|g2|c4|v4|
|g2|c5|v5|
We get 
{{g1.c1:v1, g1.c2:v2, g1.c3:v3, g2.c4:v4, g2.c5:v5}}

One way to overcome the problem stated above is to use the length of the names present in the counter
So the above  counter might look like
{{[|g1|] g1 {[|c1|] c1:v1, [|c2|] c2:v2, [|c3|] c3:v3 }, [|g2|] g2 {[|c4|] c4:v4, [|c5|] c5:v5 }}}
Here |s| means length of s.
Hence the length of the name in the counter helps to correctly identify the name and helps in counter recovery.
----
Thoughts?

> Counters written to the job history cannot be recovered back
> ------------------------------------------------------------
>
>                 Key: HADOOP-3970
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3970
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>
> Counters that are written to the JobHistory are stringified using {{Counters.makeCompactString()}}. The format in which this api converts the counter into a string is _groupname.countername:value_. The problem is that _groupname_ and _countername_ can contain a '.' and hence recovering the counter becomes difficult. Since JobHistory can be used for various purposes, reconstructing the counter object back might be useful. One such usecase is HADOOP-3245. There should be some way to recover the counter object back from its string representation and also to keep the string version readable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3970) Counters written to the job history cannot be recovered back

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12625783#action_12625783 ] 

Owen O'Malley commented on HADOOP-3970:
---------------------------------------

Let's just keep the display names. The internal names are for mostly for mapping to the counters. Yes, in theory the display names could collide, but in practice they won't, because they are always displayed to the user by the display name. 

I think that putting in the lengths would make it very hard to read. I think it is better to stay with in the current scheme and use the current:

{code}
g1.c1: v1, g1.c2:v2, ...
{code}

As you point out, that means we need to quote the characters . : , and probably " and \.

> Counters written to the job history cannot be recovered back
> ------------------------------------------------------------
>
>                 Key: HADOOP-3970
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3970
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>         Attachments: HADOOP-3970-v1.patch
>
>
> Counters that are written to the JobHistory are stringified using {{Counters.makeCompactString()}}. The format in which this api converts the counter into a string is _groupname.countername:value_. The problem is that _groupname_ and _countername_ can contain a '.' and hence recovering the counter becomes difficult. Since JobHistory can be used for various purposes, reconstructing the counter object back might be useful. One such usecase is HADOOP-3245. There should be some way to recover the counter object back from its string representation and also to keep the string version readable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3970) Counters written to the job history cannot be recovered back

Posted by "Matei Zaharia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644637#action_12644637 ] 

Matei Zaharia commented on HADOOP-3970:
---------------------------------------

Will this be a standardized format for counters in future Hadoop releases, or are there other issues you guys think will need to be fixed? I've been working on some scripts for parsing job history logs to compute statistics about how a Hadoop cluster is being used (see https://issues.apache.org/jira/browse/HADOOP-3708) and I've seen small things being different in different versions of Hadoop. I think it would be beneficial to choose a format and make it standard for Hadoop 1.0, because I imagine others are also interested in being able to parse out info from the job history logs.

> Counters written to the job history cannot be recovered back
> ------------------------------------------------------------
>
>                 Key: HADOOP-3970
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3970
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>            Assignee: Amar Kamat
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-3970-v1.patch, HADOOP-3970-v2.patch, HADOOP-3970-v3.patch, HADOOP-3970-v4.1.patch, HADOOP-3970-v4.patch
>
>
> Counters that are written to the JobHistory are stringified using {{Counters.makeCompactString()}}. The format in which this api converts the counter into a string is _groupname.countername:value_. The problem is that _groupname_ and _countername_ can contain a '.' and hence recovering the counter becomes difficult. Since JobHistory can be used for various purposes, reconstructing the counter object back might be useful. One such usecase is HADOOP-3245. There should be some way to recover the counter object back from its string representation and also to keep the string version readable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-3970) Counters written to the job history cannot be recovered back

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated HADOOP-3970:
-------------------------------

    Assignee: Amar Kamat
      Status: Patch Available  (was: Open)

> Counters written to the job history cannot be recovered back
> ------------------------------------------------------------
>
>                 Key: HADOOP-3970
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3970
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>            Assignee: Amar Kamat
>         Attachments: HADOOP-3970-v1.patch, HADOOP-3970-v2.patch, HADOOP-3970-v3.patch, HADOOP-3970-v4.1.patch, HADOOP-3970-v4.patch
>
>
> Counters that are written to the JobHistory are stringified using {{Counters.makeCompactString()}}. The format in which this api converts the counter into a string is _groupname.countername:value_. The problem is that _groupname_ and _countername_ can contain a '.' and hence recovering the counter becomes difficult. Since JobHistory can be used for various purposes, reconstructing the counter object back might be useful. One such usecase is HADOOP-3245. There should be some way to recover the counter object back from its string representation and also to keep the string version readable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3970) Counters written to the job history cannot be recovered back

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630187#action_12630187 ] 

Hudson commented on HADOOP-3970:
--------------------------------

Integrated in Hadoop-trunk #600 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/600/])

> Counters written to the job history cannot be recovered back
> ------------------------------------------------------------
>
>                 Key: HADOOP-3970
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3970
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>            Assignee: Amar Kamat
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-3970-v1.patch, HADOOP-3970-v2.patch, HADOOP-3970-v3.patch, HADOOP-3970-v4.1.patch, HADOOP-3970-v4.patch
>
>
> Counters that are written to the JobHistory are stringified using {{Counters.makeCompactString()}}. The format in which this api converts the counter into a string is _groupname.countername:value_. The problem is that _groupname_ and _countername_ can contain a '.' and hence recovering the counter becomes difficult. Since JobHistory can be used for various purposes, reconstructing the counter object back might be useful. One such usecase is HADOOP-3245. There should be some way to recover the counter object back from its string representation and also to keep the string version readable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-3970) Counters written to the job history cannot be recovered back

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated HADOOP-3970:
-------------------------------

    Attachment: HADOOP-3970-v4.patch

Attaching a patch the incorporates Hemanth's comments.
bq.  I think the escapeString(String str, char escapeChar, char charToEscape) API is still useful by itself, and is easier to use in cases where the user has only one char to escape. So, do we really need to deprecate this ?
The reason for deprecating the api is that I have introduced a new/enhanced api that essentially does the same thing but is more generic. I personally dont have any strong opinion about this.

> Counters written to the job history cannot be recovered back
> ------------------------------------------------------------
>
>                 Key: HADOOP-3970
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3970
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>         Attachments: HADOOP-3970-v1.patch, HADOOP-3970-v2.patch, HADOOP-3970-v3.patch, HADOOP-3970-v4.patch
>
>
> Counters that are written to the JobHistory are stringified using {{Counters.makeCompactString()}}. The format in which this api converts the counter into a string is _groupname.countername:value_. The problem is that _groupname_ and _countername_ can contain a '.' and hence recovering the counter becomes difficult. Since JobHistory can be used for various purposes, reconstructing the counter object back might be useful. One such usecase is HADOOP-3245. There should be some way to recover the counter object back from its string representation and also to keep the string version readable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-3970) Counters written to the job history cannot be recovered back

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated HADOOP-3970:
-------------------------------

    Attachment: HADOOP-3970-v4.1.patch

Attaching a patch that removes the deprecated tag. As rightly pointed out by Hemanth, I have not optimized the code for performance. Most of the changes in {{StringUtils.java}} are due to refactoring. The apis added to {{Counters}} are kept simple. Will check if it can be easily optimized.

> Counters written to the job history cannot be recovered back
> ------------------------------------------------------------
>
>                 Key: HADOOP-3970
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3970
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>         Attachments: HADOOP-3970-v1.patch, HADOOP-3970-v2.patch, HADOOP-3970-v3.patch, HADOOP-3970-v4.1.patch, HADOOP-3970-v4.patch
>
>
> Counters that are written to the JobHistory are stringified using {{Counters.makeCompactString()}}. The format in which this api converts the counter into a string is _groupname.countername:value_. The problem is that _groupname_ and _countername_ can contain a '.' and hence recovering the counter becomes difficult. Since JobHistory can be used for various purposes, reconstructing the counter object back might be useful. One such usecase is HADOOP-3245. There should be some way to recover the counter object back from its string representation and also to keep the string version readable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.