You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-issues@hadoop.apache.org by "Hong Tang (JIRA)" <ji...@apache.org> on 2010/08/07 01:05:16 UTC

[jira] Created: (MAPREDUCE-2000) Rumen is not able to extract counters for Job history logs from Hadoop 0.20

Rumen is not able to extract counters for Job history logs from Hadoop 0.20
---------------------------------------------------------------------------

                 Key: MAPREDUCE-2000
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2000
             Project: Hadoop Map/Reduce
          Issue Type: Bug
            Reporter: Hong Tang


Rumen tries to match the end of a value string through indexOf("\""). It does not take into account the case when an escaped '"' in the value string. This leads to the incorrect parsing the remaining key=value properties in the same line.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-2000) Rumen is not able to extract counters for Job history logs from Hadoop 0.20

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896819#action_12896819 ] 

Hong Tang commented on MAPREDUCE-2000:
--------------------------------------

I uploaded a new patch that addresses Amar's comments.

bq.  1.  Can you please add some comments as to what the regex is supposed to do? Comments for each of the capturing groups w.r.t what are they planning to compare/match themselves against would be good enough.
Comments added.

bq.   2. Can we reuse the regex declared in o.a.h.mapred.JobHistory? Seems similar to me.
Yes, they are based on the regex from yhadoop 20. However, this is not available in trunk. So I have to copy it over.

bq.   3. In the testcase, you could define your values in unescaped format and use StringUtils to escape it. 
Used StringUtils's escapeString and unescapeString directly. I also added a few more unit tests that should be covering all cases as included in the testcase you wrote.

> Rumen is not able to extract counters for Job history logs from Hadoop 0.20
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2000
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2000
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tools/rumen
>            Reporter: Hong Tang
>            Assignee: Hong Tang
>         Attachments: mr-2000-20100806.patch, mr-2000-20100809.patch
>
>
> Rumen tries to match the end of a value string through indexOf("\""). It does not take into account the case when an escaped '"' in the value string. This leads to the incorrect parsing the remaining key=value properties in the same line.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-2000) Rumen is not able to extract counters for Job history logs from Hadoop 0.20

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong Tang updated MAPREDUCE-2000:
---------------------------------

    Attachment: mr-2000-20100810.patch

> Rumen is not able to extract counters for Job history logs from Hadoop 0.20
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2000
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2000
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tools/rumen
>            Reporter: Hong Tang
>            Assignee: Hong Tang
>         Attachments: mr-2000-20100806.patch, mr-2000-20100809.patch, mr-2000-20100810.patch
>
>
> Rumen tries to match the end of a value string through indexOf("\""). It does not take into account the case when an escaped '"' in the value string. This leads to the incorrect parsing the remaining key=value properties in the same line.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-2000) Rumen is not able to extract counters for Job history logs from Hadoop 0.20

Posted by "Ranjit Mathew (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896857#action_12896857 ] 

Ranjit Mathew commented on MAPREDUCE-2000:
------------------------------------------

I like new version of the patch, especially the way you explain the regex.

Just a few minor comments:

  1. There's redundant "return" statement introduced by your patch.

  2. I would personally prefer a new-line before the "/**" lines to improve readability a bit.

  3. Do you think it makes sense to include the actual input-line above that triggered the bug in the unit test-case?

> Rumen is not able to extract counters for Job history logs from Hadoop 0.20
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2000
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2000
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tools/rumen
>            Reporter: Hong Tang
>            Assignee: Hong Tang
>         Attachments: mr-2000-20100806.patch, mr-2000-20100809.patch
>
>
> Rumen tries to match the end of a value string through indexOf("\""). It does not take into account the case when an escaped '"' in the value string. This leads to the incorrect parsing the remaining key=value properties in the same line.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-2000) Rumen is not able to extract counters for Job history logs from Hadoop 0.20

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong Tang updated MAPREDUCE-2000:
---------------------------------

    Attachment: mr-2000-20100806.patch

Patch that fixed the bug. The added unit test fails without the patch.

> Rumen is not able to extract counters for Job history logs from Hadoop 0.20
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2000
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2000
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Hong Tang
>            Assignee: Hong Tang
>         Attachments: mr-2000-20100806.patch
>
>
> Rumen tries to match the end of a value string through indexOf("\""). It does not take into account the case when an escaped '"' in the value string. This leads to the incorrect parsing the remaining key=value properties in the same line.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-2000) Rumen is not able to extract counters for Job history logs from Hadoop 0.20

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong Tang updated MAPREDUCE-2000:
---------------------------------

    Status: Patch Available  (was: Open)

test-patch passed on my local machine.
{noformat}
     [exec] +1 overall.  
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     +1 tests included.  The patch appears to include 2 new or modified tests.
     [exec] 
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec] 
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
{noformat}

> Rumen is not able to extract counters for Job history logs from Hadoop 0.20
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2000
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2000
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tools/rumen
>            Reporter: Hong Tang
>            Assignee: Hong Tang
>         Attachments: mr-2000-20100806.patch, mr-2000-20100809.patch
>
>
> Rumen tries to match the end of a value string through indexOf("\""). It does not take into account the case when an escaped '"' in the value string. This leads to the incorrect parsing the remaining key=value properties in the same line.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (MAPREDUCE-2000) Rumen is not able to extract counters for Job history logs from Hadoop 0.20

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong Tang reassigned MAPREDUCE-2000:
------------------------------------

    Assignee: Hong Tang

> Rumen is not able to extract counters for Job history logs from Hadoop 0.20
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2000
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2000
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Hong Tang
>            Assignee: Hong Tang
>         Attachments: mr-2000-20100806.patch
>
>
> Rumen tries to match the end of a value string through indexOf("\""). It does not take into account the case when an escaped '"' in the value string. This leads to the incorrect parsing the remaining key=value properties in the same line.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-2000) Rumen is not able to extract counters for Job history logs from Hadoop 0.20

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong Tang updated MAPREDUCE-2000:
---------------------------------

    Attachment: mr-2000-20100809.patch

> Rumen is not able to extract counters for Job history logs from Hadoop 0.20
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2000
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2000
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tools/rumen
>            Reporter: Hong Tang
>            Assignee: Hong Tang
>         Attachments: mr-2000-20100806.patch, mr-2000-20100809.patch
>
>
> Rumen tries to match the end of a value string through indexOf("\""). It does not take into account the case when an escaped '"' in the value string. This leads to the incorrect parsing the remaining key=value properties in the same line.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-2000) Rumen is not able to extract counters for Job history logs from Hadoop 0.20

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897094#action_12897094 ] 

Hong Tang commented on MAPREDUCE-2000:
--------------------------------------

bq. 1. There's redundant "return" statement introduced by your patch.
Will do.

bq. 2. I would personally prefer a new-line before the "/**" lines to improve readability a bit.
This is probably a matter of different style. I prefer to use blank lines to separate logically different groups.

bq. 3. Do you think it makes sense to include the actual input-line above that triggered the bug in the unit test-case?
We should verify the patch to make sure it fixes the real bug. But I think that unit tests should test minimally what it is supposed to test. In other words, I think including actual input that causes the bug is a bad idea. It would simply obscure what the unit test is supposed to cover.

> Rumen is not able to extract counters for Job history logs from Hadoop 0.20
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2000
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2000
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tools/rumen
>            Reporter: Hong Tang
>            Assignee: Hong Tang
>         Attachments: mr-2000-20100806.patch, mr-2000-20100809.patch, mr-2000-20100810.patch
>
>
> Rumen tries to match the end of a value string through indexOf("\""). It does not take into account the case when an escaped '"' in the value string. This leads to the incorrect parsing the remaining key=value properties in the same line.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-2000) Rumen is not able to extract counters for Job history logs from Hadoop 0.20

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926052#action_12926052 ] 

Hudson commented on MAPREDUCE-2000:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #523 (See [https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/523/])
    

> Rumen is not able to extract counters for Job history logs from Hadoop 0.20
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2000
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2000
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tools/rumen
>            Reporter: Hong Tang
>            Assignee: Hong Tang
>             Fix For: 0.22.0
>
>         Attachments: mr-2000-20100806.patch, mr-2000-20100809.patch, mr-2000-20100810-yhadoop-20.1xx.patch, mr-2000-20100810.patch
>
>
> Rumen tries to match the end of a value string through indexOf("\""). It does not take into account the case when an escaped '"' in the value string. This leads to the incorrect parsing the remaining key=value properties in the same line.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-2000) Rumen is not able to extract counters for Job history logs from Hadoop 0.20

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong Tang updated MAPREDUCE-2000:
---------------------------------

    Attachment: mr-2000-20100810-yhadoop-20.1xx.patch

Patch for yahoo hadoop 20.1xx branch. Not to be committed.

> Rumen is not able to extract counters for Job history logs from Hadoop 0.20
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2000
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2000
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tools/rumen
>            Reporter: Hong Tang
>            Assignee: Hong Tang
>             Fix For: 0.22.0
>
>         Attachments: mr-2000-20100806.patch, mr-2000-20100809.patch, mr-2000-20100810-yhadoop-20.1xx.patch, mr-2000-20100810.patch
>
>
> Rumen tries to match the end of a value string through indexOf("\""). It does not take into account the case when an escaped '"' in the value string. This leads to the incorrect parsing the remaining key=value properties in the same line.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-2000) Rumen is not able to extract counters for Job history logs from Hadoop 0.20

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896510#action_12896510 ] 

Amar Kamat commented on MAPREDUCE-2000:
---------------------------------------

Using the regex makes it much cleaner. +1 for using regex. 

Few comments.
# Can you please add some comments as to what the regex is supposed to do? Comments for each of the capturing groups w.r.t what are they planning to compare/match themselves against would be good enough.
# Can we reuse the regex declared in o.a.h.mapred.JobHistory? Seems similar to me.
# In the testcase, you could define your values in unescaped format and use {{StringUtils}} to escape it. This is how the framework does it. So here is how the testcase might look like

{code}
line-type=// something
key=k1
value=val1 // special char content in unescaped format
line=line-type + space + key + equals + quotes + StringUtils.escape(value) + quotes + line-delim
ParsedLine pl = new ParsedLine(line, version)

// assert
newValue = pl.get(key)
unEscapeValue = StringUtils.unescape(newValue)
assertEquals(value, unEscapedValue)
{code}

See a sample testcase [here|http://pastebin.com/2Y19v29S].

> Rumen is not able to extract counters for Job history logs from Hadoop 0.20
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2000
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2000
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tools/rumen
>            Reporter: Hong Tang
>            Assignee: Hong Tang
>         Attachments: mr-2000-20100806.patch
>
>
> Rumen tries to match the end of a value string through indexOf("\""). It does not take into account the case when an escaped '"' in the value string. This leads to the incorrect parsing the remaining key=value properties in the same line.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-2000) Rumen is not able to extract counters for Job history logs from Hadoop 0.20

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong Tang updated MAPREDUCE-2000:
---------------------------------

    Status: Open  (was: Patch Available)

> Rumen is not able to extract counters for Job history logs from Hadoop 0.20
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2000
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2000
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tools/rumen
>            Reporter: Hong Tang
>            Assignee: Hong Tang
>         Attachments: mr-2000-20100806.patch, mr-2000-20100809.patch
>
>
> Rumen tries to match the end of a value string through indexOf("\""). It does not take into account the case when an escaped '"' in the value string. This leads to the incorrect parsing the remaining key=value properties in the same line.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-2000) Rumen is not able to extract counters for Job history logs from Hadoop 0.20

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong Tang updated MAPREDUCE-2000:
---------------------------------

    Status: Open  (was: Patch Available)

> Rumen is not able to extract counters for Job history logs from Hadoop 0.20
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2000
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2000
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tools/rumen
>            Reporter: Hong Tang
>            Assignee: Hong Tang
>         Attachments: mr-2000-20100806.patch, mr-2000-20100809.patch, mr-2000-20100810.patch
>
>
> Rumen tries to match the end of a value string through indexOf("\""). It does not take into account the case when an escaped '"' in the value string. This leads to the incorrect parsing the remaining key=value properties in the same line.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-2000) Rumen is not able to extract counters for Job history logs from Hadoop 0.20

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong Tang updated MAPREDUCE-2000:
---------------------------------

    Status: Patch Available  (was: Open)

> Rumen is not able to extract counters for Job history logs from Hadoop 0.20
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2000
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2000
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Hong Tang
>            Assignee: Hong Tang
>         Attachments: mr-2000-20100806.patch
>
>
> Rumen tries to match the end of a value string through indexOf("\""). It does not take into account the case when an escaped '"' in the value string. This leads to the incorrect parsing the remaining key=value properties in the same line.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-2000) Rumen is not able to extract counters for Job history logs from Hadoop 0.20

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated MAPREDUCE-2000:
-------------------------------------

           Status: Resolved  (was: Patch Available)
     Hadoop Flags: [Reviewed]
    Fix Version/s: 0.22.0
       Resolution: Fixed

+1

I committed this. Thanks, Hong!

> Rumen is not able to extract counters for Job history logs from Hadoop 0.20
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2000
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2000
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tools/rumen
>            Reporter: Hong Tang
>            Assignee: Hong Tang
>             Fix For: 0.22.0
>
>         Attachments: mr-2000-20100806.patch, mr-2000-20100809.patch, mr-2000-20100810.patch
>
>
> Rumen tries to match the end of a value string through indexOf("\""). It does not take into account the case when an escaped '"' in the value string. This leads to the incorrect parsing the remaining key=value properties in the same line.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-2000) Rumen is not able to extract counters for Job history logs from Hadoop 0.20

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong Tang updated MAPREDUCE-2000:
---------------------------------

    Status: Patch Available  (was: Open)

New patch that addresses Ranjit's comments.

> Rumen is not able to extract counters for Job history logs from Hadoop 0.20
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2000
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2000
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tools/rumen
>            Reporter: Hong Tang
>            Assignee: Hong Tang
>         Attachments: mr-2000-20100806.patch, mr-2000-20100809.patch, mr-2000-20100810.patch
>
>
> Rumen tries to match the end of a value string through indexOf("\""). It does not take into account the case when an escaped '"' in the value string. This leads to the incorrect parsing the remaining key=value properties in the same line.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-2000) Rumen is not able to extract counters for Job history logs from Hadoop 0.20

Posted by "Ranjit Mathew (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897127#action_12897127 ] 

Ranjit Mathew commented on MAPREDUCE-2000:
------------------------------------------

+1 from me. Thanks for taking this up.

> Rumen is not able to extract counters for Job history logs from Hadoop 0.20
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2000
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2000
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tools/rumen
>            Reporter: Hong Tang
>            Assignee: Hong Tang
>         Attachments: mr-2000-20100806.patch, mr-2000-20100809.patch, mr-2000-20100810.patch
>
>
> Rumen tries to match the end of a value string through indexOf("\""). It does not take into account the case when an escaped '"' in the value string. This leads to the incorrect parsing the remaining key=value properties in the same line.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-2000) Rumen is not able to extract counters for Job history logs from Hadoop 0.20

Posted by "Ranjit Mathew (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ranjit Mathew updated MAPREDUCE-2000:
-------------------------------------

    Component/s: tools/rumen

> Rumen is not able to extract counters for Job history logs from Hadoop 0.20
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2000
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2000
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tools/rumen
>            Reporter: Hong Tang
>            Assignee: Hong Tang
>         Attachments: mr-2000-20100806.patch
>
>
> Rumen tries to match the end of a value string through indexOf("\""). It does not take into account the case when an escaped '"' in the value string. This leads to the incorrect parsing the remaining key=value properties in the same line.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-2000) Rumen is not able to extract counters for Job history logs from Hadoop 0.20

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896184#action_12896184 ] 

Hong Tang commented on MAPREDUCE-2000:
--------------------------------------

An example input line that triggers this bug:
{noformat}
MapAttempt TASK_TYPE="MAP" TASKID="task_201005120512_181225_m_000266" TASK_ATTEMPT_ID="attempt_201005120512_181225_m_000266_0" 
TASK_STATUS="SUCCESS" FINISH_TIME="1275354731626" STATE_STRING="qry:\"RFC 3584\" Value: [Simple][timestamp:1275276122][type:c][/Simple]" COUNTERS="
{(FileSystemCounters)(FileSystemCounters)[(FILE_BYTES_READ)(FILE_BYTES_READ)(1609)][(HDFS_BYTES_READ)(HDFS_BYTES_READ)(67412713)]
[(FILE_BYTES_WRITTEN)(FILE_BYTES_WRITTEN)(5648633)]}{(org\.apache\.hadoop\.mapred\.Task$Counter)(Map-Reduce Framework)[(COMBINE_OUTPUT_RECORDS)
(Combine output records)(0)][(MAP_INPUT_RECORDS)(Map input records)(92297)][(SPILLED_RECORDS)(Spilled Records)(74370)][(MAP_OUTPUT_BYTES)(Map output 
bytes)(18737847)][(MAP_INPUT_BYTES)(Map input bytes)(67211804)][(COMBINE_INPUT_RECORDS)(Combine input records)(0)][(MAP_OUTPUT_RECORDS)(Map output 
records)(74370)]}" .
{noformat}

It outputs the following error messages:
{noformat}
10/08/06 23:14:40 WARN rumen.HistoryEventEmitter: HistoryEventEmitters: null counter detected:
10/08/06 23:14:40 WARN rumen.HistoryEventEmitter: HistoryEventEmitters: null counter detected:
...
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
        at java.lang.String.substring(String.java:1938)
        at org.apache.hadoop.tools.rumen.ParsedLine.<init>(ParsedLine.java:100)
        at org.apache.hadoop.tools.rumen.Hadoop20JHParser.nextEvent(Hadoop20JHParser.java:131)
        at org.apache.hadoop.tools.rumen.TraceBuilder.processJobHistory(TraceBuilder.java:287)
        at org.apache.hadoop.tools.rumen.TraceBuilder.run(TraceBuilder.java:242)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.tools.rumen.TraceBuilder.main(TraceBuilder.java:120)
{noformat}

> Rumen is not able to extract counters for Job history logs from Hadoop 0.20
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2000
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2000
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Hong Tang
>
> Rumen tries to match the end of a value string through indexOf("\""). It does not take into account the case when an escaped '"' in the value string. This leads to the incorrect parsing the remaining key=value properties in the same line.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.