
Posted to mapreduce-issues@hadoop.apache.org by "Dick King (JIRA)" <ji...@apache.org> on 2009/12/18 00:26:18 UTC

[jira] Created: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
-----------------------------------------------------------------------------------------------------------------------------

                 Key: MAPREDUCE-1309
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
            Reporter: Dick King


There are two orthogonal questions to answer when processing a job tracker log: how are the logs and the XML configuration files packaged, and in which release of Hadoop Map/Reduce were the logs generated?  The existing Rumen handles only a couple of the possible answers to these questions.  The new engine will handle three answers to the version question (0.18, 0.20, and current) and two answers to the packaging question: separate files with names derived from the job ID, and concatenated files with a header between sections (used for easier file interchange).
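
To illustrate the idea of keeping the two questions independent, here is a minimal sketch and not the code in the attached patches. Every name in it (TraceSketch, Demuxer, LineParser, the "### " section header, and both toy line formats) is an assumption made up for this example; the real header and history-log formats are not described in this issue. The packaging question is answered by a Demuxer and the version question by a LineParser, so any packaging can be paired with any version.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TraceSketch {

    /** Packaging axis (hypothetical): splits raw input into named sections. */
    interface Demuxer {
        Map<String, String> sections(String rawInput);
    }

    /** Version axis (hypothetical): normalizes one log line. */
    interface LineParser {
        String parse(String line);
    }

    /** Toy concatenated packaging: a line starting with "### " opens a new section. */
    static final Demuxer CONCATENATED = raw -> {
        Map<String, String> out = new LinkedHashMap<>();
        String name = null;
        StringBuilder body = new StringBuilder();
        for (String line : raw.split("\n")) {
            if (line.startsWith("### ")) {
                if (name != null) {
                    out.put(name, body.toString());
                }
                name = line.substring(4).trim();
                body = new StringBuilder();
            } else if (name != null) {
                body.append(line).append('\n');
            }
        }
        if (name != null) {
            out.put(name, body.toString());
        }
        return out;
    };

    /** Toy separate-file packaging: the whole input is one section named after its file. */
    static Demuxer singleFile(String name) {
        return raw -> {
            Map<String, String> out = new LinkedHashMap<>();
            out.put(name, raw);
            return out;
        };
    }

    /** Splits "key<sep>value" into "key -> value"; returns the line unchanged if sep is absent. */
    static String splitOn(String line, char sep) {
        int i = line.indexOf(sep);
        if (i < 0) {
            return line;
        }
        return line.substring(0, i).trim() + " -> " + line.substring(i + 1).trim();
    }

    /** Two toy "versions" of the line format, standing in for different releases. */
    static final LineParser OLD_FORMAT = line -> splitOn(line, '=');
    static final LineParser NEW_FORMAT = line -> splitOn(line, ':');

    /** Composes one packaging answer with one version answer. */
    static List<String> buildTrace(String raw, Demuxer demuxer, LineParser parser) {
        List<String> events = new ArrayList<>();
        for (Map.Entry<String, String> e : demuxer.sections(raw).entrySet()) {
            for (String line : e.getValue().split("\n")) {
                if (!line.trim().isEmpty()) {
                    events.add(e.getKey() + ": " + parser.parse(line));
                }
            }
        }
        return events;
    }

    public static void main(String[] args) {
        String raw = "### job_1\nstatus=OK\n### job_2\nstatus=FAILED\n";
        System.out.println(buildTrace(raw, CONCATENATED, OLD_FORMAT));
    }
}
```

With this shape, supporting a new release or a new packaging means adding one implementation on one axis, without touching the other.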

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong Tang updated MAPREDUCE-1309:
---------------------------------

    Attachment: mr-1309-yhadoop-20.10.patch

Patch for Yahoo Hadoop 20.10. Not to be committed.

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tools/rumen
>            Reporter: Dick King
>            Assignee: Dick King
>             Fix For: 0.21.0
>
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, mapreduce-1309--2010-02-16.patch, mapreduce-1309--2010-02-17.patch, mr-1309-yhadoop-20.10.patch, rumen-yhadoop-20.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker log: how are the logs and the XML configuration files packaged, and in which release of Hadoop Map/Reduce were the logs generated?  The existing Rumen handles only a couple of the possible answers to these questions.  The new engine will handle three answers to the version question (0.18, 0.20, and current) and two answers to the packaging question: separate files with names derived from the job ID, and concatenated files with a header between sections (used for easier file interchange).


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Status: Open  (was: Patch Available)

This patch is incompatible with another patch. See the patch reintroduction comment.



[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Attachment: mapreduce-1309--2010-02-03.patch

This is the new patch. The main changes are new test cases for small components of Rumen and changing the main class to TraceBuilder.



[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798280#action_12798280 ] 

Hadoop QA commented on MAPREDUCE-1309:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12429802/demuxer-plus-concatenated-files--2010-01-08-c.patch
  against trunk revision 897118.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/258/console

This message is automatically generated.



[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Status: Patch Available  (was: Open)



[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835092#action_12835092 ] 

Hadoop QA commented on MAPREDUCE-1309:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12436139/mapreduce-1309--2010-02-17.patch
  against trunk revision 911191.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 17 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    -1 javac.  The applied patch generated 2219 javac compiler warnings (more than the trunk's current 2215 warnings).

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/460/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/460/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/460/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/460/console

This message is automatically generated.



[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Status: Patch Available  (was: Open)

Resubmitting the same patch a second time in the hopes that Hudson will notice it this time.



[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829823#action_12829823 ] 

Dick King commented on MAPREDUCE-1309:
--------------------------------------

I am submitting a new patch.

Some of Hudson's points are well taken: I have cleaned up the javac, findbugs, and release audit warnings.

The javadoc warnings are in a module I haven't touched. I'll fix them under MAPREDUCE-1459.

The three test failures are in org.apache.hadoop.mapreduce.lib.output.TestJobOutputCommitter.test*. They appear to come from a failure to read a zip file during test setup, which I understand is a recurrent problem, rather than from a test that actually runs and fails. Here is one example:

{noformat}
Error Message

java.util.zip.ZipException: error reading zip file
Stacktrace

java.lang.RuntimeException: java.util.zip.ZipException: error reading zip file
	at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1715)
	at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1529)
	at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1475)
	at org.apache.hadoop.conf.Configuration.get(Configuration.java:564)
	at org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:1892)
	at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:347)
	at org.apache.hadoop.mapred.HadoopTestCase.setUp(HadoopTestCase.java:145)
	at org.apache.hadoop.mapreduce.lib.output.TestJobOutputCommitter.setUp(TestJobOutputCommitter.java:59)
Caused by: java.util.zip.ZipException: error reading zip file
	at java.util.zip.ZipFile.read(Native Method)
	at java.util.zip.ZipFile.access$1200(ZipFile.java:29)
	at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:447)
	at java.util.zip.ZipFile$1.fill(ZipFile.java:230)
	at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:141)
	at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:105)
	at java.io.FilterInputStream.read(FilterInputStream.java:66)
	at com.sun.org.apache.xerces.internal.impl.XMLEntityManager$RewindableInputStream.read(XMLEntityManager.java:2910)
	at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:704)
	at com.sun.org.apache.xerces.internal.impl.XMLVersionDetector.determineDocVersion(XMLVersionDetector.java:186)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:771)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
	at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:107)
	at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:225)
	at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:283)
	at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:180)
	at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1628)
{noformat}

See http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/433/testReport/org.apache.hadoop.mapreduce.lib.output/TestJobOutputCommitter/testCustomAbort/



[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Status: Open  (was: Patch Available)



[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Attachment: mapreduce-1309--2009-01-14-a.patch



[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Attachment: demuxer-plus-concatenated-files--2010-01-08.patch

I had made a couple of redundant changes to trunk, both in 1295 and here.  I have removed them.


> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Status: Patch Available  (was: Open)

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, mapreduce-1309--2010-02-16.patch, mapreduce-1309--2010-02-17.patch

[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793460#action_12793460 ] 

Hadoop QA commented on MAPREDUCE-1309:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12428670/demuxer-plus-concatenated-files--2009-12-21.patch
  against trunk revision 893021.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    -1 javac.  The patch appears to cause tar ant target to fail.

    -1 findbugs.  The patch appears to cause Findbugs to fail.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/329/testReport/
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/329/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/329/console

This message is automatically generated.

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch

[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834486#action_12834486 ] 

Hadoop QA commented on MAPREDUCE-1309:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12436030/mapreduce-1309--2010-02-16.patch
  against trunk revision 910465.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 17 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    -1 javac.  The patch appears to cause tar ant target to fail.

    -1 findbugs.  The patch appears to cause Findbugs to fail.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/457/testReport/
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/457/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/457/console

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated MAPREDUCE-1309:
----------------------------------

    Component/s: tools/rumen

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tools/rumen
>            Reporter: Dick King
>            Assignee: Dick King
>             Fix For: 0.21.0
>
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, mapreduce-1309--2010-02-16.patch, mapreduce-1309--2010-02-17.patch, rumen-yhadoop-20.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Status: Patch Available  (was: Open)

Fixed a bug, introduced by another patch, that broke my test case.  Unless that has happened again, it should work this time.

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Status: Open  (was: Patch Available)

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch

[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798868#action_12798868 ] 

Hadoop QA commented on MAPREDUCE-1309:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12429912/demuxer-plus-concatenated-files--2010-01-11.patch
  against trunk revision 897118.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    -1 javac.  The applied patch generated 2342 javac compiler warnings (more than the trunk's current 2330 warnings).

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    -1 release audit.  The applied patch generated 166 release audit warnings (more than the trunk's current 161 warnings).

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/373/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/373/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/373/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/373/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/373/console

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Status: Patch Available  (was: Open)

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Attachment: mapreduce-1309--2010-02-16-a.patch

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, mapreduce-1309--2010-02-16.patch

[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835346#action_12835346 ] 

Hudson commented on MAPREDUCE-1309:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #247 (see http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/247/):
    Refactor Rumen trace generator to improve code structure and add extensible support for log formats. Contributed by Dick King.


> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>             Fix For: 0.22.0
>
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, mapreduce-1309--2010-02-16.patch, mapreduce-1309--2010-02-17.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Status: Open  (was: Patch Available)

Hudson hasn't run on this patch in over a day.  I cancelled the submission and will resubmit it to give Hudson another kick.

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Status: Patch Available  (was: Open)

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Attachment: demuxer-plus-concatenated-files--2010-01-08-b.patch

fix dropped import in a test case

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Status: Open  (was: Patch Available)

I've gotten a code review and I've incorporated some suggestions

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch

[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833278#action_12833278 ] 

Hong Tang commented on MAPREDUCE-1309:
--------------------------------------

The latest patch (2010-02-12) looks good. Only a few minor comments below:

- incorrect changes in javadoc comments for mapred.FileInputFormat.setInputPaths and mapreduce.lib.input.FileInputFormat.setInputPaths.
- in TestRumenJobTraces.java, change "path.makeQualified(fs)" to "path.makeQualified(fs.getUri(), fs.getWorkingDirectory())" to avoid explicit suppression of warnings.
- DefaultInputDemuxer does not guard against misuse such as demuxer.bindTo(...); demuxer.close(); demuxer.getNext(). It would be better to put a finally block in close() to reset name and input to null.
- unused import junit.Ignore in HadoopLogAnalyzer.java
- HEE.nameNames() should be removed.
- processReduceAttemptFinishedEvent and processMapAttemptFinishedEvent contain a commented-out line: "// attempt.setLocation(???);". Is it a placeholder for some TODO work? If so, please add more explanatory comments.
- should the following be removed from JobBuilder.processJobFinishedEvent()? "// ???? result.setOutcome(event.)"
- MapAttempt20LineHistoryEventEmitter.makePrototype does not seem to be used.
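
The close() guard suggested above for DefaultInputDemuxer could look roughly like the sketch below. This is a minimal, self-contained illustration and not the code from the patch: the class name GuardedDemuxer, the String/InputStream signature of bindTo, and the use of Map.Entry as the returned pair are all assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

// Hypothetical stand-in for a single-file demuxer; not the class from the patch.
final class GuardedDemuxer implements Closeable {
  private String name;        // null when nothing is bound
  private InputStream input;  // null when nothing is bound

  void bindTo(String fileName, InputStream stream) throws IOException {
    if (name != null) {
      close();  // release a previous binding that was never consumed
    }
    name = fileName;
    input = stream;
  }

  // Hands out the bound stream once; returns null if nothing is bound.
  Map.Entry<String, InputStream> getNext() {
    if (name == null) {
      return null;
    }
    Map.Entry<String, InputStream> next = new SimpleImmutableEntry<>(name, input);
    name = null;
    input = null;
    return next;
  }

  @Override
  public void close() throws IOException {
    try {
      if (input != null) {
        input.close();
      }
    } finally {
      // Reset even if input.close() throws, so a later getNext() sees no binding.
      name = null;
      input = null;
    }
  }

  public static void main(String[] args) throws IOException {
    GuardedDemuxer demuxer = new GuardedDemuxer();
    demuxer.bindTo("job_0001.log", new ByteArrayInputStream(new byte[0]));
    demuxer.close();
    // The misuse sequence from the review: getNext() after close().
    System.out.println(demuxer.getNext() == null);  // prints true
  }
}
```

With the reset in the finally block, the bindTo(...); close(); getNext() sequence returns null instead of handing out a stream that has already been closed.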

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Status: Open  (was: Patch Available)

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16.patch

[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797877#action_12797877 ] 

Hadoop QA commented on MAPREDUCE-1309:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12429609/demuxer-plus-concatenated-files--2010-01-06.patch
  against trunk revision 897076.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/363/console

This message is automatically generated.

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Status: Patch Available  (was: Open)

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch

[jira] Assigned: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King reassigned MAPREDUCE-1309:
------------------------------------

    Assignee: Dick King

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Attachment: demuxer-plus-concatenated-files--2010-01-08-d.patch

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch

[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798135#action_12798135 ] 

Hadoop QA commented on MAPREDUCE-1309:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12429765/demuxer-plus-concatenated-files--2010-01-08.patch
  against trunk revision 897118.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated 1 warning messages.

    -1 javac.  The patch appears to cause tar ant target to fail.

    -1 findbugs.  The patch appears to cause Findbugs to fail.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/367/testReport/
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/367/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/367/console

This message is automatically generated.

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Status: Open  (was: Patch Available)

Cancel this patch to make room for a new one.

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08.patch

[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798874#action_12798874 ] 

Dick King commented on MAPREDUCE-1309:
--------------------------------------

I can certify that:

   * None of the bad unit tests are the result of this patch
   * The release audit failure is on files that cannot be bannered -- test cases
   * The compiler warnings are deprecated APIs that are pervasively used and will require considerable effort to remove here and elsewhere.

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Status: Patch Available  (was: Open)

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14.patch

[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841435#action_12841435 ] 

Hudson commented on MAPREDUCE-1309:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk #248 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/248/])
    

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>             Fix For: 0.22.0
>
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, mapreduce-1309--2010-02-16.patch, mapreduce-1309--2010-02-17.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Attachment: demuxer-plus-concatenated-files--2010-01-08-c.patch

This replacement patch fixes the problems noted by Hudson.  Please note that I request the following variances:

   * The javac warnings refer to deprecated interfaces that will require a lot of study to remove, and are being used ubiquitously.  Counters, mostly.
   * The release audit refers to files containing test cases, which are in .json and cannot receive a banner headline.
   * The failed core tests include other subsystems which I believe are known failures.

I did fix the other problems.


> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08.patch

[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835309#action_12835309 ] 

Hong Tang commented on MAPREDUCE-1309:
--------------------------------------

The latest patch looks good. +1.

The javac warning seems to be due to the use of a deprecated API, which is unavoidable, and the failed core/contrib tests seem to be unrelated to this patch.

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, mapreduce-1309--2010-02-16.patch, mapreduce-1309--2010-02-17.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Status: Patch Available  (was: Open)

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Status: Open  (was: Patch Available)

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch

[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830241#action_12830241 ] 

Dick King commented on MAPREDUCE-1309:
--------------------------------------

The new javac warnings come from uses of deprecated APIs in the test cases, which are necessary at this time.

The javadoc warnings look like this:

{noformat}

     [exec]   [javadoc] Building tree for all the packages and classes...
     [exec]   [javadoc] /grid/0/hudson/hudson-slave/workspace/Mapreduce-Patch-h6.grid.sp2.yahoo.net/trunk/src/java/org/apache/hadoop/mapred
                            /FileInputFormat.java:323: warning - @param argument "inputs" is not a parameter name.
     [exec]   [javadoc] /grid/0/hudson/hudson-slave/workspace/Mapreduce-Patch-h6.grid.sp2.yahoo.net/trunk/src/java/org/apache/hadoop/mapreduce/lib/input
                            /FileInputFormat.java:356: warning - @param argument "inputs" is not a parameter name.

{noformat}

[I changed the formatting on the console printout excerpt to make it render better]

These are warnings that have nothing to do with this patch.

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Status: Open  (was: Patch Available)

I discovered a bug.

I expect to have a fixed version of this patch in place by about 10AM PST on 1/14.

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch

[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798307#action_12798307 ] 

Hadoop QA commented on MAPREDUCE-1309:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12429804/demuxer-plus-concatenated-files--2010-01-08-d.patch
  against trunk revision 897118.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    -1 javac.  The applied patch generated 2342 javac compiler warnings (more than the trunk's current 2330 warnings).

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    -1 release audit.  The applied patch generated 166 release audit warnings (more than the trunk's current 161 warnings).

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/259/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/259/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/259/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/259/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/259/console

This message is automatically generated.

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Status: Patch Available  (was: Open)

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated MAPREDUCE-1309:
-------------------------------------

    Status: Open  (was: Patch Available)

Unfortunately, the patch does not compile against trunk (related to MAPREDUCE-1016?):
{noformat}
     [exec] compile-tools:
     [exec]     [javac] Compiling 69 source files to /grid/0/hudson/hudson-slave/workspace/\
     Mapreduce-Patch-h6.grid.sp2.yahoo.net/trunk/build/tools
     [exec]     [javac] /grid/0/hudson/hudson-slave/workspace/Mapreduce-Patch-h6.grid.sp2.yahoo.net/trunk/\
     src/tools/org/apache/hadoop/tools/rumen/LoggedTask.java:178: cannot find symbol
     [exec]     [javac] symbol  : class Counters
     [exec]     [javac] location: interface org.apache.hadoop.mapreduce.jobhistory.Events
     [exec]     [javac]   private void incorporateMapCounters(Events.Counters counters) {
{noformat}

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Attachment: mapreduce-1309--2010-02-04.patch

Replacement patch --- see previous comment

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Status: Open  (was: Patch Available)

We discovered a corner case that generates a null pointer exception.

I wrote a simple fix.  I will withdraw this patch and provide a new one that incorporates that fix.

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch

[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829929#action_12829929 ] 

Hadoop QA commented on MAPREDUCE-1309:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12434880/mapreduce-1309--2010-02-04.patch
  against trunk revision 906228.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 17 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated 1 warning messages.

    -1 javac.  The applied patch generated 2219 javac compiler warnings (more than the trunk's current 2215 warnings).

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/436/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/436/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/436/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/436/console

This message is automatically generated.

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch

[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829905#action_12829905 ] 

Dick King commented on MAPREDUCE-1309:
--------------------------------------

We've changed the main class to TraceBuilder.

We've also added a new command line switch, {{-demuxer classname}}.  If you supply this parameter, the named class must implement {{InputDemuxer}}.  This allows the user to write custom code to produce multiple virtual files from each real file [for example, the real files might be tar files or harchives].
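
To illustrate the idea of producing several virtual files from one real file, here is a minimal, self-contained sketch of the splitting step for the concatenated packaging. It is only an assumption-laden illustration: it does not implement the actual {{InputDemuxer}} interface (its methods are not shown in this thread), and the header marker {{=== FILE: }}, the class names, and the sample file names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not the rumen implementation.
final class ConcatenatedDemuxSketch {

  // Hypothetical header marker. The real header format used between
  // sections of a concatenated file is not specified in this thread.
  static final String HEADER_PREFIX = "=== FILE: ";

  /** One virtual file: the name taken from its header, plus its contents. */
  static final class VirtualFile {
    final String name;
    final String contents;

    VirtualFile(String name, String contents) {
      this.name = name;
      this.contents = contents;
    }
  }

  /**
   * Splits the text of one real file into virtual files. Each header line
   * starts a new virtual file; lines before the first header are ignored.
   */
  static List<VirtualFile> demux(String realFileText) {
    List<VirtualFile> result = new ArrayList<>();
    String currentName = null;
    StringBuilder body = new StringBuilder();

    for (String line : realFileText.split("\\R")) {
      if (line.startsWith(HEADER_PREFIX)) {
        // Close the section that was open, if any, before starting a new one.
        if (currentName != null) {
          result.add(new VirtualFile(currentName, body.toString()));
        }
        currentName = line.substring(HEADER_PREFIX.length()).trim();
        body.setLength(0);
      } else if (currentName != null) {
        body.append(line).append('\n');
      }
    }
    // Emit the final section, which has no header after it.
    if (currentName != null) {
      result.add(new VirtualFile(currentName, body.toString()));
    }
    return result;
  }

  public static void main(String[] args) {
    // Hypothetical input: a configuration file and a log for one job.
    String text = "=== FILE: job_0001_conf.xml\n<configuration/>\n"
        + "=== FILE: job_0001\nfirst log line\nsecond log line\n";
    for (VirtualFile f : demux(text)) {
      System.out.println(f.name + ": " + f.contents.length() + " chars");
    }
  }
}
```

A class supplied through {{-demuxer}} would presumably wrap logic of this kind behind the {{InputDemuxer}} contract, handing each virtual file to the trace builder in turn rather than collecting them all in a list.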

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Status: Patch Available  (was: Open)

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker log: how will the logs and the xml configuration files be packaged, and in which release of hadoop map/reduce were the logs generated?  The existing rumen only has a couple of answers to this question.  The new engine will handle three answers to the version question: 0.18, 0.20 and current, and two answers to the packaging question: separate files with names derived from the job ID, and concatenated files with a header between sections [used for easier file interchange].


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated MAPREDUCE-1309:
-------------------------------------

       Resolution: Fixed
    Fix Version/s: 0.22.0
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

+1

I committed this. Thanks, Dick!

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>             Fix For: 0.22.0
>
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, mapreduce-1309--2010-02-16.patch, mapreduce-1309--2010-02-17.patch


[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833172#action_12833172 ] 

Hadoop QA commented on MAPREDUCE-1309:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12435711/mapreduce-1309--2010-02-12.patch
  against trunk revision 909340.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 17 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated 1 warning messages.

    -1 javac.  The applied patch generated 2219 javac compiler warnings (more than the trunk's current 2215 warnings).

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/449/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/449/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/449/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/449/console

This message is automatically generated.

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Status: Open  (was: Patch Available)

A couple of changes from one of my other patches [1295] landed in trunk while I was preparing this patch.

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Status: Patch Available  (was: Open)

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08.patch


[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834583#action_12834583 ] 

Hadoop QA commented on MAPREDUCE-1309:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12436044/mapreduce-1309--2010-02-16-a.patch
  against trunk revision 910465.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 17 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated 1 warning messages.

    -1 javac.  The applied patch generated 2219 javac compiler warnings (more than the trunk's current 2215 warnings).

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/458/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/458/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/458/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/458/console

This message is automatically generated.

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, mapreduce-1309--2010-02-16.patch


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated MAPREDUCE-1309:
-------------------------------------

    Status: Open  (was: Patch Available)

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Status: Patch Available  (was: Open)

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Attachment: mapreduce-1309--2010-02-10.patch

Created the read loop described above.

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Status: Patch Available  (was: Open)

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Status: Open  (was: Patch Available)

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, mapreduce-1309--2010-02-16.patch


[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834639#action_12834639 ] 

Hong Tang commented on MAPREDUCE-1309:
--------------------------------------

The change to file input split javadoc is still present in the latest patch, which causes the javadoc warning.

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, mapreduce-1309--2010-02-16.patch


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Attachment: mapreduce-1309--2010-02-16.patch

This patch is a response to Hong's comments.  

Indeed they were minor, and the fixes are very simple.

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16.patch


[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829538#action_12829538 ] 

Hadoop QA commented on MAPREDUCE-1309:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12434751/mapreduce-1309--2010-02-03.patch
  against trunk revision 906228.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 17 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated 1 warning messages.

    -1 javac.  The applied patch generated 2219 javac compiler warnings (more than the trunk's current 2215 warnings).

    -1 findbugs.  The patch appears to introduce 12 new Findbugs warnings.

    -1 release audit.  The applied patch generated 3 release audit warnings (more than the trunk's current 0 warnings).

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/433/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/433/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/433/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/433/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/433/console

This message is automatically generated.

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Attachment: mapreduce-1309--2010-02-17.patch

Got rid of two javadoc errors and a couple of unused fields in LoggedTask.

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, mapreduce-1309--2010-02-16.patch, mapreduce-1309--2010-02-17.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Attachment: demuxer-plus-concatenated-files--2010-01-06.patch

This fixes the compilation problem resulting from the change in the jobhistory Events interface.

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch

[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832642#action_12832642 ] 

Hadoop QA commented on MAPREDUCE-1309:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12435485/mapreduce-1309--2010-02-10.patch
  against trunk revision 908321.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 17 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated 1 warning messages.

    -1 javac.  The applied patch generated 2219 javac compiler warnings (more than the trunk's current 2215 warnings).

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/442/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/442/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/442/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/442/console

This message is automatically generated.

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Attachment: mapreduce-1309--2010-01-20.patch

This patch file reflects the small changes suggested.

None of them rises to the level of a major change.

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Status: Open  (was: Patch Available)

I made a few cosmetic changes based on a review.

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Status: Patch Available  (was: Open)

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, mapreduce-1309--2010-02-16.patch

[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803202#action_12803202 ] 

Hadoop QA commented on MAPREDUCE-1309:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12430939/mapreduce-1309--2010-01-20.patch
  against trunk revision 901350.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated 1 warning messages.

    -1 javac.  The applied patch generated 2342 javac compiler warnings (more than the trunk's current 2330 warnings).

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    -1 release audit.  The applied patch generated 4 release audit warnings (more than the trunk's current 0 warnings).

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/280/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/280/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/280/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/280/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/280/console

This message is automatically generated.

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch

[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800337#action_12800337 ] 

Hadoop QA commented on MAPREDUCE-1309:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12430268/mapreduce-1309--2009-01-14.patch
  against trunk revision 899211.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    -1 javac.  The applied patch generated 2342 javac compiler warnings (more than the trunk's current 2330 warnings).

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    -1 release audit.  The applied patch generated 166 release audit warnings (more than the trunk's current 161 warnings).

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/386/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/386/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/386/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/386/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/386/console

This message is automatically generated.

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Attachment: mapreduce-1309--2009-01-14.patch

Fixed a bug in which the launch time and finish time were getting confused with each other.

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Attachment: mapreduce-1309--2010-02-12.patch

This fixes a null pointer exception in TraceBuilder.

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Status: Patch Available  (was: Open)

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Status: Open  (was: Patch Available)

I discovered a problem during a bulk test.

The main change in the patch is

{noformat}
       input.mark(bufferSize + 1);
 
       int actualRead = input.read(buffer);
+      int mostRecentRead = actualRead;
+
+      while (actualRead < bufferSize && mostRecentRead > 0) {
+        mostRecentRead =
+            input.read(buffer, actualRead, bufferSize - actualRead);
+
+        if (mostRecentRead > 0) {
+          actualRead += mostRecentRead;
+        }
+      }
 
       if (actualRead < markerBytes.length) {
         input.reset();
{noformat}

{{BufferedInputStream.read(byte[])}} does NOT read as much as possible, which is what I had expected.  It seems to stop at disk block boundaries [but a new read will continue from where the previous one stopped].

This patch clears this problem and only this problem, and is extremely unlikely to introduce new ones.
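
For illustration, the loop in the patch can be pulled out into a small stand-alone helper.  This is only a sketch under stated assumptions, not the code in the patch: {{ReadFullySketch}}, {{readFully}} and {{ChunkedStream}} are hypothetical names, and {{ChunkedStream}} merely simulates a stream that returns short reads, standing in for whatever the real input does at block boundaries.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ReadFullySketch {
  /**
   * Keeps calling read() until the buffer is full or the stream stops
   * delivering bytes. Returns the number of bytes placed in the buffer,
   * which may be less than buffer.length if the stream ends early.
   */
  static int readFully(InputStream input, byte[] buffer) throws IOException {
    int total = 0;
    while (total < buffer.length) {
      int n = input.read(buffer, total, buffer.length - total);
      if (n <= 0) {
        break; // end of stream (or nothing more delivered): stop, as the patch does
      }
      total += n;
    }
    return total;
  }

  /** Hypothetical test stream: hands out at most `chunk` bytes per read call. */
  static final class ChunkedStream extends InputStream {
    private final byte[] data;
    private final int chunk;
    private int pos = 0;

    ChunkedStream(byte[] data, int chunk) {
      this.data = data;
      this.chunk = chunk;
    }

    @Override
    public int read() {
      return pos < data.length ? (data[pos++] & 0xff) : -1;
    }

    @Override
    public int read(byte[] b, int off, int len) {
      if (len == 0) {
        return 0;
      }
      if (pos >= data.length) {
        return -1;
      }
      int n = Math.min(len, Math.min(chunk, data.length - pos));
      System.arraycopy(data, pos, b, off, n);
      pos += n;
      return n;
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] data = "0123456789".getBytes(StandardCharsets.US_ASCII);

    // A single read() stops at the simulated boundary.
    int single = new ChunkedStream(data, 4).read(new byte[16]);
    System.out.println("single read: " + single); // prints "single read: 4"

    // The loop collects everything the stream has to offer.
    int full = readFully(new ChunkedStream(data, 4), new byte[16]);
    System.out.println("readFully: " + full); // prints "readFully: 10"
  }
}
```

{{java.io.DataInputStream.readFully}} does a similar job, but it throws {{EOFException}} when the stream is shorter than the buffer.  That would not suit a check like the one above, where reading fewer bytes than the marker length is a legitimate outcome.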

-dk


> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch

[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798229#action_12798229 ] 

Hadoop QA commented on MAPREDUCE-1309:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12429775/demuxer-plus-concatenated-files--2010-01-08-b.patch
  against trunk revision 897118.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    -1 javac.  The applied patch generated 2342 javac compiler warnings (more than the trunk's current 2330 warnings).

    -1 findbugs.  The patch appears to introduce 4 new Findbugs warnings.

    -1 release audit.  The applied patch generated 166 release audit warnings (more than the trunk's current 161 warnings).

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/368/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/368/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/368/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/368/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/368/console

This message is automatically generated.

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong Tang updated MAPREDUCE-1309:
---------------------------------

    Attachment: rumen-yhadoop-20.patch

Backport to hadoop 20.1xx branch. Not to be committed.

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>             Fix For: 0.21.0
>
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, mapreduce-1309--2010-02-16.patch, mapreduce-1309--2010-02-17.patch, rumen-yhadoop-20.patch

[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798102#action_12798102 ] 

Dick King commented on MAPREDUCE-1309:
--------------------------------------

More precisely, I removed them from this patch because they're already in trunk.  Don't worry, I did NOT _reverse_ them anywhere.

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Status: Open  (was: Patch Available)

fix dropped import

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Attachment: demuxer-plus-concatenated-files--2009-12-21.patch

This patch implements a universal gridmix3/mumak trace generator.

It differs from previous versions of rumen in three ways:

1: The main class is o.a.h.tools.rumen.Driver

2: This tool is specialized to make traces.  Future statistics engines will be trace-based.

3: The argument list is more austere.  There are three or more arguments:

3a: the trace output, a {{Path}}, compressed or not

3b: the topology output, again a {{Path}}, again compressed or not

3c: any number of {{Path}} names, each of which can be compressed or not, and each of which can be a config.xml file, a job tracker log [{{Driver}} determines the version], or a directory filled with such files.
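
To make the shape of that argument list concrete, here is a minimal sketch of how such a list could be split into its three parts.  It is not the code of {{Driver}}: the class name, field names and usage string are hypothetical, plain strings stand in for {{Path}} objects, and the file names in {{main}} are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical holder for the argument list described above; not rumen's class. */
class TraceArgsSketch {
  final String traceOutput;     // 3a: where the trace is written
  final String topologyOutput;  // 3b: where the topology is written
  final List<String> inputs;    // 3c: config files, job tracker logs, or directories

  private TraceArgsSketch(String traceOutput, String topologyOutput, List<String> inputs) {
    this.traceOutput = traceOutput;
    this.topologyOutput = topologyOutput;
    this.inputs = inputs;
  }

  /** Splits args into the two outputs and the remaining inputs. */
  static TraceArgsSketch parse(String[] args) {
    if (args.length < 3) {
      // Assumed usage text, for illustration only.
      throw new IllegalArgumentException(
          "usage: <trace-output> <topology-output> <input> [<input> ...]");
    }
    List<String> inputs =
        Collections.unmodifiableList(
            new ArrayList<>(Arrays.asList(args).subList(2, args.length)));
    return new TraceArgsSketch(args[0], args[1], inputs);
  }

  public static void main(String[] args) {
    // Invented example paths.
    TraceArgsSketch parsed =
        parse(new String[] {"trace.json.gz", "topology.json", "logs-a", "logs-b"});
    System.out.println(parsed.traceOutput + " " + parsed.topologyOutput + " " + parsed.inputs);
  }
}
```

Treating everything after the second argument as input keeps the command line positional, which matches the "austere" argument list described above.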



> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Dick King (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-1309:
---------------------------------

    Attachment: demuxer-plus-concatenated-files--2010-01-11.patch

Line 143 of LoggedTaskAttempt.java was changed to read

   this.hostName = hostName.intern();

which introduces a bug that breaks my test case.  It fails [with a NullPointerException] when you read a null hostName in a json string.

I added the fix to this patch instead of making a separate patch for that issue.

   this.hostName = (hostName == null ? null : hostName.intern());
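
For illustration, the null guard can be exercised in isolation with a sketch like the one below.  The class here is a hypothetical stand-in, not {{LoggedTaskAttempt}} itself, and the host name is invented.

```java
/** Hypothetical stand-in for a class with an interned hostName field. */
class HostNameHolderSketch {
  private String hostName;

  /** Interns non-null host names; leaves null as null instead of throwing. */
  void setHostName(String hostName) {
    this.hostName = (hostName == null ? null : hostName.intern());
  }

  String getHostName() {
    return hostName;
  }

  public static void main(String[] args) {
    HostNameHolderSketch holder = new HostNameHolderSketch();

    holder.setHostName(null);  // would throw with an unguarded hostName.intern()
    System.out.println(holder.getHostName() == null);

    // A fresh String object is replaced by the canonical interned instance,
    // so it is reference-equal to the literal.
    holder.setHostName(new String("node-1"));
    System.out.println(holder.getHostName() == "node-1");
  }
}
```

Both lines print {{true}}: the first shows the null case no longer throws, the second shows that interning still happens for real host names.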


> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch

[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800534#action_12800534 ] 

Hadoop QA commented on MAPREDUCE-1309:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12430309/mapreduce-1309--2009-01-14-a.patch
  against trunk revision 899501.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    -1 javac.  The applied patch generated 2342 javac compiler warnings (more than the trunk's current 2330 warnings).

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    -1 release audit.  The applied patch generated 166 release audit warnings (more than the trunk's current 161 warnings).

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/387/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/387/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/387/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/387/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/387/console

This message is automatically generated.

> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch