Posted to mapreduce-issues@hadoop.apache.org by "Dick King (JIRA)" <ji...@apache.org> on 2009/12/18 00:26:18 UTC
[jira] Created: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
-----------------------------------------------------------------------------------------------------------------------------
Key: MAPREDUCE-1309
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Dick King
There are two orthogonal questions to answer when processing a job tracker log: how will the logs and the XML configuration files be packaged, and in which release of Hadoop Map/Reduce were the logs generated? The existing Rumen has only a couple of answers to these questions. The new engine will handle three answers to the version question (0.18, 0.20, and current) and two answers to the packaging question (separate files with names derived from the job ID, and concatenated files with a header between sections, used for easier file interchange).
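A minimal sketch of how the two axes can be kept independent, assuming entirely hypothetical names (`RumenInputSketch`, `LogVersion`, `Packaging`, `HistoryParser`, `demux`, `parserFor`) that are not Rumen's actual classes. The `#### ` section header is also an assumption, since the issue does not specify the real header format, and in-memory strings stand in for files. Packaging is handled by a demuxer that turns any input into named sections; version handling is a parser chosen from the version alone, so 2 packagings and 3 versions need 2 + 3 pieces rather than 6 combined ones.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names are Rumen's real classes.
final class RumenInputSketch {
  /** Version axis: which Hadoop release wrote the job history log. */
  enum LogVersion { V0_18, V0_20, CURRENT }

  /** Packaging axis: how history logs and XML configs are delivered. */
  enum Packaging { SEPARATE_FILES, CONCATENATED }

  /** Assumed section header marker for concatenated files. */
  static final String HEADER_PREFIX = "#### ";

  /** A version-specific parser; one implementation per LogVersion. */
  interface HistoryParser {
    String parse(List<String> lines);
  }

  /** Chooses a parser from the version alone, independent of packaging. */
  static HistoryParser parserFor(LogVersion version) {
    switch (version) {
      case V0_18:
        return lines -> "0.18 parser read " + lines.size() + " lines";
      case V0_20:
        return lines -> "0.20 parser read " + lines.size() + " lines";
      default:
        return lines -> "current parser read " + lines.size() + " lines";
    }
  }

  /** Splits one concatenated stream into named sections at each header line. */
  static Map<String, List<String>> splitConcatenated(BufferedReader in)
      throws IOException {
    Map<String, List<String>> sections = new LinkedHashMap<>();
    List<String> current = null;
    String line;
    while ((line = in.readLine()) != null) {
      if (line.startsWith(HEADER_PREFIX)) {
        current = new ArrayList<>();
        sections.put(line.substring(HEADER_PREFIX.length()).trim(), current);
      } else if (current != null) { // lines before the first header are dropped
        current.add(line);
      }
    }
    return sections;
  }

  /** Demuxes by packaging only; in-memory strings stand in for real files. */
  static Map<String, List<String>> demux(Packaging packaging,
      Map<String, String> files) throws IOException {
    Map<String, List<String>> sections = new LinkedHashMap<>();
    for (Map.Entry<String, String> file : files.entrySet()) {
      BufferedReader in = new BufferedReader(new StringReader(file.getValue()));
      if (packaging == Packaging.SEPARATE_FILES) {
        // Each file is one section, keyed by its (job-ID-derived) name.
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
          lines.add(line);
        }
        sections.put(file.getKey(), lines);
      } else {
        sections.putAll(splitConcatenated(in));
      }
    }
    return sections;
  }

  public static void main(String[] args) throws IOException {
    Map<String, String> files = new LinkedHashMap<>();
    files.put("bundle.txt", String.join("\n",
        "#### job_A_conf.xml", "<configuration/>",
        "#### job_A_history", "history line 1", "history line 2"));
    HistoryParser parser = parserFor(LogVersion.V0_20);
    for (Map.Entry<String, List<String>> s
        : demux(Packaging.CONCATENATED, files).entrySet()) {
      System.out.println(s.getKey() + ": " + parser.parse(s.getValue()));
    }
  }
}
```

Running `main` prints one line per section of the made-up bundle, each handled by the 0.20 parser. Because `demux` never inspects the version and `parserFor` never inspects the packaging, supporting another log version in this sketch would touch only `parserFor`.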
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
Posted by "Hong Tang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hong Tang updated MAPREDUCE-1309:
---------------------------------
Attachment: mr-1309-yhadoop-20.10.patch
Patch for Yahoo Hadoop 20.10. Not to be committed.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: tools/rumen
> Reporter: Dick King
> Assignee: Dick King
> Fix For: 0.21.0
>
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, mapreduce-1309--2010-02-16.patch, mapreduce-1309--2010-02-17.patch, mr-1309-yhadoop-20.10.patch, rumen-yhadoop-20.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Status: Open (was: Patch Available)
This patch is incompatible with another patch; see the patch reintroduction comment.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Status: Patch Available (was: Open)
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Attachment: mapreduce-1309--2010-02-03.patch
This is the new patch. The main changes are new test cases for small components of Rumen and changing the main class to TraceBuilder.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch
[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798280#action_12798280 ]
Hadoop QA commented on MAPREDUCE-1309:
--------------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12429802/demuxer-plus-concatenated-files--2010-01-08-c.patch
against trunk revision 897118.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 9 new or modified tests.
-1 patch. The patch command could not apply the patch.
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/258/console
This message is automatically generated.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Status: Patch Available (was: Open)
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Status: Patch Available (was: Open)
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch
[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835092#action_12835092 ]
Hadoop QA commented on MAPREDUCE-1309:
--------------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12436139/mapreduce-1309--2010-02-17.patch
against trunk revision 911191.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 17 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 2219 javac compiler warnings (more than the trunk's current 2215 warnings).
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/460/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/460/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/460/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/460/console
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, mapreduce-1309--2010-02-16.patch, mapreduce-1309--2010-02-17.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Status: Patch Available (was: Open)
Resubmitting the same patch a second time in the hope that Hudson will notice it this time.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch
[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829823#action_12829823 ]
Dick King commented on MAPREDUCE-1309:
--------------------------------------
I am submitting a new patch.
Some of Hudson's points are well taken: I have cleaned up the javac, findbugs, and release audit warnings.
The javadoc warnings are in a module I haven't touched; I'll fix them under MAPREDUCE-1459.
The three test failures are in org.apache.hadoop.mapreduce.lib.output.TestJobOutputCommitter.test*. They appear to come from a zip-file read error, which I understand is a recurring problem, rather than from a test that actually runs and fails. Here is one example:
{noformat}
Error Message
java.util.zip.ZipException: error reading zip file
Stacktrace
java.lang.RuntimeException: java.util.zip.ZipException: error reading zip file
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1715)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1529)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1475)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:564)
at org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:1892)
at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:347)
at org.apache.hadoop.mapred.HadoopTestCase.setUp(HadoopTestCase.java:145)
at org.apache.hadoop.mapreduce.lib.output.TestJobOutputCommitter.setUp(TestJobOutputCommitter.java:59)
Caused by: java.util.zip.ZipException: error reading zip file
at java.util.zip.ZipFile.read(Native Method)
at java.util.zip.ZipFile.access$1200(ZipFile.java:29)
at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:447)
at java.util.zip.ZipFile$1.fill(ZipFile.java:230)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:141)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:105)
at java.io.FilterInputStream.read(FilterInputStream.java:66)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager$RewindableInputStream.read(XMLEntityManager.java:2910)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:704)
at com.sun.org.apache.xerces.internal.impl.XMLVersionDetector.determineDocVersion(XMLVersionDetector.java:186)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:771)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:107)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:225)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:283)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:180)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1628)
{noformat}
See http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/433/testReport/org.apache.hadoop.mapreduce.lib.output/TestJobOutputCommitter/testCustomAbort/ .
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Status: Open (was: Patch Available)
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Attachment: mapreduce-1309--2009-01-14-a.patch
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Attachment: demuxer-plus-concatenated-files--2010-01-08.patch
I made a couple of redundant changes to trunk, in both 1295 and here. I removed them.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Status: Patch Available (was: Open)
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, mapreduce-1309--2010-02-16.patch, mapreduce-1309--2010-02-17.patch
[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793460#action_12793460 ]
Hadoop QA commented on MAPREDUCE-1309:
--------------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12428670/demuxer-plus-concatenated-files--2009-12-21.patch
against trunk revision 893021.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 9 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The patch appears to cause tar ant target to fail.
-1 findbugs. The patch appears to cause Findbugs to fail.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/329/testReport/
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/329/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/329/console
This message is automatically generated.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch
[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834486#action_12834486 ]
Hadoop QA commented on MAPREDUCE-1309:
--------------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12436030/mapreduce-1309--2010-02-16.patch
against trunk revision 910465.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 17 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The patch appears to cause tar ant target to fail.
-1 findbugs. The patch appears to cause Findbugs to fail.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/457/testReport/
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/457/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/457/console
This message is automatically generated.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amar Kamat updated MAPREDUCE-1309:
----------------------------------
Component/s: tools/rumen
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: tools/rumen
> Reporter: Dick King
> Assignee: Dick King
> Fix For: 0.21.0
>
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, mapreduce-1309--2010-02-16.patch, mapreduce-1309--2010-02-17.patch, rumen-yhadoop-20.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Status: Patch Available (was: Open)
Fixed a bug, introduced by some other patch, that broke my test case. Unless that has happened again, it should work this time.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Status: Open (was: Patch Available)
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch
[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798868#action_12798868 ]
Hadoop QA commented on MAPREDUCE-1309:
--------------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12429912/demuxer-plus-concatenated-files--2010-01-11.patch
against trunk revision 897118.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 9 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 2342 javac compiler warnings (more than the trunk's current 2330 warnings).
+1 findbugs. The patch does not introduce any new Findbugs warnings.
-1 release audit. The applied patch generated 166 release audit warnings (more than the trunk's current 161 warnings).
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/373/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/373/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/373/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/373/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/373/console
This message is automatically generated.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Status: Patch Available (was: Open)
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Attachment: mapreduce-1309--2010-02-16-a.patch
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, mapreduce-1309--2010-02-16.patch
[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835346#action_12835346 ]
Hudson commented on MAPREDUCE-1309:
-----------------------------------
Integrated in Hadoop-Mapreduce-trunk-Commit #247 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/247/])
Refactor Rumen trace generator to improve code structure
and add extensible support for log formats. Contributed by Dick King
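The packaging question from the issue (separate per-job files versus concatenated files with a header between sections) can be illustrated with a minimal Java sketch of the concatenated case. It assumes a hypothetical header line of the form "#FILE <name>"; the real header format of Rumen's concatenated files is not given in this thread, and the class name, method name, and sample file names are made up for illustration, not Rumen's API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class ConcatenatedDemuxSketch {
  // Hypothetical section header marker, not Rumen's actual format.
  static final String HEADER_PREFIX = "#FILE ";

  /**
   * Splits a concatenated log stream into named sections, preserving input
   * order. Lines before the first header are ignored; a repeated section
   * name overwrites the earlier section in this sketch.
   */
  static Map<String, String> demux(BufferedReader in) {
    Map<String, String> sections = new LinkedHashMap<>();
    String currentName = null;
    StringBuilder body = new StringBuilder();
    try {
      String line;
      while ((line = in.readLine()) != null) {
        if (line.startsWith(HEADER_PREFIX)) {
          // A new header closes the previous section.
          if (currentName != null) {
            sections.put(currentName, body.toString());
          }
          currentName = line.substring(HEADER_PREFIX.length()).trim();
          body.setLength(0);
        } else if (currentName != null) {
          body.append(line).append('\n');
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    // Flush the final section.
    if (currentName != null) {
      sections.put(currentName, body.toString());
    }
    return sections;
  }

  public static void main(String[] args) {
    // Made-up sample input: a config section followed by a history section.
    String concatenated =
        "#FILE job_0001_conf.xml\n"
        + "<configuration/>\n"
        + "#FILE job_0001_history\n"
        + "Job JOBID=\"job_0001\"\n";
    Map<String, String> sections =
        demux(new BufferedReader(new StringReader(concatenated)));
    for (Map.Entry<String, String> e : sections.entrySet()) {
      System.out.println(e.getKey() + ": " + e.getValue().trim());
    }
  }
}
```

Because this step only separates sections, each section could then be handed to whichever version-specific parser (0.18, 0.20, or current) applies, which is what keeps the packaging and version questions independent.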
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Fix For: 0.22.0
>
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, mapreduce-1309--2010-02-16.patch, mapreduce-1309--2010-02-17.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Status: Open (was: Patch Available)
Hudson hasn't run on this in over a day, so I cancelled it and will resubmit it to give Hudson another kick.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Status: Patch Available (was: Open)
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Attachment: demuxer-plus-concatenated-files--2010-01-08-b.patch
Fixed a dropped import in a test case.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Status: Open (was: Patch Available)
I've received a code review and incorporated some of its suggestions.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch
[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Hong Tang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833278#action_12833278 ]
Hong Tang commented on MAPREDUCE-1309:
--------------------------------------
The latest patch (2010-02-12) looks good. Only a few minor comments below:
- incorrect changes in javadoc comments for mapred.FileInputFormat.setInputPaths and mapreduce.lib.input.FileInputFormat.setInputPaths.
- in TestRumenJobTraces.java, change "path.makeQualified(fs)" to "path.makeQualified(fs.getUri(), fs.getWorkingDirectory())" to avoid explicit suppression of warnings.
- DefaultInputDemuxer does not guard against misuse such as demuxer.bindTo(...); demuxer.close(); demuxer.getNext(). It would be better to use a finally block in close() to reset name and input to null.
- unused import junit.Ignore in HadoopLogAnalyzer.java
- HEE.nameNames() should be removed.
- processReduceAttemptFinishedEvent and processMapAttemptFinishedEvent contain a commented-out line as follows: "// attempt.setLocation(???);". Is it a placeholder for some TODO work? If so, please add more comments.
- should the following be removed from JobBuilder.processJobFinishedEvent()? "// ???? result.setOutcome(event.)"
- MapAttempt20LineHistoryEventEmitter.makePrototype does not seem to be used.
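The DefaultInputDemuxer comment above suggests resetting state in a finally block inside close(). The sketch below shows that pattern on a hypothetical, simplified single-input demuxer. `GuardedDemuxer` and the `bindTo(String, InputStream)` / `getNext()` signatures are assumptions for illustration and are not rumen's actual interface. It assumes that the caller takes ownership of a stream once getNext() returns it, and that close() only closes a stream that was never handed out.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.util.AbstractMap;
import java.util.Map;

/** Hypothetical single-input demuxer illustrating the review comment; not rumen's class. */
public class GuardedDemuxer implements Closeable {
    private String name;
    private InputStream input;

    /** Binds the demuxer to one named input; the stream is handed out at most once. */
    public void bindTo(String name, InputStream input) {
        this.name = name;
        this.input = input;
    }

    /** Returns the bound (name, stream) pair once, then null; also null after close(). */
    public Map.Entry<String, InputStream> getNext() {
        if (name == null || input == null) {
            return null;
        }
        Map.Entry<String, InputStream> result =
            new AbstractMap.SimpleImmutableEntry<>(name, input);
        name = null;  // ownership of the stream passes to the caller
        input = null;
        return result;
    }

    @Override
    public void close() throws IOException {
        try {
            if (input != null) {
                input.close(); // close a stream that was never handed out
            }
        } finally {
            // Reset state even if input.close() throws, so that a later
            // getNext() cannot return a stale, closed stream.
            name = null;
            input = null;
        }
    }

    public static void main(String[] args) throws IOException {
        GuardedDemuxer demuxer = new GuardedDemuxer();
        demuxer.bindTo("job_history", new ByteArrayInputStream(new byte[0]));
        demuxer.close();
        System.out.println(demuxer.getNext()); // prints "null": the misuse sequence is harmless
    }
}
```

Putting the reset in finally rather than after the close call means the guard holds even when the underlying stream's close() fails.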
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Status: Open (was: Patch Available)
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16.patch
[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797877#action_12797877 ]
Hadoop QA commented on MAPREDUCE-1309:
--------------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12429609/demuxer-plus-concatenated-files--2010-01-06.patch
against trunk revision 897076.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 9 new or modified tests.
-1 patch. The patch command could not apply the patch.
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/363/console
This message is automatically generated.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Status: Patch Available (was: Open)
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch
[jira] Assigned: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King reassigned MAPREDUCE-1309:
------------------------------------
Assignee: Dick King
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Attachment: demuxer-plus-concatenated-files--2010-01-08-d.patch
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch
[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798135#action_12798135 ]
Hadoop QA commented on MAPREDUCE-1309:
--------------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12429765/demuxer-plus-concatenated-files--2010-01-08.patch
against trunk revision 897118.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 9 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated 1 warning messages.
-1 javac. The patch appears to cause tar ant target to fail.
-1 findbugs. The patch appears to cause Findbugs to fail.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/367/testReport/
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/367/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/367/console
This message is automatically generated.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Status: Open (was: Patch Available)
Cancel this patch to make room for a new one.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08.patch
[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798874#action_12798874 ]
Dick King commented on MAPREDUCE-1309:
--------------------------------------
I can certify that:
* None of the bad unit tests are the result of this patch
* The release audit failure concerns files that cannot carry a license banner: test cases
* The compiler warnings concern deprecated APIs that are used pervasively and will require considerable effort to remove, here and elsewhere.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Status: Patch Available (was: Open)
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14.patch
[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841435#action_12841435 ]
Hudson commented on MAPREDUCE-1309:
-----------------------------------
Integrated in Hadoop-Mapreduce-trunk #248 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/248/])
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Fix For: 0.22.0
>
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, mapreduce-1309--2010-02-16.patch, mapreduce-1309--2010-02-17.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Attachment: demuxer-plus-concatenated-files--2010-01-08-c.patch
This replacement patch fixes the problems noted by Hudson. Please note that I request the following variances:
* The javac warnings refer to deprecated interfaces, mostly Counters, that are used ubiquitously and will require a lot of study to remove.
* The release audit refers to files containing test cases, which are in .json and cannot receive a banner headline.
* The failed core tests are in other subsystems and are, I believe, known failures.
I did fix the other problems.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08.patch
[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Hong Tang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835309#action_12835309 ]
Hong Tang commented on MAPREDUCE-1309:
--------------------------------------
The latest patch looks good. +1.
The javac warning seems to be due to the use of deprecated API (which is unavoidable). And the failed core/contrib tests seem to be irrelevant to this patch.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, mapreduce-1309--2010-02-16.patch, mapreduce-1309--2010-02-17.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Status: Patch Available (was: Open)
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Status: Open (was: Patch Available)
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker log: how will the logs and the xml configuration files be packaged, and in which release of hadoop map/reduce were the logs generated? The existing rumen only has a couple of answers to this question. The new engine will handle three answers to the version question: 0.18, 0.20 and current, and two answers to the packaging question: separate files with names derived from the job ID, and concatenated files with a header between sections [used for easier file interchange].
[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830241#action_12830241 ]
Dick King commented on MAPREDUCE-1309:
--------------------------------------
The new javac warnings come from uses of deprecated APIs in the test cases. Those uses are necessary for now.
The javadoc warnings look like this:
{noformat}
[exec] [javadoc] Building tree for all the packages and classes...
[exec] [javadoc] /grid/0/hudson/hudson-slave/workspace/Mapreduce-Patch-h6.grid.sp2.yahoo.net/trunk/src/java/org/apache/hadoop/mapred
/FileInputFormat.java:323: warning - @param argument "inputs" is not a parameter name.
[exec] [javadoc] /grid/0/hudson/hudson-slave/workspace/Mapreduce-Patch-h6.grid.sp2.yahoo.net/trunk/src/java/org/apache/hadoop/mapreduce/lib/input
/FileInputFormat.java:356: warning - @param argument "inputs" is not a parameter name.
{noformat}
[I changed the formatting on the console printout excerpt to make it render better]
These are warnings that have nothing to do with this patch.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker log: how will the logs and the xml configuration files be packaged, and in which release of hadoop map/reduce were the logs generated? The existing rumen only has a couple of answers to this question. The new engine will handle three answers to the version question: 0.18, 0.20 and current, and two answers to the packaging question: separate files with names derived from the job ID, and concatenated files with a header between sections [used for easier file interchange].
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Status: Open (was: Patch Available)
I discovered a bug.
I expect to have a fixed version of this patch in place by about 10 AM PST on 1/14.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker log: how will the logs and the xml configuration files be packaged, and in which release of hadoop map/reduce were the logs generated? The existing rumen only has a couple of answers to this question. The new engine will handle three answers to the version question: 0.18, 0.20 and current, and two answers to the packaging question: separate files with names derived from the job ID, and concatenated files with a header between sections [used for easier file interchange].
[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798307#action_12798307 ]
Hadoop QA commented on MAPREDUCE-1309:
--------------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12429804/demuxer-plus-concatenated-files--2010-01-08-d.patch
against trunk revision 897118.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 9 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 2342 javac compiler warnings (more than the trunk's current 2330 warnings).
+1 findbugs. The patch does not introduce any new Findbugs warnings.
-1 release audit. The applied patch generated 166 release audit warnings (more than the trunk's current 161 warnings).
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/259/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/259/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/259/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/259/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/259/console
This message is automatically generated.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker log: how will the logs and the xml configuration files be packaged, and in which release of hadoop map/reduce were the logs generated? The existing rumen only has a couple of answers to this question. The new engine will handle three answers to the version question: 0.18, 0.20 and current, and two answers to the packaging question: separate files with names derived from the job ID, and concatenated files with a header between sections [used for easier file interchange].
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Status: Patch Available (was: Open)
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker log: how will the logs and the xml configuration files be packaged, and in which release of hadoop map/reduce were the logs generated? The existing rumen only has a couple of answers to this question. The new engine will handle three answers to the version question: 0.18, 0.20 and current, and two answers to the packaging question: separate files with names derived from the job ID, and concatenated files with a header between sections [used for easier file interchange].
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Douglas updated MAPREDUCE-1309:
-------------------------------------
Status: Open (was: Patch Available)
Unfortunately, the patch does not compile against trunk (related to MAPREDUCE-1016?):
{noformat}
[exec] compile-tools:
[exec] [javac] Compiling 69 source files to /grid/0/hudson/hudson-slave/workspace/\
Mapreduce-Patch-h6.grid.sp2.yahoo.net/trunk/build/tools
[exec] [javac] /grid/0/hudson/hudson-slave/workspace/Mapreduce-Patch-h6.grid.sp2.yahoo.net/trunk/\
src/tools/org/apache/hadoop/tools/rumen/LoggedTask.java:178: cannot find symbol
[exec] [javac] symbol : class Counters
[exec] [javac] location: interface org.apache.hadoop.mapreduce.jobhistory.Events
[exec] [javac] private void incorporateMapCounters(Events.Counters counters) {
{noformat}
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker log: how will the logs and the xml configuration files be packaged, and in which release of hadoop map/reduce were the logs generated? The existing rumen only has a couple of answers to this question. The new engine will handle three answers to the version question: 0.18, 0.20 and current, and two answers to the packaging question: separate files with names derived from the job ID, and concatenated files with a header between sections [used for easier file interchange].
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Attachment: mapreduce-1309--2010-02-04.patch
Replacement patch; see previous comment.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker log: how will the logs and the xml configuration files be packaged, and in which release of hadoop map/reduce were the logs generated? The existing rumen only has a couple of answers to this question. The new engine will handle three answers to the version question: 0.18, 0.20 and current, and two answers to the packaging question: separate files with names derived from the job ID, and concatenated files with a header between sections [used for easier file interchange].
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Status: Open (was: Patch Available)
We discovered a corner case that generates a null pointer exception.
I wrote a simple fix. I will withdraw this patch and provide a new one that incorporates that fix.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker log: how will the logs and the xml configuration files be packaged, and in which release of hadoop map/reduce were the logs generated? The existing rumen only has a couple of answers to this question. The new engine will handle three answers to the version question: 0.18, 0.20 and current, and two answers to the packaging question: separate files with names derived from the job ID, and concatenated files with a header between sections [used for easier file interchange].
[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829929#action_12829929 ]
Hadoop QA commented on MAPREDUCE-1309:
--------------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12434880/mapreduce-1309--2010-02-04.patch
against trunk revision 906228.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 17 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated 1 warning messages.
-1 javac. The applied patch generated 2219 javac compiler warnings (more than the trunk's current 2215 warnings).
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/436/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/436/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/436/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/436/console
This message is automatically generated.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker log: how will the logs and the xml configuration files be packaged, and in which release of hadoop map/reduce were the logs generated? The existing rumen only has a couple of answers to this question. The new engine will handle three answers to the version question: 0.18, 0.20 and current, and two answers to the packaging question: separate files with names derived from the job ID, and concatenated files with a header between sections [used for easier file interchange].
[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829905#action_12829905 ]
Dick King commented on MAPREDUCE-1309:
--------------------------------------
We've changed the main class to TraceBuilder.
We've also added a new command-line switch, {{-demuxer classname}}. If you specify this parameter, the named class must implement {{InputDemuxer}}. This lets users write custom code that produces multiple virtual files from each real file [for example, the real files might be tar files or Hadoop archives].
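To illustrate the idea behind the {{-demuxer classname}} switch, here is a minimal Java sketch under stated assumptions. It shows one real input file being split into several named virtual files, and a demuxer class being chosen by name at run time. The following are all hypothetical:
- {{VirtualFileDemuxer}} stands in for rumen's {{InputDemuxer}}, whose actual method signatures may differ.
- The "#### FILE: " section header is an invented format, not rumen's concatenated-file header.
- Loading the class by reflection is one plausible way to wire such a switch, not a description of TraceBuilder's code.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class DemuxerSketch {
  /** Hypothetical stand-in for rumen's InputDemuxer; NOT its real signature. */
  public interface VirtualFileDemuxer extends Closeable {
    /** Attaches the demuxer to one real input file. */
    void bindTo(InputStream realFile) throws IOException;
    /** Returns the next (virtual file name, contents) pair, or null when done. */
    Map.Entry<String, InputStream> getNext() throws IOException;
  }

  /** Splits a concatenated file at header lines; the header format is invented here. */
  public static class HeaderDelimitedDemuxer implements VirtualFileDemuxer {
    static final String HEADER_PREFIX = "#### FILE: "; // assumed, not rumen's format
    private BufferedReader reader;
    private String pendingName; // name taken from the most recent header line

    @Override
    public void bindTo(InputStream realFile) throws IOException {
      reader = new BufferedReader(new InputStreamReader(realFile, StandardCharsets.UTF_8));
      pendingName = readUntilHeader(null); // skip any preamble before the first header
    }

    @Override
    public Map.Entry<String, InputStream> getNext() throws IOException {
      if (pendingName == null) {
        return null; // no sections left
      }
      String name = pendingName;
      StringBuilder body = new StringBuilder();
      pendingName = readUntilHeader(body);
      byte[] bytes = body.toString().getBytes(StandardCharsets.UTF_8);
      return new AbstractMap.SimpleImmutableEntry<>(name, new ByteArrayInputStream(bytes));
    }

    /** Copies lines into sink (if non-null) up to the next header; returns its name, or null at EOF. */
    private String readUntilHeader(StringBuilder sink) throws IOException {
      String line;
      while ((line = reader.readLine()) != null) {
        if (line.startsWith(HEADER_PREFIX)) {
          return line.substring(HEADER_PREFIX.length());
        }
        if (sink != null) {
          sink.append(line).append('\n');
        }
      }
      return null;
    }

    @Override
    public void close() throws IOException {
      if (reader != null) {
        reader.close();
      }
    }
  }

  /** Mimics a "-demuxer classname" switch: load by name and require the interface. */
  public static VirtualFileDemuxer loadDemuxer(String className) throws ReflectiveOperationException {
    return Class.forName(className)
        .asSubclass(VirtualFileDemuxer.class) // ClassCastException if it does not implement it
        .getDeclaredConstructor()
        .newInstance();
  }

  /** Returns the virtual file names that a named demuxer produces from one concatenated input. */
  public static List<String> sectionNames(String className, String concatenated) {
    List<String> names = new ArrayList<>();
    try (VirtualFileDemuxer demuxer = loadDemuxer(className)) {
      demuxer.bindTo(new ByteArrayInputStream(concatenated.getBytes(StandardCharsets.UTF_8)));
      Map.Entry<String, InputStream> entry;
      while ((entry = demuxer.getNext()) != null) {
        names.add(entry.getKey());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("cannot load demuxer " + className, e);
    }
    return names;
  }

  public static void main(String[] args) {
    String input = "#### FILE: job_0001_conf.xml\n<configuration/>\n"
        + "#### FILE: job_0001_history\nJob JOBID=\"job_0001\"\n";
    System.out.println(sectionNames("DemuxerSketch$HeaderDelimitedDemuxer", input));
  }
}
```

Running {{main}} prints {{[job_0001_conf.xml, job_0001_history]}}. The parsing code only ever sees named virtual files, so a new packaging format means writing a new demuxer class and naming it on the command line. The log parser itself does not change.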
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker log: how will the logs and the xml configuration files be packaged, and in which release of hadoop map/reduce were the logs generated? The existing rumen only has a couple of answers to this question. The new engine will handle three answers to the version question: 0.18, 0.20 and current, and two answers to the packaging question: separate files with names derived from the job ID, and concatenated files with a header between sections [used for easier file interchange].
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Status: Patch Available (was: Open)
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker log: how will the logs and the xml configuration files be packaged, and in which release of hadoop map/reduce were the logs generated? The existing rumen only has a couple of answers to this question. The new engine will handle three answers to the version question: 0.18, 0.20 and current, and two answers to the packaging question: separate files with names derived from the job ID, and concatenated files with a header between sections [used for easier file interchange].
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Douglas updated MAPREDUCE-1309:
-------------------------------------
Resolution: Fixed
Fix Version/s: 0.22.0
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)
+1
I committed this. Thanks, Dick!
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Fix For: 0.22.0
>
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, mapreduce-1309--2010-02-16.patch, mapreduce-1309--2010-02-17.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker log: how will the logs and the xml configuration files be packaged, and in which release of hadoop map/reduce were the logs generated? The existing rumen only has a couple of answers to this question. The new engine will handle three answers to the version question: 0.18, 0.20 and current, and two answers to the packaging question: separate files with names derived from the job ID, and concatenated files with a header between sections [used for easier file interchange].
[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833172#action_12833172 ]
Hadoop QA commented on MAPREDUCE-1309:
--------------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12435711/mapreduce-1309--2010-02-12.patch
against trunk revision 909340.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 17 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated 1 warning messages.
-1 javac. The applied patch generated 2219 javac compiler warnings (more than the trunk's current 2215 warnings).
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/449/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/449/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/449/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/449/console
This message is automatically generated.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker log: how will the logs and the xml configuration files be packaged, and in which release of hadoop map/reduce were the logs generated? The existing rumen only has a couple of answers to this question. The new engine will handle three answers to the version question: 0.18, 0.20 and current, and two answers to the packaging question: separate files with names derived from the job ID, and concatenated files with a header between sections [used for easier file interchange].
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
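Because the packaging question and the version question are independent, one way to structure the engine is to split the input by packaging first and hand each piece to a version-specific parser afterwards. The following is a minimal, hypothetical sketch of the packaging half for the concatenated-file case. The class name `ConcatenatedDemuxer`, the `!!FILE=<name>!!` header syntax and the section names in `main` are assumptions for illustration and are not taken from the attached patches. Version-specific parsing (0.18, 0.20, current) is not shown.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a demuxer for the "concatenated files" packaging.
 * The header syntax "!!FILE=<name>!!" is an assumption for illustration.
 */
public class ConcatenatedDemuxer {
  static final String HEADER_PREFIX = "!!FILE=";
  static final String HEADER_SUFFIX = "!!";

  /** Returns the section name for a header line, or null if the line is not a header. */
  static String headerName(String line) {
    if (line.length() >= HEADER_PREFIX.length() + HEADER_SUFFIX.length()
        && line.startsWith(HEADER_PREFIX)
        && line.endsWith(HEADER_SUFFIX)) {
      return line.substring(HEADER_PREFIX.length(), line.length() - HEADER_SUFFIX.length());
    }
    return null;
  }

  /**
   * Splits a concatenated input into sections keyed by name, preserving input order.
   * Lines before the first header are ignored; a repeated name replaces the earlier section.
   */
  static Map<String, List<String>> demux(BufferedReader in) throws IOException {
    Map<String, List<String>> sections = new LinkedHashMap<>();
    List<String> current = null;
    String line;
    while ((line = in.readLine()) != null) {
      String name = headerName(line);
      if (name != null) {
        // A header starts a new section; following lines belong to it.
        current = new ArrayList<>();
        sections.put(name, current);
      } else if (current != null) {
        current.add(line);
      }
    }
    return sections;
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical section names: one job configuration and one job history log.
    String input = String.join("\n",
        "!!FILE=job_0001_conf.xml!!",
        "<configuration/>",
        "!!FILE=job_0001_history!!",
        "placeholder history line 1",
        "placeholder history line 2");
    Map<String, List<String>> sections =
        demux(new BufferedReader(new StringReader(input)));
    sections.forEach((name, lines) ->
        System.out.println(name + ": " + lines.size() + " line(s)"));
  }
}
```

Running `main` prints one line per section (`job_0001_conf.xml: 1 line(s)` and `job_0001_history: 2 line(s)`). Keeping demuxing separate from parsing means three versions and two packagings need 3 + 2 components rather than 6 combined readers.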
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Status: Open (was: Patch Available)
A couple of changes from one of my other patches (MAPREDUCE-1295) went into trunk while I was making this patch.
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Status: Patch Available (was: Open)
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08.patch
[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834583#action_12834583 ]
Hadoop QA commented on MAPREDUCE-1309:
--------------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12436044/mapreduce-1309--2010-02-16-a.patch
against trunk revision 910465.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 17 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated 1 warning messages.
-1 javac. The applied patch generated 2219 javac compiler warnings (more than the trunk's current 2215 warnings).
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/458/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/458/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/458/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/458/console
This message is automatically generated.
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, mapreduce-1309--2010-02-16.patch
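The Hadoop QA reports in this thread use a fixed line format: a `+1` or `-1` vote, a check name ending in a period, then a message. Every report shown here is -1 overall and has at least one -1 check, which is consistent with a rule that any single -1 makes the overall vote -1. The javac check compares the patched build's warning count with trunk's (2219 versus 2215, i.e. 4 new warnings). The following is a minimal sketch for tallying such a report under that assumed rule, which is inferred from these messages and not from Hadoop's test-patch script. `QaVoteTally` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for reading Hadoop QA-style vote lines; not part of Hadoop. */
public class QaVoteTally {
  // "+1 @author. ..." -> vote "+1", check "@author"
  private static final Pattern VOTE = Pattern.compile("^([+-]1) ([^.]+)\\.");

  /** Returns the names of the individual checks that voted -1 (the "overall" line is skipped). */
  static List<String> failedChecks(List<String> reportLines) {
    List<String> failed = new ArrayList<>();
    for (String line : reportLines) {
      Matcher m = VOTE.matcher(line.trim());
      if (m.find() && m.group(1).equals("-1") && !m.group(2).equals("overall")) {
        failed.add(m.group(2));
      }
    }
    return failed;
  }

  /** Assumed rule: the overall vote is -1 if any individual check voted -1, else +1. */
  static int overall(List<String> reportLines) {
    return failedChecks(reportLines).isEmpty() ? 1 : -1;
  }

  /** Warnings introduced by the patch relative to trunk (never negative). */
  static int newWarnings(int patchWarnings, int trunkWarnings) {
    return Math.max(0, patchWarnings - trunkWarnings);
  }

  public static void main(String[] args) {
    // Lines copied from the report against trunk revision 910465.
    List<String> report = List.of(
        "+1 @author. The patch does not contain any @author tags.",
        "-1 javadoc. The javadoc tool appears to have generated 1 warning messages.",
        "-1 javac. The applied patch generated 2219 javac compiler warnings (more than the trunk's current 2215 warnings).",
        "-1 core tests. The patch failed core unit tests.",
        "+1 contrib tests. The patch passed contrib unit tests.");
    System.out.println("failed: " + failedChecks(report));
    System.out.println("overall: " + overall(report));
    System.out.println("new javac warnings: " + newWarnings(2219, 2215));
  }
}
```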
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Douglas updated MAPREDUCE-1309:
-------------------------------------
Status: Open (was: Patch Available)
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Status: Patch Available (was: Open)
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Attachment: mapreduce-1309--2010-02-10.patch
Created the read loop described above.
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Status: Patch Available (was: Open)
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Status: Open (was: Patch Available)
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, mapreduce-1309--2010-02-16.patch
[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Hong Tang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834639#action_12834639 ]
Hong Tang commented on MAPREDUCE-1309:
--------------------------------------
The change to the file input split javadoc is still present in the latest patch; that change is what causes the javadoc warning.
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, mapreduce-1309--2010-02-16.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Attachment: mapreduce-1309--2010-02-16.patch
This patch is a response to Hong's comments.
Indeed they were minor, and the fixes are very simple.
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16.patch
[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829538#action_12829538 ]
Hadoop QA commented on MAPREDUCE-1309:
--------------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12434751/mapreduce-1309--2010-02-03.patch
against trunk revision 906228.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 17 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated 1 warning messages.
-1 javac. The applied patch generated 2219 javac compiler warnings (more than the trunk's current 2215 warnings).
-1 findbugs. The patch appears to introduce 12 new Findbugs warnings.
-1 release audit. The applied patch generated 3 release audit warnings (more than the trunk's current 0 warnings).
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/433/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/433/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/433/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/433/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/433/console
This message is automatically generated.
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Attachment: mapreduce-1309--2010-02-17.patch
Got rid of two javadoc errors and a couple of unused fields in LoggedTask.
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, mapreduce-1309--2010-02-16.patch, mapreduce-1309--2010-02-17.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Attachment: demuxer-plus-concatenated-files--2010-01-06.patch
This fixes the compilation problem resulting from the change in the jobhistory Events interface.
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch
[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832642#action_12832642 ]
Hadoop QA commented on MAPREDUCE-1309:
--------------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12435485/mapreduce-1309--2010-02-10.patch
against trunk revision 908321.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 17 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated 1 warning messages.
-1 javac. The applied patch generated 2219 javac compiler warnings (more than the trunk's current 2215 warnings).
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/442/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/442/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/442/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/442/console
This message is automatically generated.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Attachment: mapreduce-1309--2010-01-20.patch
This patch file reflects the small changes suggested.
None of them rises to the level of a major change.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Status: Open (was: Patch Available)
I made a few cosmetic changes based on a review.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Status: Patch Available (was: Open)
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, mapreduce-1309--2010-02-16.patch
[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803202#action_12803202 ]
Hadoop QA commented on MAPREDUCE-1309:
--------------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12430939/mapreduce-1309--2010-01-20.patch
against trunk revision 901350.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 9 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated 1 warning messages.
-1 javac. The applied patch generated 2342 javac compiler warnings (more than the trunk's current 2330 warnings).
+1 findbugs. The patch does not introduce any new Findbugs warnings.
-1 release audit. The applied patch generated 4 release audit warnings (more than the trunk's current 0 warnings).
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/280/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/280/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/280/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/280/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/280/console
This message is automatically generated.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch
[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800337#action_12800337 ]
Hadoop QA commented on MAPREDUCE-1309:
--------------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12430268/mapreduce-1309--2009-01-14.patch
against trunk revision 899211.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 9 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 2342 javac compiler warnings (more than the trunk's current 2330 warnings).
+1 findbugs. The patch does not introduce any new Findbugs warnings.
-1 release audit. The applied patch generated 166 release audit warnings (more than the trunk's current 161 warnings).
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/386/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/386/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/386/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/386/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/386/console
This message is automatically generated.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Attachment: mapreduce-1309--2009-01-14.patch
Fixed a bug in which the launch time and the finish time were being confused with each other.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Attachment: mapreduce-1309--2010-02-12.patch
This fixes a null pointer exception in TraceBuilder.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Status: Patch Available (was: Open)
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Status: Open (was: Patch Available)
There was a problem that I discovered in a bulk test.
The main change in the patch is
{noformat}
input.mark(bufferSize + 1);
int actualRead = input.read(buffer);
+ int mostRecentRead = actualRead;
+
+ while (actualRead < bufferSize && mostRecentRead > 0) {
+ mostRecentRead =
+ input.read(buffer, actualRead, bufferSize - actualRead);
+
+ if (mostRecentRead > 0) {
+ actualRead += mostRecentRead;
+ }
+ }
if (actualRead < markerBytes.length) {
input.reset();
{noformat}
Contrary to what I expected, {{BufferedInputStream.read(byte[])}} does NOT read as much as possible. It seems to stop at disk block boundaries, although a subsequent read will carry on from there.
This patch fixes this problem, and only this problem, and is extremely unlikely to introduce new ones.
-dk
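A minimal sketch of the read loop described above, in Java. It relies only on the documented {{InputStream}} contract: a single {{read(byte[])}} call blocks until at least one byte is available, but it may legally return fewer bytes than requested, so code that needs a full buffer has to loop. The class {{ReadFullyDemo}}, the method {{readUpTo}}, and the test stream {{ChunkedInputStream}} are hypothetical names for illustration, not part of rumen. {{ChunkedInputStream}} just caps each read at a fixed size to imitate the short reads seen in the bulk test.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class; not part of rumen.
public class ReadFullyDemo {

  // Fills `buffer` as far as possible, looping because one read() call
  // may return fewer bytes than requested. Returns the number of bytes
  // read, or -1 if the stream was already at end of stream.
  static int readUpTo(InputStream input, byte[] buffer) throws IOException {
    int actualRead = input.read(buffer);
    int mostRecentRead = actualRead;

    // Same shape as the patch: keep reading until the buffer is full
    // or a read reports end of stream (-1) or makes no progress (0).
    while (actualRead < buffer.length && mostRecentRead > 0) {
      mostRecentRead = input.read(buffer, actualRead, buffer.length - actualRead);
      if (mostRecentRead > 0) {
        actualRead += mostRecentRead;
      }
    }
    return actualRead;
  }

  // Test-only stream that hands back at most `chunk` bytes per call,
  // imitating the short reads observed in the bulk test.
  static class ChunkedInputStream extends InputStream {
    private final InputStream delegate;
    private final int chunk;

    ChunkedInputStream(InputStream delegate, int chunk) {
      this.delegate = delegate;
      this.chunk = chunk;
    }

    @Override
    public int read() throws IOException {
      return delegate.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
      return delegate.read(b, off, Math.min(len, chunk));
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] data = new byte[100];
    byte[] buffer = new byte[64];

    // A single read() stops after one 10-byte chunk.
    InputStream once = new ChunkedInputStream(new ByteArrayInputStream(data), 10);
    System.out.println("single read: " + once.read(buffer));

    // The loop keeps going until the 64-byte buffer is full.
    InputStream looped = new ChunkedInputStream(new ByteArrayInputStream(data), 10);
    System.out.println("readUpTo:    " + readUpTo(looped, buffer));
  }
}
```

Run as is, this prints 10 for the single read and 64 for the looped read. On Java 9 and later, {{InputStream.readNBytes(byte[], int, int)}} offers the same read-until-full-or-end-of-stream behavior. {{DataInputStream.readFully}} fits less well here because it throws {{EOFException}} when the stream ends before the buffer is full, whereas the patched code treats a short read (fewer bytes than the marker) as an ordinary case and simply calls {{reset()}}.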
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch
[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798229#action_12798229 ]
Hadoop QA commented on MAPREDUCE-1309:
--------------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12429775/demuxer-plus-concatenated-files--2010-01-08-b.patch
against trunk revision 897118.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 9 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 2342 javac compiler warnings (more than the trunk's current 2330 warnings).
-1 findbugs. The patch appears to introduce 4 new Findbugs warnings.
-1 release audit. The applied patch generated 166 release audit warnings (more than the trunk's current 161 warnings).
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/368/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/368/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/368/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/368/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/368/console
This message is automatically generated.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Hong Tang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hong Tang updated MAPREDUCE-1309:
---------------------------------
Attachment: rumen-yhadoop-20.patch
Backport to hadoop 20.1xx branch. Not to be committed.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Fix For: 0.21.0
>
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, mapreduce-1309--2010-02-16.patch, mapreduce-1309--2010-02-17.patch, rumen-yhadoop-20.patch
[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798102#action_12798102 ]
Dick King commented on MAPREDUCE-1309:
--------------------------------------
More precisely, I removed them from this patch because they're already in trunk. Don't worry, I did NOT _reverse_ them anywhere.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Status: Open (was: Patch Available)
Fix a dropped import.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Attachment: demuxer-plus-concatenated-files--2009-12-21.patch
This patch implements a universal gridmix3/mumak trace generator.
It differs from previous versions of rumen in three ways:
1: The main class is o.a.h.tools.rumen.Driver.
2: This tool is specialized to make traces. Future statistics engines will be trace-based.
3: The argument list is more austere. There are three or more arguments:
3a: the trace output, a {{Path}}, compressed or not
3b: the topology output, again a {{Path}}, again compressed or not
3c: any number of {{Path}} names, each of which can be compressed or not, and each of which can be a config.xml file, a job tracker log [ {{Driver}} determines the version ], or a directory filled with such files.
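The argument layout above (trace output, topology output, then one or more inputs) can be sketched as follows. This is a hypothetical illustration, not the actual {{Driver}} code: {{TraceArgs}} is an invented name, plain Strings stand in for {{Path}}, and the ".gz"/".bz2" suffix list and the ".xml" rule for spotting config files are assumptions. Directory expansion and log-version detection are left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch of: <trace-output> <topology-output> <input> [<input> ...]
 * Strings stand in for Hadoop Path objects.
 */
class TraceArgs {
    final String traceOutput;
    final String topologyOutput;
    final List<String> inputs;

    private TraceArgs(String trace, String topology, List<String> inputs) {
        this.traceOutput = trace;
        this.topologyOutput = topology;
        this.inputs = Collections.unmodifiableList(inputs);
    }

    /** Requires at least three arguments; everything after the second is an input. */
    static TraceArgs parse(String[] args) {
        if (args.length < 3) {
            throw new IllegalArgumentException(
                "usage: <trace-output> <topology-output> <input>...");
        }
        List<String> inputs = new ArrayList<>();
        for (int i = 2; i < args.length; i++) {
            inputs.add(args[i]);
        }
        return new TraceArgs(args[0], args[1], inputs);
    }

    /** ASSUMPTION: compression is recognized by suffix only. */
    static boolean isCompressed(String name) {
        return name.endsWith(".gz") || name.endsWith(".bz2");
    }

    /** Strips one assumed compression suffix, then checks for ".xml". */
    static boolean looksLikeConfig(String name) {
        String base = name;
        if (name.endsWith(".gz")) {
            base = name.substring(0, name.length() - 3);
        } else if (name.endsWith(".bz2")) {
            base = name.substring(0, name.length() - 4);
        }
        return base.endsWith(".xml");
    }
}
```

A real implementation working with {{Path}} objects would more likely ask Hadoop's {{CompressionCodecFactory.getCodec(Path)}} for a codec than hard-code suffixes.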
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch
[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Dick King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dick King updated MAPREDUCE-1309:
---------------------------------
Attachment: demuxer-plus-concatenated-files--2010-01-11.patch
Line 143 of LoggedTaskAttempt.java was changed to read
this.hostName = hostName.intern();
which introduces a bug that breaks my test case: it throws a NullPointerException when a null hostName is read from a JSON string.
I added the fix to this patch instead of making a separate patch for that issue:
this.hostName = (hostName == null ? null : hostName.intern());
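The following self-contained illustration shows the failure and the fix side by side. {{HostNameHolder}} is a hypothetical stand-in, not the real LoggedTaskAttempt class. It shows that calling {{intern()}} on a null reference throws, while the guarded version leaves a JSON null as null and still interns real host names.

```java
/**
 * Hypothetical stand-in for the hostName handling discussed above;
 * not the real LoggedTaskAttempt class.
 */
class HostNameHolder {
    private String hostName;

    /** Original variant: throws NullPointerException when hostName is null. */
    void setHostNameUnsafe(String hostName) {
        this.hostName = hostName.intern();
    }

    /** Fixed variant: interns only non-null values, so a JSON null stays null. */
    void setHostName(String hostName) {
        this.hostName = (hostName == null ? null : hostName.intern());
    }

    String getHostName() {
        return hostName;
    }
}
```

Interning still pays off for the non-null case: many task attempts share a small set of host names, so each distinct name is stored once.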
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch
[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job
trace generator to use a more modular internal structure, to allow for more
input log formats
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800534#action_12800534 ]
Hadoop QA commented on MAPREDUCE-1309:
--------------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12430309/mapreduce-1309--2009-01-14-a.patch
against trunk revision 899501.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 9 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 2342 javac compiler warnings (more than the trunk's current 2330 warnings).
+1 findbugs. The patch does not introduce any new Findbugs warnings.
-1 release audit. The applied patch generated 166 release audit warnings (more than the trunk's current 161 warnings).
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/387/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/387/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/387/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/387/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/387/console
This message is automatically generated.
> I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch