Posted to mapreduce-issues@hadoop.apache.org by "Amar Kamat (JIRA)" <ji...@apache.org> on 2010/07/06 20:11:49 UTC

[jira] Created: (MAPREDUCE-1918) Add documentation to Rumen

Add documentation to Rumen
--------------------------

                 Key: MAPREDUCE-1918
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1918
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: tools/rumen
    Affects Versions: 0.22.0
            Reporter: Amar Kamat
            Assignee: Amar Kamat
             Fix For: 0.22.0


Add forrest documentation to Rumen tool.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1918) Add documentation to Rumen

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905998#action_12905998 ] 

Hong Tang commented on MAPREDUCE-1918:
--------------------------------------

A few minor nits:
* "Incase" => "in case"
* For TraceBuilder, does it descend recursively into the input folder, or do we need to specify the immediate parent directory that contains the files?
* Can we add a bit more detail on "demuxer"? How about the following?
bq. Demuxer decides how the input file maps to jobhistory file(s). [insert]Job history logs and job conf files are typically small files, and can be stored more effectively if we embed them in a container file format such as SequenceFile or TFile. To support such use cases, one can specify a customized Demuxer class that can extract individual job history logs and job conf files from the source files.[/insert]
* There is no need to do a canParse() check if you know which parser to use (hence no need to use ris). The parser will (or should) simply abort if the source is not of the expected version.
* VersionDetector seems rather internal; getParser() is probably what users should care about.
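
To make the container idea concrete, here is a minimal sketch of a demuxer that pulls individual files out of a single archive. It is only an illustration: the class name ZipDemuxerSketch, the getNext() contract, and the pack() helper are assumptions made for this example, a ZIP archive stands in for SequenceFile or TFile, and none of this is Rumen's actual demuxer interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demuxer: one container stream in, many (name, content) pairs out.
class ZipDemuxerSketch implements AutoCloseable {
    private final ZipInputStream zip;

    ZipDemuxerSketch(InputStream container) {
        this.zip = new ZipInputStream(container);
    }

    /** Returns the next (file name, content stream) pair, or null when the container is exhausted. */
    Map.Entry<String, InputStream> getNext() {
        try {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // only regular files carry job conf or history data
                }
                // Copy the entry so the caller gets an independent stream.
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = zip.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                return new SimpleEntry<String, InputStream>(
                        entry.getName(), new ByteArrayInputStream(buf.toByteArray()));
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            zip.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo helper: packs alternating (name, body) strings into an in-memory ZIP container. */
    static byte[] pack(String... namesAndBodies) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(out)) {
            for (int i = 0; i + 1 < namesAndBodies.length; i += 2) {
                zos.putNextEntry(new ZipEntry(namesAndBodies[i]));
                zos.write(namesAndBodies[i + 1].getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

A caller would loop on getNext() until it returns null and hand each stream to the job conf parser or the job history parser, depending on the file name.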






[jira] Commented: (MAPREDUCE-1918) Add documentation to Rumen

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890795#action_12890795 ] 

Hong Tang commented on MAPREDUCE-1918:
--------------------------------------

I think we should also describe (1) that the JSON objects are created from LoggedXXX classes through Jackson's ObjectMapper; and (2) the API for building LoggedXXX objects and for reading them.

The basic API flow for creating parsed Rumen objects is as follows (creating the input streams for the job conf XML and the job history logs is the user's responsibility):
- JobConfigurationParser: parser that parses job conf xml. One instance can be reused to parse many job conf xml files.
{code}
	JobConfigurationParser jcp = new JobConfigurationParser(interestedProperties); // interestedProperties is a list of keys to be extracted from the job conf xml file.
	Properties parsedProperties = jcp.parse(inputStream); // inputStream is the file input stream for the job conf xml file.
{code}
	
- JobHistoryParser: parser that parses job history files. It is an interface, and the actual implementations are defined as enums in JobHistoryParserFactory. One can directly use the implementation matching the version of the job history logs, or use the method "canParse()" to detect which parser is suitable for parsing them (following the pattern in TraceBuilder). Create one instance to parse a job history log and close it after use.
{code}
	JobHistoryParser parser = new Hadoop20JHParser(inputStream); // inputStream is the file input stream for the job history file.
	// JobHistoryParser APIs will be used later when being fed into JobBuilder (below).
	parser.close();
{code}
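
The canParse() detection mentioned above can be sketched roughly as follows. This is a self-contained illustration, not the JobHistoryParserFactory code: the Format enum, its made-up header strings, and detect() are all hypothetical. The point it shows is that a probe has to rewind the stream, so that whichever parser is selected still sees the file from the first byte.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Optional;

class ParserSelectionSketch {
    /** Hypothetical formats, each recognized by a made-up header string. */
    enum Format {
        A("FORMAT-A"),
        B("FORMAT-B");

        private final byte[] header;

        Format(String header) {
            this.header = header.getBytes(StandardCharsets.US_ASCII);
        }

        /** Peeks at the head of the stream, then rewinds it to where it started. */
        boolean canParse(InputStream in) {
            in.mark(header.length);
            try {
                byte[] head = new byte[header.length];
                int n = 0;
                while (n < head.length) {
                    int r = in.read(head, n, head.length - n);
                    if (r < 0) {
                        break; // stream is shorter than the header
                    }
                    n += r;
                }
                in.reset();
                return n == head.length && Arrays.equals(head, header);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Returns the first format whose probe accepts the stream, if any. */
    static Optional<Format> detect(InputStream in) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("wrap the stream in a BufferedInputStream first");
        }
        for (Format f : Format.values()) {
            if (f.canParse(in)) {
                return Optional.of(f);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        byte[] data = "FORMAT-B payload".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(new ByteArrayInputStream(data)));
    }
}
```

When the version of the logs is already known, none of this is needed and the matching parser can be constructed directly.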

- JobBuilder: builder for LoggedJobs. Create one instance to process a job history log and its paired job conf. The order in which the conf file and the job history file are processed does not matter.
{code}
	JobBuilder jb = new JobBuilder(jobID); // you will need to extract the job ID from the file name: <jobtracker>_job_<timestamp>_<sequence>
	jb.process(jcp.parse(jobConfInputStream));
	JobHistoryParser parser = new Hadoop20JHParser(jobHistoryInputStream);
	try {
		HistoryEvent e;
		while ((e = parser.nextEvent()) != null) {
			jb.process(e);
		}
	} finally {
		parser.close();
	}
	LoggedJob job = jb.build();
{code}
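
For the job ID extraction mentioned in the JobBuilder snippet, a regular expression is probably enough. The sketch below assumes that the ID has the form job_<timestamp>_<sequence> with numeric fields and that the first such match in the file name is the one wanted; the class and method names are made up for this example and are not part of Rumen.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JobIdFromFileName {
    // Assumed shape of a job ID: job_<timestamp>_<sequence>, both fields numeric.
    private static final Pattern JOB_ID = Pattern.compile("job_\\d+_\\d+");

    /** Returns the first job ID embedded in the file name, or empty if there is none. */
    static Optional<String> extractJobId(String fileName) {
        Matcher m = JOB_ID.matcher(fileName);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical file names, used only to exercise the method.
        System.out.println(extractJobId("jt.example.com_job_201007061234_0042"));
        System.out.println(extractJobId("jt.example.com_job_201007061234_0042_conf.xml"));
        System.out.println(extractJobId("README.txt"));
    }
}
```

The same call works for the job history file and for the matching conf file, so both can be routed to the same JobBuilder instance.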

On the reading side, the output produced by TraceBuilder or Folder can be read through JobTraceReader or ClusterTopologyReader. One can also use Jackson's ObjectMapper to parse the JSON-formatted data into LoggedJob or LoggedTopology objects.




[jira] Resolved: (MAPREDUCE-1918) Add documentation to Rumen

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu resolved MAPREDUCE-1918.
------------------------------------------------

    Hadoop Flags: [Reviewed]
      Resolution: Fixed

I just committed this. Thanks, Amar!




[jira] Commented: (MAPREDUCE-1918) Add documentation to Rumen

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926069#action_12926069 ] 

Hudson commented on MAPREDUCE-1918:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #523 (See [https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/523/])
    




[jira] Commented: (MAPREDUCE-1918) Add documentation to Rumen

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908609#action_12908609 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1918:
----------------------------------------------------

There is a javadoc warning with the patch. Can you fix it?
{code}
  [javadoc] /home/amarsri/workspace/mapreduce/src/tools/org/apache/hadoop/tools/rumen/TaskAttemptInfo.java:45: warning - Tag @link: reference not found: TaskStatus.State
{code}




[jira] Updated: (MAPREDUCE-1918) Add documentation to Rumen

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated MAPREDUCE-1918:
----------------------------------

    Attachment: mapreduce-1918-v1.7.patch

Attaching a patch that adds user and API documentation to Rumen. test-patch passed.




[jira] Commented: (MAPREDUCE-1918) Add documentation to Rumen

Posted by "Ranjit Mathew (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899378#action_12899378 ] 

Ranjit Mathew commented on MAPREDUCE-1918:
------------------------------------------

I would suggest keeping the API information in the package-level JavaDoc documentation and the user-guide information in the document being worked on in this ticket.
A user looking to run Rumen, for example to feed its output to GridMix3, would look at the Forrest documentation, while a developer looking to integrate directly or indirectly with Rumen would look at the JavaDoc documentation. We should definitely not mirror the information in both places, as that would add to the maintenance burden and lead to stale documentation.




[jira] Updated: (MAPREDUCE-1918) Add documentation to Rumen

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated MAPREDUCE-1918:
----------------------------------

    Attachment: rumen.pdf

Attaching a modified document incorporating changes from Dick.




[jira] Updated: (MAPREDUCE-1918) Add documentation to Rumen

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated MAPREDUCE-1918:
----------------------------------

    Attachment: mapreduce-1918-v1.8.patch

Attaching a new patch incorporating Hong's comments.




[jira] Updated: (MAPREDUCE-1918) Add documentation to Rumen

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated MAPREDUCE-1918:
----------------------------------

    Attachment: rumen.pdf
                mapreduce-1918-v1.3.patch

Attaching a patch for review. test-patch passed on my box.




[jira] Updated: (MAPREDUCE-1918) Add documentation to Rumen

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated MAPREDUCE-1918:
----------------------------------

    Attachment: mapreduce-1918-v1.4.patch

Attaching a patch for the same. test-patch passed on my box.




[jira] Updated: (MAPREDUCE-1918) Add documentation to Rumen

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated MAPREDUCE-1918:
----------------------------------

    Attachment: mapreduce-1918-v1.10.patch

Attaching a new patch that fixes the javadoc warning. {{TaskAttemptInfo.java}} wasn't modified in the earlier patch but still resulted in a javadoc warning (not caught by _test-patch_).

