Posted to mapreduce-issues@hadoop.apache.org by "Allen Wittenauer (JIRA)" <ji...@apache.org> on 2015/07/23 02:48:05 UTC

[jira] [Comment Edited] (MAPREDUCE-6415) Create a tool to combine aggregated logs into HAR files

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637998#comment-14637998 ] 

Allen Wittenauer edited comment on MAPREDUCE-6415 at 7/23/15 12:47 AM:
-----------------------------------------------------------------------

bq.  The shell inherits the env of the NodeManager as a base. HADOOP_HOME should be defined for the NM, so it ends up in env of the shell.

a) This is only true for Windows.  Unix has been using HADOOP_PREFIX since 0.21.  If it's being defined, it's not by the bash code that starts the NM that ships with Apache Hadoop.

b) I'm unsure if LCE actually inherits all of the shell environment or only specific variables.
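
For illustration only: a tool that needs the install directory could avoid depending on HADOOP_HOME alone by checking both variables. A minimal Python sketch, assuming a hypothetical helper named resolve_hadoop_home (not part of Hadoop) and assuming either variable may be missing from the environment the shell actually receives:

```python
import os


def resolve_hadoop_home(env=None):
    """Return the Hadoop install directory from env, or None if neither variable is set.

    Hypothetical helper for illustration; not an Apache Hadoop API.
    """
    env = os.environ if env is None else env
    # Check HADOOP_HOME first, then fall back to HADOOP_PREFIX.
    for name in ("HADOOP_HOME", "HADOOP_PREFIX"):
        value = env.get(name)
        if value:  # skip variables that are unset or empty
            return value
    return None
```

Returning None rather than guessing a default path leaves it to the caller to fail loudly when the environment was not passed through.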

bq. The 'hadoop archive' command starts up a JVM. I don't see how we can get around that unless we call it programmatically from an existing JVM and also do it serially, which is going to take a lot longer overall.

There are several hadoop commands in the generated shell code, and each one pays its own JVM startup cost.  Granted, there has been a lot of work in trunk to minimize those costs (classpath dedupe, etc.), but it's still very expensive.


was (Author: aw):
bq.  The shell inherits the env of the NodeManager as a base. HADOOP_HOME should be defined for the NM, so it ends up in env of the shell.

a) This is only true for Windows.  Unix has been using HADOOP_PREFIX since 0.21.  If it's being defined, it's not by the bash code that starts the NM that ships with Apache Hadoop.

b) I'm unsure if LCE actually inherits all of the shell environment or only specific variables.

> Create a tool to combine aggregated logs into HAR files
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-6415
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6415
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 2.8.0
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>         Attachments: HAR-ableAggregatedLogs_v1.pdf, MAPREDUCE-6415_branch-2_prelim_001.patch, MAPREDUCE-6415_prelim_001.patch
>
>
> While we wait for YARN-2942 to become viable, it would still be great to improve the aggregated logs problem.  We can write a tool that combines aggregated log files into a single HAR file per application, which should solve the too many files and too many blocks problems.  See the design document for details.
> See YARN-2942 for more context.


