Posted to mapreduce-issues@hadoop.apache.org by "Robert Kanter (JIRA)" <ji...@apache.org> on 2015/07/21 23:50:04 UTC
[jira] [Updated] (MAPREDUCE-6415) Create a tool to combine aggregated logs into HAR files

     [ https://issues.apache.org/jira/browse/MAPREDUCE-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Kanter updated MAPREDUCE-6415:
-------------------------------------
    Attachment: MAPREDUCE-6415_branch-2_prelim_001.patch
                MAPREDUCE-6415_prelim_001.patch

I've uploaded a preliminary patch.  It adds a command that looks for eligible apps to process, generates a script that runs the 'hadoop archive' command, and runs that script in the distributed shell.  It also modifies the 'yarn logs' command and the JHS so that they can read the HAR files, all as described in the design document.

I still have to write some unit tests and split up the patch into MAPREDUCE and YARN (and HADOOP?) JIRAs.

We can also discuss whether we have the right eligibility criteria.  I implemented the ones mentioned in the design document, but it shouldn't be too hard to change them.

Here's the CLI usage:
{noformat}
>> bin/mapred archive-logs -help
usage: yarn archive-logs
 -help                       Prints this message
 -maxEligibleApps <n>        The maximum number of eligible apps to
                             process (default: -1 (all))
 -maxTotalLogsSize <bytes>   The maximum total logs size required to be
                             eligible (default: 1GB)
 -memory <megabytes>         The amount of memory for each container
                             (default: 1024)
 -minNumberLogFiles <n>      The minimum number of log files required to
                             be eligible (default: 20)
{noformat}
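
To make the two eligibility thresholds in the usage text concrete, here is a minimal sketch of the check, not the code from the patch. The function name {{is_eligible}}, the inclusive comparisons, and reading "1GB" as 1024^3 bytes are all assumptions made for illustration; the real tool may apply further criteria from the design document.

```shell
#!/bin/bash
# Hypothetical helper, NOT taken from the patch: decide whether one
# application's aggregated logs qualify for archiving, using only the
# two thresholds shown in the usage text.
MIN_NUM_LOG_FILES=20                          # -minNumberLogFiles default
MAX_TOTAL_LOGS_SIZE=$((1024 * 1024 * 1024))   # -maxTotalLogsSize default; 1GB assumed to mean 1024^3 bytes

# is_eligible <number of log files> <total size in bytes>
# Returns 0 when the app qualifies, 1 otherwise.
# Assumption: both comparisons are inclusive.
is_eligible() {
    local file_count=$1
    local total_bytes=$2
    [ "$file_count" -ge "$MIN_NUM_LOG_FILES" ] && [ "$total_bytes" -le "$MAX_TOTAL_LOGS_SIZE" ]
}

# 25 files totalling 50MB: enough files, small enough to archive.
if is_eligible 25 52428800; then
    echo "25 files, 50MB: eligible"
fi

# 5 files totalling 50MB: too few files to be worth archiving.
if ! is_eligible 5 52428800; then
    echo "5 files, 50MB: skipped"
fi
```

The intent behind the two limits, as I read the usage text, is that apps with only a handful of log files gain little from being combined, while very large apps would make the archiving container do too much work.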

I know it's a bit hard to tell from the Java code what the shell script looks like, so here's an example of one:
{code}
#!/bin/bash
set -e
set -x
CONTAINER_ID_NUM=`echo $CONTAINER_ID | cut -d "_" -f 5`
if [ "$CONTAINER_ID_NUM" == "000002" ]; then
        appId="application_1437514991365_0004"
        user="rkanter"
elif [ "$CONTAINER_ID_NUM" == "000003" ]; then
        appId="application_1437514991365_0005"
        user="rkanter"
elif [ "$CONTAINER_ID_NUM" == "000004" ]; then
        appId="application_1437514991365_0003"
        user="rkanter"
elif [ "$CONTAINER_ID_NUM" == "000005" ]; then
        appId="application_1437514991365_0007"
        user="rkanter"
elif [ "$CONTAINER_ID_NUM" == "000006" ]; then
        appId="application_1437514991365_0006"
        user="rkanter"
else
        echo "Unknown Mapping!"
        exit -1
fi
export HADOOP_CLIENT_OPTS="-Xmx1024m"
$HADOOP_HOME/bin/hadoop archive -Dmapreduce.framework.name=local -archiveName $appId.har -p /tmp/logs/$user/logs/$appId \* /tmp/logs/archive-logs-work
$HADOOP_HOME/bin/hadoop fs -mv /tmp/logs/archive-logs-work/$appId.har /tmp/logs/$user/logs/$appId/$appId.har
originalLogs=`$HADOOP_HOME/bin/hadoop fs -ls /tmp/logs/$user/logs/$appId | grep "^-" | awk '{print $8}'`
if [ ! -z "$originalLogs" ]; then
        $HADOOP_HOME/bin/hadoop fs -rm $originalLogs
fi
{code}
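
Two of the text-processing steps in that script are easy to misread, so here is a small standalone sketch that runs them on made-up input. The container ID and the listing are hypothetical samples, and the helper names {{container_seq}} and {{regular_files}} are mine, not from the patch. The first helper assumes the container_<clusterTimestamp>_<appSeq>_<attempt>_<containerSeq> layout; container IDs that carry an extra epoch component would shift the fields.

```shell
#!/bin/bash
# Illustrative only: the container ID and the listing below are made-up samples.

# Extract the per-container sequence number that the generated script
# switches on (the fifth "_"-separated field of the container ID).
container_seq() {
    echo "$1" | cut -d "_" -f 5
}

# Read 'hadoop fs -ls' style output on stdin and print the paths of
# regular files only. Regular files have a permission string starting
# with "-"; directories start with "d", and the "Found N items" header
# starts with neither, so both are dropped.
regular_files() {
    grep "^-" | awk '{print $8}'
}

container_seq "container_1437514991365_0008_01_000002"

sample_listing='Found 3 items
-rw-r-----   3 rkanter supergroup  51200 2015-07-21 14:50 /tmp/logs/rkanter/logs/application_1437514991365_0004/host1_8041
-rw-r-----   3 rkanter supergroup  48000 2015-07-21 14:50 /tmp/logs/rkanter/logs/application_1437514991365_0004/host2_8041
drwxr-x---   - rkanter supergroup      0 2015-07-21 14:55 /tmp/logs/rkanter/logs/application_1437514991365_0004/application_1437514991365_0004.har'

echo "$sample_listing" | regular_files
```

Because a HAR is stored as a directory, the freshly moved $appId.har is filtered out by the "^-" match, so the final 'hadoop fs -rm' only removes the original per-node log files.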

> Create a tool to combine aggregated logs into HAR files
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-6415
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6415
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 2.8.0
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>         Attachments: HAR-ableAggregatedLogs_v1.pdf, MAPREDUCE-6415_branch-2_prelim_001.patch, MAPREDUCE-6415_prelim_001.patch
>
>
> While we wait for YARN-2942 to become viable, it would still be great to improve the aggregated logs problem.  We can write a tool that combines aggregated log files into a single HAR file per application, which should solve the too-many-files and too-many-blocks problems.  See the design document for details.
> See YARN-2942 for more context.


