Posted to common-user@hadoop.apache.org by Steven Willis <sw...@compete.com> on 2014/11/03 23:14:20 UTC

Understanding mapreduce.admin.user.env

I want to make sure that the native libraries installed on the Node Managers get used by all YARN containers. I first found the mapreduce.admin.{map,reduce}.child.java.opts config properties and set them to:

    '-Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Djava.library.path=/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native'

That is, I appended the native library paths to the default values of these properties. This seemed to work, but now I see this warning:

    WARN mapred.YARNRunner: Usage of -Djava.library.path in mapreduce.admin.map.child.java.opts can cause programs to no longer function if hadoop native libraries are used. These values should be set as part of the LD_LIBRARY_PATH in the map JVM env using mapreduce.admin.user.env config settings.
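Following the warning's advice, the library paths can be moved out of the java opts and into an environment entry. Below is a minimal Python sketch that assumes mapreduce.admin.user.env takes comma-separated NAME=VALUE entries, which is the format mapred-default.xml documents for the env-style task properties. The helper name split_library_path is hypothetical and not part of Hadoop. It only rewrites strings and does not show where Hadoop reads either property.

```python
import shlex


def split_library_path(java_opts):
    """Hypothetical helper: drop -Djava.library.path from a java opts string.

    Returns (remaining_opts, env_entry). env_entry is an LD_LIBRARY_PATH=...
    entry in NAME=VALUE form, or None if no library path was present.
    """
    prefix = "-Djava.library.path="
    kept = []
    lib_path = None
    for token in shlex.split(java_opts):
        if token.startswith(prefix):
            # Everything after '=' is the colon-separated search path.
            lib_path = token[len(prefix):]
        else:
            kept.append(token)
    env_entry = "LD_LIBRARY_PATH=" + lib_path if lib_path else None
    return " ".join(kept), env_entry


if __name__ == "__main__":
    opts = ("-Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN "
            "-Djava.library.path=/usr/lib/hadoop/lib/native/Linux-amd64-64:"
            "/usr/lib/hadoop/lib/native")
    remaining, env = split_library_path(opts)
    print(remaining)  # -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN
    print(env)        # LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native
```

One caveat when using the result: mapred-default.xml describes mapreduce.admin.user.env as not additive, so a value set there replaces the default entry instead of extending it.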

Okay, so I can go and set mapreduce.admin.user.env, but before I do, I have a few questions. Where are these properties actually read and applied? Are they read and set before the job is submitted by the client code, on the host where "hadoop jar whatever.jar" is run? Or are they set by the Resource Manager? Or by the Application Master? Or are they read on the host where the map or reduce task actually runs?

Imagine the following scenarios:

 A. The mapreduce.admin.user.env property is not set explicitly by the job's java code prior to submission. It is not set via command-line switches at submission. It is not set in /etc/hadoop/conf/*-site.xml on the client host. It is not set in /etc/hadoop/conf/*-site.xml on the host running the Resource Manager. It is not set in /etc/hadoop/conf/*-site.xml on the host that runs the Application Master. But it is set in /etc/hadoop/conf/mapred-site.xml on the Node Manager host that runs one of the map tasks.
 B. Same as A, but the property is only set in /etc/hadoop/conf/mapred-site.xml on the host that runs the Application Master (not on any of the Node Managers that run the actual tasks).
 C. Same as A, but the property is only set in /etc/hadoop/conf/mapred-site.xml on the Resource Manager host.
 D. Same as A, but the property is only set in /etc/hadoop/conf/mapred-site.xml on the client submission host.
 E. Same as A, but the property is set either via command line switch, or in the client's code (assuming these cases are the same as D).

In which cases will the map task see the default value for mapreduce.admin.map.child.java.opts, and when will it see the explicitly set value? What happens if it's explicitly set in more than one of the locations referenced above? 

And what about mapred.child.env? Where and how does that come into play?
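For reference, mapred-default.xml documents mapred.child.env values as comma-separated NAME=VALUE entries, where an entry such as B=$B:c inherits the Node Manager's B variable on Unix. The Python sketch below models only that documented syntax. It assumes that entries are split on commas and that $NAME or ${NAME} references are expanded against the Node Manager's environment. parse_env_setting is a hypothetical helper, not Hadoop's implementation, and it does not address which host's configuration takes precedence.

```python
import re

# Matches $NAME or ${NAME}; group 1 or group 2 holds the variable name.
_VAR = re.compile(r"\$(\w+)|\$\{(\w+)\}")


def parse_env_setting(value, base_env):
    """Hypothetical helper: turn 'A=foo,B=$B:c' into a dict of env vars.

    References are expanded only against base_env (standing in for the
    Node Manager's environment); unset names expand to an empty string.
    """
    def expand(match):
        name = match.group(1) or match.group(2)
        return base_env.get(name, "")

    result = {}
    for entry in value.split(","):
        entry = entry.strip()
        name, sep, raw = entry.partition("=")
        if not entry or not sep:
            continue  # skip empty or malformed entries without '='
        result[name.strip()] = _VAR.sub(expand, raw)
    return result


if __name__ == "__main__":
    nm_env = {"B": "/opt/b"}
    print(parse_env_setting("A=foo,B=$B:c", nm_env))  # {'A': 'foo', 'B': '/opt/b:c'}
```

Because this sketch splits on commas, a value that itself contains a comma cannot be expressed in it.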

What about yarn.app.mapreduce.am.env and yarn.app.mapreduce.am.admin.user.env? Will those settings trickle down to the actual tasks, or do they only affect the Application Master's environment? The same goes for yarn.nodemanager.admin-env: will it trickle down from the Node Manager to the container? Would it be better to set one of these rather than the MapReduce equivalent, so that I get the native libraries for all YARN apps, not just MapReduce ones?

-Steven Willis