Posted to user@hadoop.apache.org by "Guttadauro, Jeff" <je...@here.com> on 2016/05/04 20:45:55 UTC

yarn.application.classpath confusion...

Hello.

My team is working on moving some Hadoop 1 jobs (using an old AWS EMR AMI) to YARN / Hadoop 2 (using the newer AWS EMR Release 4.x).  We have an edge node with Hadoop 2.7.2 installed, from which jobs get submitted to the cluster.  It appears that we must set the yarn.application.classpath property in the yarn-site.xml file on the client (edge node) for our jobs to run successfully.  Otherwise, the jobs fail with the following error: "java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/v2/app/MRAppMaster".  This has caused a lot of confusion.

Our understanding of property precedence is that the first setting found wins, looking in the following order: (1) Job/JobConf for the MR job, often set programmatically, (2) *-site.xml files on the client machine, (3) *-site.xml files on the cluster nodes, and finally (4) the *-default.xml files from the Hadoop installation.  So, we are confused as to why it doesn't simply find no setting on the client and fall back to the setting from yarn-site.xml on the cluster nodes.  That's how I would expect this particular property to be most commonly used anyway, as it seems backwards for the client to be telling YARN what its classpath should be on the cluster!  In fact, this is one of those settings that I would expect to see commonly marked "final" on the cluster to prevent a client from providing its own value, since a client shouldn't need to know where things are installed on the cluster nodes.
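
To make that expectation concrete, here is a minimal sketch of the lookup order as we understand it.  It is only a toy model of our assumption, not how Hadoop actually resolves configuration: it uses no Hadoop API, the class and method names (PrecedenceSketch, resolve) are made up for illustration, and the classpath values are placeholders rather than real install paths.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Toy model of the precedence we ASSUMED; it does not reflect Hadoop internals.
public class PrecedenceSketch {

    /** Returns the value from the first source (in order) that defines the key. */
    public static Optional<String> resolve(String key, List<Map<String, String>> sourcesInOrder) {
        for (Map<String, String> source : sourcesInOrder) {
            String value = source.get(key);
            if (value != null) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String key = "yarn.application.classpath";

        Map<String, String> jobConf = new HashMap<>();      // (1) set programmatically: nothing here
        Map<String, String> clientSite = new HashMap<>();   // (2) edge node yarn-site.xml: property unset
        Map<String, String> clusterSite = new HashMap<>();  // (3) cluster yarn-site.xml
        Map<String, String> defaults = new HashMap<>();     // (4) yarn-default.xml

        // Placeholder values, not real paths.
        clusterSite.put(key, "<classpath as installed on the cluster nodes>");
        defaults.put(key, "<classpath from yarn-default.xml>");

        // Under our assumed order, the cluster-side value would be picked up.
        Optional<String> resolved =
                resolve(key, Arrays.asList(jobConf, clientSite, clusterSite, defaults));
        System.out.println(resolved.orElse("<unset>"));
    }
}
```

In this model, leaving the property unset on the edge node would be harmless, because the lookup would fall through to the cluster's value.  What we observe instead is the NoClassDefFoundError above, which suggests the real behavior differs from this model somewhere.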

Perhaps I have a fundamental misunderstanding of something as we migrate to the new YARN framework.  A lot of what I find online seems to talk about submitting jobs from the cluster itself (typically from the master node), in which case it makes sense that this value should be set.  But with an edge node setup like ours, I would think it should be fine to leave that property unset on the client.  Can you help me understand what's going on?

Thanks!
-Jeff