Posted to mapreduce-user@hadoop.apache.org by John Armstrong <jr...@ccri.com> on 2012/02/16 15:49:12 UTC

Changing default task JVM classpath

Hi, everybody.

I'm having some difficulties, which I've traced to not having the
Accumulo libraries and configuration available in my task JVMs.  The
most elegant solution -- especially since I will not always have control
over the Accumulo configuration files -- would be to make them available
to the JVMs directly.

First, to make sure I've understood correctly: the JVM classpath
consists by default of

* the hadoop/conf and (the contents of) the hadoop/lib directories
* the job JAR file
* any other files placed on the distributed classpath

My question is, essentially, how to change the first of these.  What
setting controls this, and can I change it to point at additional
directories?
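(Editorial sketch, not from the thread: since editing hadoop/conf requires access to every node, the usual alternative is the third item in the list above -- shipping the extra jars on the distributed classpath at submit time. This is a rough sketch against the Hadoop 1.x/MRv1 API of this era; the HDFS paths are hypothetical.)

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class SubmitWithExtraJars {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Each jar must already sit in HDFS; it is localized onto every task
    // node and added to the child JVM's classpath next to the job jar.
    DistributedCache.addFileToClassPath(
        new Path("/user/me/lib/accumulo-core.jar"), conf);  // hypothetical path
    DistributedCache.addFileToClassPath(
        new Path("/user/me/lib/libthrift.jar"), conf);      // hypothetical path

    Job job = new Job(conf, "needs-accumulo-libs");
    // ... set mapper, reducer, input/output paths as usual ...
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

If the driver implements Tool and runs through ToolRunner, the same effect is available from the command line with `-libjars jar1,jar2,...`, which also uploads local jars for you.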

As a side note, how does a cluster with an HBase install make sure that
all the tasks that might need them have access to the HBase libraries
and configuration?  Is the job responsible for transporting them?

Thanks in advance.


Re: Changing default task JVM classpath

Posted by John Armstrong <jr...@ccri.com>.
On 02/16/2012 10:15 AM, Harsh J wrote:
> That is how HBase does it: HBaseConfiguration at driver loads up HBase
> *xml file configs from driver classpath (or user set() entries, either
> way), and then submits that as part of job.xml. These configs should
> be all you need.

It should be, and yet I'm running into sporadic problems.  The details 
are sort of separate from mapreduce proper, and I'm still not sure of 
the exact root cause (sporadic bugs are the worst), but it seems to come 
down to an odd confluence of behaviors from Oozie, Zookeeper, and 
Accumulo (another implementation of BigTable).

The gist is that occasionally -- randomly -- the Oozie-launched Java 
program needs to go looking for the Accumulo site configuration, which 
requires looking for an XML file resource on the classpath.  Not finding 
it, it goes with the defaults, meaning Accumulo no longer knows where my 
cluster's Zookeepers are; it tries to reconnect to localhost (the 
default) and fails in an endless loop.

So yes, I've set the relevant properties in my own configuration, which I 
give to Oozie, but when "something" happens (my WAG: zookeeper lock 
lost?) Accumulo insists on looking in its SiteConfiguration, which means 
loading the XML resource.

For the moment I've placed a softlink in $HADOOP_HOME/conf/ to the 
needed Accumulo configuration file, but I'm wondering if I can just tell 
the task JVMs to have access to the Accumulo configuration directories 
as well.

Re: Changing default task JVM classpath

Posted by Harsh J <ha...@cloudera.com>.
You should load the config elements into the job configuration XML
(Job.getConfiguration() or JobConf) during submission - loading from
each machine's local files will introduce problems you don't need and
would rather avoid.

That is how HBase does it: HBaseConfiguration at driver loads up HBase
*xml file configs from driver classpath (or user set() entries, either
way), and then submits that as part of job.xml. These configs should
be all you need.
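(Editorial sketch of the approach Harsh describes, not code from the thread: fold the site XML into the job Configuration at submit time, so its entries travel to every task inside job.xml. The resource name and property value below are assumptions -- Accumulo's site file is typically accumulo-site.xml, and it must be on the driver's classpath for addResource to find it.)

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ShipSiteConfig {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Reads the resource from the driver's classpath and merges its
    // properties into the job configuration, which is serialized as
    // job.xml and handed to every task -- no per-node files needed.
    conf.addResource("accumulo-site.xml");  // assumed resource name

    // Or set individual entries explicitly instead:
    conf.set("instance.zookeeper.host", "zk1.example.com:2181");  // hypothetical value

    Job job = new Job(conf, "ships-its-config");
    // ... configure mapper/reducer and submit as usual ...
  }
}
```

This mirrors what HBaseConfiguration does for HBase jobs: resolve the *-site.xml files once, at the driver, rather than hoping each task node has them locally.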

On Thu, Feb 16, 2012 at 8:19 PM, John Armstrong <jr...@ccri.com> wrote:
> Hi, everybody.
>
> I'm having some difficulties, which I've traced to not having the
> Accumulo libraries and configuration available in my task JVMs.  The
> most elegant solution -- especially since I will not always have control
> over the Accumulo configuration files -- would be to make them available
> to the JVMs directly.
>
> First, to make sure I've understood correctly: the JVM classpath
> consists by default of
>
> * the hadoop/conf and (the contents of) the hadoop/lib directories
> * the job JAR file
> * any other files placed on the distributed classpath
>
> My question is, essentially, how to change the first of these.  What
> setting controls this, and can I change it to point at additional
> directories?
>
> As a side note, how does a cluster with an HBase install make sure that
> all the tasks that might need them have access to the HBase libraries
> and configuration?  Is the job responsible for transporting them?
>
> Thanks in advance.
>



-- 
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about