Posted to user@accumulo.apache.org by Marc Reichman <mr...@pixelforensics.com> on 2015/01/23 22:56:29 UTC

submission w/classpath without tool.sh?

My apologies if this is covered somewhere, I've done a lot of searching and
come up dry.

I am migrating a set of applications from Hadoop 1.0.3/Accumulo 1.4.1 to
Hadoop 2.6.0/Accumulo 1.6.1. The applications are launched by my custom
java apps, using the Hadoop Tool/Configured interface setup, not a big deal.
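
For context, a minimal sketch of a driver in that shape, assuming the Accumulo
1.6.x mapreduce API, might look like the following; the instance name,
zookeepers, credentials, and table are placeholders, not values from this
thread:

import org.apache.accumulo.core.client.ClientConfiguration;
import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class AccumuloScanJob extends Configured implements Tool {
  @Override
  public int run(String[] args) throws Exception {
    Job job = Job.getInstance(getConf(), "accumulo-scan-example");
    job.setJarByClass(AccumuloScanJob.class);

    // Read from Accumulo over ZooKeeper; every name here is a placeholder.
    AccumuloInputFormat.setZooKeeperInstance(job, ClientConfiguration.loadDefault()
        .withInstance("myInstance").withZkHosts("zk1:2181,zk2:2181"));
    AccumuloInputFormat.setConnectorInfo(job, "mruser", new PasswordToken("secret"));
    AccumuloInputFormat.setInputTableName(job, "mytable");
    AccumuloInputFormat.setScanAuthorizations(job, new Authorizations());
    job.setInputFormatClass(AccumuloInputFormat.class);

    // Identity mapper, no reducers, output discarded; a real job would plug in
    // its own Mapper and typically an AccumuloOutputFormat instead.
    job.setMapperClass(Mapper.class);
    job.setNumReduceTasks(0);
    job.setOutputFormatClass(NullOutputFormat.class);
    job.setMapOutputKeyClass(Key.class);
    job.setMapOutputValueClass(Value.class);

    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new AccumuloScanJob(), args));
  }
}

Launched with tool.sh, a driver like this picks up the Accumulo and Hadoop jars
automatically; the rest of the thread is about getting the same classpath in
place without the wrapper script.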

To run MR jobs with AccumuloInputFormat/OutputFormat, in 1.0 I could use
tool.sh to launch the programs, which worked great for local on-cluster
launching. However, I needed to launch from remote hosts (maybe even Windows
ones), and I would bundle a large lib dir with everything I needed on the
client-side, and fill out HADOOP_CLASSPATH in hadoop-env.sh with everything
I needed (basically copied the output of accumulo classpath). This would
work for remote submissions, or even local ones, but specifically using my
java mains to launch them without any accumulo or hadoop wrapper scripts.

In YARN MR 2.6 this doesn't seem to work. No matter what I do, I can't seem
to get a normal java app to have the 2.x MR Application Master pick up the
accumulo items in the classpath, and my jobs fail with ClassNotFound
exceptions. tool.sh works just fine, but again, I need to be able to submit
without that environment.

I have tried (on the cluster):
HADOOP_CLASSPATH in hadoop-env.sh
HADOOP_CLASSPATH from .bashrc
yarn.application.classpath in yarn-site.xml

I don't mind using tool.sh locally, it's quite nice, but I need a strategy
to have the cluster "setup" so I can just launch java, set my appropriate
hadoop configs for remote fs and yarn hosts, get my accumulo connections
and in/out setup for mapreduce and launch jobs which have accumulo
awareness.
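
As a hedged sketch of the "just launch java" side, with host names and ports
invented for illustration, the remote fs and yarn settings can be put on the
driver's Configuration before handing it to ToolRunner (in practice the same
values can also come from *-site.xml files on the client classpath, as
discussed later in the thread):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;

public class RemoteSubmit {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder endpoints for the remote HDFS namenode and YARN ResourceManager.
    conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");
    conf.set("mapreduce.framework.name", "yarn");
    conf.set("yarn.resourcemanager.hostname", "rm.example.com");
    conf.set("yarn.resourcemanager.address", "rm.example.com:8032");
    // AccumuloScanJob is the hypothetical Tool driver sketched earlier.
    System.exit(ToolRunner.run(conf, new AccumuloScanJob(), args));
  }
}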

Any ideas?

Thanks,
Marc

Re: submission w/classpath without tool.sh?

Posted by Marc Reichman <mr...@pixelforensics.com>.
So, mapreduce.application.classpath was the winner. It's possible that
yarn.application.classpath would have worked as well. My main issue was
that I was neglecting to include a copy of the XML config files on the
client classpath, so my settings weren't being picked up (late-night
epiphany). Passing the value as -Dmapreduce.application.classpath=... on
the command line allowed it to take effect, and I was fine.
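
Because the driver implements Tool, that -D is parsed by GenericOptionsParser
at launch; the same setting can also be made programmatically. A rough sketch
only, with placeholder Accumulo and ZooKeeper paths that would need to match
where the jars actually live on the cluster nodes:

import org.apache.hadoop.conf.Configuration;

public class ClasspathConfig {
  // Returns a Configuration carrying the same setting that
  //   -Dmapreduce.application.classpath=...
  // would supply on the command line. Keep the stock Hadoop entries in front;
  // the env vars are expanded on the NodeManagers, not on the client.
  public static Configuration withMrAppClasspath() {
    Configuration conf = new Configuration();
    conf.set("mapreduce.application.classpath",
        "$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,"
        + "$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,"
        + "/opt/accumulo/lib/*,"            // placeholder Accumulo lib dir
        + "/opt/zookeeper/zookeeper.jar");  // placeholder ZooKeeper jar
    return conf;
  }
}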

For remote clients, I have copied into a local lib directory on the client
classpath what I need to launch: the jars listed by the output of accumulo
classpath, plus a set of the XML config files needed to set the appropriate
client-side mapreduce options for jobs to launch properly, including the
classpath mentioned above but also the various memory-related settings in
YARN/MR2.
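
For illustration only, the memory-related settings meant here are the standard
MR2/YARN ones; the numbers below are invented and would need to be sized to
the actual cluster:

import org.apache.hadoop.conf.Configuration;

public class ClientMemoryDefaults {
  // Illustrative values only; real numbers depend on the cluster's
  // yarn.scheduler.maximum-allocation-mb and container sizing.
  public static void apply(Configuration conf) {
    conf.setInt("yarn.app.mapreduce.am.resource.mb", 2048); // MR ApplicationMaster container
    conf.setInt("mapreduce.map.memory.mb", 4096);           // map task container
    conf.setInt("mapreduce.reduce.memory.mb", 4096);        // reduce task container
    conf.set("mapreduce.map.java.opts", "-Xmx3276m");       // heap inside the map container
    conf.set("mapreduce.reduce.java.opts", "-Xmx3276m");    // heap inside the reduce container
  }
}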

Thanks for the help Billie!

On Sat, Jan 24, 2015 at 7:51 AM, Billie Rinaldi <bi...@apache.org> wrote:

> You might have to set yarn.application.classpath in both the client and
> the server conf. At least that's what Slider does.
> On Jan 23, 2015 10:00 PM, "Marc Reichman" <mr...@pixelforensics.com>
> wrote:
>
>> That's correct, I don't really want to have the client have to package up
>> every accumulo and zookeeper jar I need in dcache or a fat jar or whatever
>> just to run stuff from a remote client when the jars are all there.
>>
>> I did try yarn.application.classpath, but I didn't spell out the whole
>> thing. Next try I will take all those jars and put them in explicitly
>> instead of the dir wildcards. I will update how it goes.
>>
>> On Fri, Jan 23, 2015 at 5:19 PM, Billie Rinaldi <bi...@apache.org>
>> wrote:
>>
>>> You have all the jars your app needs on both the servers and the client
>>> (as opposed to wanting Yarn to distribute them)?  Then
>>> yarn.application.classpath should be what you need.  It looks like
>>> /etc/hadoop/conf,/some/lib/dir/*,/some/other/lib/dir/* etc.  Is that what
>>> you're trying?
>>>
>>> On Fri, Jan 23, 2015 at 1:56 PM, Marc Reichman <
>>> mreichman@pixelforensics.com> wrote:
>>>
>>>> My apologies if this is covered somewhere, I've done a lot of searching
>>>> and come up dry.
>>>>
>>>> I am migrating a set of applications from Hadoop 1.0.3/Accumulo 1.4.1
>>>> to Hadoop 2.6.0/Accumulo 1.6.1. The applications are launched by my custom
>>>> java apps, using the Hadoop Tool/Configured interface setup, not a big deal.
>>>>
>>>> To run MR jobs with AccumuloInputFormat/OutputFormat, in 1.0 I could
>>>> use tool.sh to launch the programs, which worked great for local on-cluster
>>>> launching. I however needed to launch from remote hosts (maybe even Windows
>>>> ones), and I would bundle a large lib dir with everything I needed on the
>>>> client-side, and fill out HADOOP_CLASSPATH in hadoop-env.sh with everything
>>>> I needed (basically copied the output of accumulo classpath). This would
>>>> work for remote submissions, or even local ones, but specifically using my
>>>> java mains to launch them without any accumulo or hadoop wrapper scripts.
>>>>
>>>> In YARN MR 2.6 this doesn't seem to work. No matter what I do, I can't
>>>> seem to get a normal java app to have the 2.x MR Application Master pick up
>>>> the accumulo items in the classpath, and my jobs fail with ClassNotFound
>>>> exceptions. tool.sh works just fine, but again, I need to be able to submit
>>>> without that environment.
>>>>
>>>> I have tried (on the cluster):
>>>> HADOOP_CLASSPATH in hadoop-env.sh
>>>> HADOOP_CLASSPATH from .bashrc
>>>> yarn.application.classpath in yarn-site.xml
>>>>
>>>> I don't mind using tool.sh locally, it's quite nice, but I need a
>>>> strategy to have the cluster "setup" so I can just launch java, set my
>>>> appropriate hadoop configs for remote fs and yarn hosts, get my accumulo
>>>> connections and in/out setup for mapreduce and launch jobs which have
>>>> accumulo awareness.
>>>>
>>>> Any ideas?
>>>>
>>>> Thanks,
>>>> Marc
>>>>
>>>
>>>
>>

Re: submission w/classpath without tool.sh?

Posted by Billie Rinaldi <bi...@apache.org>.
You might have to set yarn.application.classpath in both the client and the
server conf. At least that's what Slider does.
On Jan 23, 2015 10:00 PM, "Marc Reichman" <mr...@pixelforensics.com>
wrote:

> That's correct, I don't really want to have the client have to package up
> every accumulo and zookeeper jar I need in dcache or a fat jar or whatever
> just to run stuff from a remote client when the jars are all there.
>
> I did try yarn.application.classpath, but I didn't spell out the whole
> thing. Next try I will take all those jars and put them in explicitly
> instead of the dir wildcards. I will update how it goes.
>
> On Fri, Jan 23, 2015 at 5:19 PM, Billie Rinaldi <bi...@apache.org> wrote:
>
>> You have all the jars your app needs on both the servers and the client
>> (as opposed to wanting Yarn to distribute them)?  Then
>> yarn.application.classpath should be what you need.  It looks like
>> /etc/hadoop/conf,/some/lib/dir/*,/some/other/lib/dir/* etc.  Is that what
>> you're trying?
>>
>> On Fri, Jan 23, 2015 at 1:56 PM, Marc Reichman <
>> mreichman@pixelforensics.com> wrote:
>>
>>> My apologies if this is covered somewhere, I've done a lot of searching
>>> and come up dry.
>>>
>>> I am migrating a set of applications from Hadoop 1.0.3/Accumulo 1.4.1 to
>>> Hadoop 2.6.0/Accumulo 1.6.1. The applications are launched by my custom
>>> java apps, using the Hadoop Tool/Configured interface setup, not a big deal.
>>>
>>> To run MR jobs with AccumuloInputFormat/OutputFormat, in 1.0 I could use
>>> tool.sh to launch the programs, which worked great for local on-cluster
>>> launching. I however needed to launch from remote hosts (maybe even Windows
>>> ones), and I would bundle a large lib dir with everything I needed on the
>>> client-side, and fill out HADOOP_CLASSPATH in hadoop-env.sh with everything
>>> I needed (basically copied the output of accumulo classpath). This would
>>> work for remote submissions, or even local ones, but specifically using my
>>> java mains to launch them without any accumulo or hadoop wrapper scripts.
>>>
>>> In YARN MR 2.6 this doesn't seem to work. No matter what I do, I can't
>>> seem to get a normal java app to have the 2.x MR Application Master pick up
>>> the accumulo items in the classpath, and my jobs fail with ClassNotFound
>>> exceptions. tool.sh works just fine, but again, I need to be able to submit
>>> without that environment.
>>>
>>> I have tried (on the cluster):
>>> HADOOP_CLASSPATH in hadoop-env.sh
>>> HADOOP_CLASSPATH from .bashrc
>>> yarn.application.classpath in yarn-site.xml
>>>
>>> I don't mind using tool.sh locally, it's quite nice, but I need a
>>> strategy to have the cluster "setup" so I can just launch java, set my
>>> appropriate hadoop configs for remote fs and yarn hosts, get my accumulo
>>> connections and in/out setup for mapreduce and launch jobs which have
>>> accumulo awareness.
>>>
>>> Any ideas?
>>>
>>> Thanks,
>>> Marc
>>>
>>
>>
>

Re: submission w/classpath without tool.sh?

Posted by Marc Reichman <mr...@pixelforensics.com>.
That's correct; I don't really want the client to have to package up every
accumulo and zookeeper jar I need into the distributed cache or a fat jar
just to run stuff from a remote client when the jars are already all there
on the cluster.

I did try yarn.application.classpath, but I didn't spell out the whole
thing. On the next try I will list all those jars explicitly instead of
using the dir wildcards. I will report back on how it goes.

On Fri, Jan 23, 2015 at 5:19 PM, Billie Rinaldi <bi...@apache.org> wrote:

> You have all the jars your app needs on both the servers and the client
> (as opposed to wanting Yarn to distribute them)?  Then
> yarn.application.classpath should be what you need.  It looks like
> /etc/hadoop/conf,/some/lib/dir/*,/some/other/lib/dir/* etc.  Is that what
> you're trying?
>
> On Fri, Jan 23, 2015 at 1:56 PM, Marc Reichman <
> mreichman@pixelforensics.com> wrote:
>
>> My apologies if this is covered somewhere, I've done a lot of searching
>> and come up dry.
>>
>> I am migrating a set of applications from Hadoop 1.0.3/Accumulo 1.4.1 to
>> Hadoop 2.6.0/Accumulo 1.6.1. The applications are launched by my custom
>> java apps, using the Hadoop Tool/Configured interface setup, not a big deal.
>>
>> To run MR jobs with AccumuloInputFormat/OutputFormat, in 1.0 I could use
>> tool.sh to launch the programs, which worked great for local on-cluster
>> launching. I however needed to launch from remote hosts (maybe even Windows
>> ones), and I would bundle a large lib dir with everything I needed on the
>> client-side, and fill out HADOOP_CLASSPATH in hadoop-env.sh with everything
>> I needed (basically copied the output of accumulo classpath). This would
>> work for remote submissions, or even local ones, but specifically using my
>> java mains to launch them without any accumulo or hadoop wrapper scripts.
>>
>> In YARN MR 2.6 this doesn't seem to work. No matter what I do, I can't
>> seem to get a normal java app to have the 2.x MR Application Master pick up
>> the accumulo items in the classpath, and my jobs fail with ClassNotFound
>> exceptions. tool.sh works just fine, but again, I need to be able to submit
>> without that environment.
>>
>> I have tried (on the cluster):
>> HADOOP_CLASSPATH in hadoop-env.sh
>> HADOOP_CLASSPATH from .bashrc
>> yarn.application.classpath in yarn-site.xml
>>
>> I don't mind using tool.sh locally, it's quite nice, but I need a
>> strategy to have the cluster "setup" so I can just launch java, set my
>> appropriate hadoop configs for remote fs and yarn hosts, get my accumulo
>> connections and in/out setup for mapreduce and launch jobs which have
>> accumulo awareness.
>>
>> Any ideas?
>>
>> Thanks,
>> Marc
>>
>
>

Re: submission w/classpath without tool.sh?

Posted by Billie Rinaldi <bi...@apache.org>.
You have all the jars your app needs on both the servers and the client (as
opposed to wanting Yarn to distribute them)?  Then
yarn.application.classpath should be what you need.  It looks like
/etc/hadoop/conf,/some/lib/dir/*,/some/other/lib/dir/* etc.  Is that what
you're trying?
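
A hedged sketch of a fully spelled-out value, assuming the stock Hadoop 2.x
entries plus placeholder Accumulo and ZooKeeper directories (the real paths
would come from what accumulo classpath reports on the cluster), might be:

import org.apache.hadoop.conf.Configuration;

public class YarnAppClasspath {
  // Assumption: the usual Hadoop 2.x yarn.application.classpath entries, with
  // placeholder Accumulo and ZooKeeper lib dirs appended; adjust to the paths
  // that actually exist on the cluster nodes.
  public static void apply(Configuration conf) {
    conf.set("yarn.application.classpath",
        "$HADOOP_CONF_DIR,"
        + "$HADOOP_COMMON_HOME/share/hadoop/common/*,"
        + "$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,"
        + "$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,"
        + "$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,"
        + "$HADOOP_YARN_HOME/share/hadoop/yarn/*,"
        + "$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*,"
        + "/opt/accumulo/lib/*,"
        + "/opt/zookeeper/zookeeper.jar");
  }
}

As noted elsewhere in the thread, the same value may also need to be present
in the server-side yarn-site.xml, not just in the client configuration.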

On Fri, Jan 23, 2015 at 1:56 PM, Marc Reichman <mreichman@pixelforensics.com
> wrote:

> My apologies if this is covered somewhere, I've done a lot of searching
> and come up dry.
>
> I am migrating a set of applications from Hadoop 1.0.3/Accumulo 1.4.1 to
> Hadoop 2.6.0/Accumulo 1.6.1. The applications are launched by my custom
> java apps, using the Hadoop Tool/Configured interface setup, not a big deal.
>
> To run MR jobs with AccumuloInputFormat/OutputFormat, in 1.0 I could use
> tool.sh to launch the programs, which worked great for local on-cluster
> launching. I however needed to launch from remote hosts (maybe even Windows
> ones), and I would bundle a large lib dir with everything I needed on the
> client-side, and fill out HADOOP_CLASSPATH in hadoop-env.sh with everything
> I needed (basically copied the output of accumulo classpath). This would
> work for remote submissions, or even local ones, but specifically using my
> java mains to launch them without any accumulo or hadoop wrapper scripts.
>
> In YARN MR 2.6 this doesn't seem to work. No matter what I do, I can't
> seem to get a normal java app to have the 2.x MR Application Master pick up
> the accumulo items in the classpath, and my jobs fail with ClassNotFound
> exceptions. tool.sh works just fine, but again, I need to be able to submit
> without that environment.
>
> I have tried (on the cluster):
> HADOOP_CLASSPATH in hadoop-env.sh
> HADOOP_CLASSPATH from .bashrc
> yarn.application.classpath in yarn-site.xml
>
> I don't mind using tool.sh locally, it's quite nice, but I need a strategy
> to have the cluster "setup" so I can just launch java, set my appropriate
> hadoop configs for remote fs and yarn hosts, get my accumulo connections
> and in/out setup for mapreduce and launch jobs which have accumulo
> awareness.
>
> Any ideas?
>
> Thanks,
> Marc
>