Posted to user@crunch.apache.org by Kristoffer Sjögren <st...@gmail.com> on 2014/02/19 22:01:45 UTC

Running crunch on remote jobtracker

Hi

I'm running the Crunch WordCount example via ToolRunner.run (from
IntelliJ). Data is read from HDFS, but the actual job runs locally
instead of on the remote cluster.

Do I need to use the hadoop jar command with a pre-packaged jar? Or is
there some way to kick off a remote job directly?

Cheers,
-Kristoffer

Re: Running crunch on remote jobtracker

Posted by Kristoffer Sjögren <st...@gmail.com>.
Hi Chao

Yes, that was exactly what I was missing.

1) Set the Hadoop configuration property mapred.job.tracker to the remote
jobtracker address on port 8021.
2) Use the DistributedCache to upload jar dependencies, like so:
DistributedCache.addFileToClassPath(
    new Path("/tmp/crunch-core-0.8.0-cdh4.3.0.jar"), hadoopConf);

Thanks,
-Kristoffer

Re: Running crunch on remote jobtracker

Posted by Chao Shi <st...@live.com>.
Hi Kristoffer,

As far as I can tell, you have to package your classes into a jar before
submitting a job.

"hadoop jar" is the simplest way to submit jobs, though there are other
approaches. MR uses mapred.job.tracker to determine whether to run the
job remotely or locally. The "hadoop jar" command sets it to the
configured jobtracker address automatically, so the job is submitted to
the remote cluster.
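
A quick way to see which mode you are in (a sketch; the class name is
made up) is a Tool that prints the tracker address it would submit to:

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ShowTracker extends Configured implements Tool {
  @Override
  public int run(String[] args) throws Exception {
    // JobConf pulls in mapred-site.xml, where the jobtracker is configured.
    JobConf jobConf = new JobConf(getConf());
    // "local" means the LocalJobRunner; host:port means remote submission.
    System.out.println("mapred.job.tracker = "
        + jobConf.get("mapred.job.tracker", "local"));
    return 0;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new ShowTracker(), args));
  }
}

Run from the IDE it will typically print "local"; run via "hadoop jar"
on a configured client it prints the jobtracker address.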
