Posted to user@crunch.apache.org by Kristoffer Sjögren <st...@gmail.com> on 2014/02/19 22:01:45 UTC
Running crunch on remote jobtracker
Hi
I'm running the Crunch WordCount example via ToolRunner.run (from
IntelliJ); data is read from HDFS, but the actual job runs locally
instead of on the remote cluster.
Do I need to use the hadoop jar command with a pre-packaged jar, or is
there another way to kick off a remote job?
Cheers,
-Kristoffer
Re: Running crunch on remote jobtracker
Posted by Kristoffer Sjögren <st...@gmail.com>.
Hi Chao
Yes, that was exactly what I was missing.
1) Set the Hadoop configuration property mapred.job.tracker to the
remote JobTracker address on port 8021.
2) Use the DistributedCache to upload jar dependencies, like so:
    DistributedCache.addFileToClassPath(
        new Path("/tmp/crunch-core-0.8.0-cdh4.3.0.jar"), hadoopConf);
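Putting the two steps together, a minimal driver sketch might look like the
following. The JobTracker host and the WordCount Tool class are placeholders
(assumptions, not from the thread), and mapred.job.tracker / DistributedCache
are the MR1/CDH4-era APIs, deprecated in later Hadoop versions:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.ToolRunner;

public class RemoteWordCount {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Point the client at the remote JobTracker instead of the local runner.
    conf.set("mapred.job.tracker", "jobtracker.example.com:8021");
    // Ship dependency jars (already uploaded to HDFS) with the job.
    DistributedCache.addFileToClassPath(
        new Path("/tmp/crunch-core-0.8.0-cdh4.3.0.jar"), conf);
    // WordCount is assumed to be the Crunch example's Tool implementation.
    System.exit(ToolRunner.run(conf, new WordCount(), args));
  }
}
```

Any jar added this way must already exist at that HDFS path; the
DistributedCache only distributes it to the task nodes, it does not
upload it from the local filesystem.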
Thanks,
-Kristoffer
On Fri, Feb 21, 2014 at 4:23 AM, Chao Shi <st...@live.com> wrote:
Re: Running crunch on remote jobtracker
Posted by Chao Shi <st...@live.com>.
Hi Kristoffer,
As far as I can tell, you have to package your classes into a jar before
submitting a job.
"hadoop jar" is the simplest way to submit jobs, though there are other
approaches. MapReduce uses mapred.job.tracker to decide whether to run a
job remotely or locally. The "hadoop jar" command sets it to the
configured JobTracker address automatically, so the job is submitted to
the remote cluster.
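The value that "hadoop jar" picks up automatically comes from the client's
mapred-site.xml. A sketch of the relevant entry (the hostname is a
placeholder; this is the MR1-style property, replaced by YARN settings in
MR2):

```xml
<!-- mapred-site.xml on the client machine -->
<property>
  <name>mapred.job.tracker</name>
  <value>jobtracker.example.com:8021</value>
</property>
```

If this property is absent or left at its default of "local", the job runs
in the LocalJobRunner inside the client JVM, which matches the behavior you
are seeing from IntelliJ.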