Posted to hdfs-user@hadoop.apache.org by Amit Sela <am...@infolinks.com> on 2013/01/24 13:13:50 UTC
Submitting MapReduce job from remote server using JobClient
Hi all,
I want to run a MapReduce job using the Hadoop Java API from my analytics
server. It is not the master or even a data node, but it has the same Hadoop
installation as all the nodes in the cluster.
I tried using JobClient.runJob(), but it takes a JobConf as its argument, and
with JobConf it is only possible to set the old org.apache.hadoop.mapred
Mapper classes, while I use the org.apache.hadoop.mapreduce API...
I tried using JobControl and ControlledJob, but it seems to run the job
locally; the map phase just keeps re-attempting...
Has anyone tried this before?
I'm just looking for a way to submit MapReduce jobs from Java code and be
able to monitor them.
Thanks,
Amit.
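[For context, a minimal sketch of the org.apache.hadoop.mapreduce submission path being asked about; the mapper/reducer classes, paths, and job name are placeholders, not code from this thread, and the Configuration must resolve to the cluster's site files:]

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RemoteSubmit {
    public static void main(String[] args) throws Exception {
        // Loads core-site.xml / mapred-site.xml from the classpath.
        Configuration conf = new Configuration();
        Job job = new Job(conf, "analytics-job"); // Job.getInstance(conf, ...) in later versions
        job.setJarByClass(RemoteSubmit.class);
        job.setMapperClass(MyMapper.class);   // placeholder: an org.apache.hadoop.mapreduce.Mapper subclass
        job.setReducerClass(MyReducer.class); // placeholder
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/input"));    // placeholder path
        FileOutputFormat.setOutputPath(job, new Path("/output")); // placeholder path
        job.waitForCompletion(true); // blocking; job.submit() is the non-blocking variant
    }
}
```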
Re: Submitting MapReduce job from remote server using JobClient
Posted by Panshul Whisper <ou...@gmail.com>.
Hello Amit,
I tried the same scenario, submitting MapReduce jobs from a system that is
outside the Hadoop cluster, and I used Spring Hadoop to do it. It worked
wonderfully. Spring has made a lot of things easier...
You can try it. Here is a reference on how to do it:
http://www.petrikainulainen.net/programming/apache-hadoop/creating-hadoop-mapreduce-job-with-spring-data-apache-hadoop/
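[A minimal sketch of the Spring for Apache Hadoop XML configuration this approach uses; the host names, paths, and mapper/reducer classes are assumptions, not taken from this thread:]

```xml
<!-- Cluster connection; host:port values are placeholders. -->
<hdp:configuration>
    fs.default.name=hdfs://namenode.example.com:9000
    mapred.job.tracker=jobtracker.example.com:9001
</hdp:configuration>

<!-- Job definition; mapper/reducer classes are placeholders. -->
<hdp:job id="analyticsJob"
         input-path="/input" output-path="/output"
         mapper="com.example.MyMapper"
         reducer="com.example.MyReducer"/>

<!-- Submits the job when the application context starts. -->
<hdp:job-runner id="jobRunner" job-ref="analyticsJob" run-at-startup="true"/>
```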
hope this helps,
Regards,
On Sun, Jan 27, 2013 at 12:43 PM, Amit Sela <am...@infolinks.com> wrote:
> Yes I do.
> I checked that by printing out Configuration.toString() and I see only the
> files I add as resources.
> Moreover, in my test environment, the test Analytics server is also a data
> node (or maybe that could cause more trouble ?).
> Anyway, I still get
> org.apache.hadoop.mapred.JobClient - Running job: job_local_0001
>
> And I don't know what's wrong here. I create a new Configuration(false) to
> avoid default settings, set the resources manually (addResource), and
> validate it. Is there anything I'm forgetting?
>
>
> On Thu, Jan 24, 2013 at 9:49 PM, <be...@gmail.com> wrote:
>
>>
>> Hi Amit,
>>
>> Apart from the Hadoop jars, do you have the same config files
>> ($HADOOP_HOME/conf) that are on the cluster on your analytics server as
>> well?
>>
>> If you have the default config files on the analytics server, your
>> MR job will run locally and not on the cluster.
>> Regards
>> Bejoy KS
>>
>> Sent from remote device, Please excuse typos
>> ------------------------------
>> From: Amit Sela <am...@infolinks.com>
>> Date: Thu, 24 Jan 2013 18:15:49 +0200
>> To: <us...@hadoop.apache.org>
>> Reply-To: user@hadoop.apache.org
>> Subject: Re: Submitting MapReduce job from remote server using JobClient
>>
>> Hi Harsh,
>> I'm using the Job.waitForCompletion() method to run the job, but I can't
>> see it in the webapp and it doesn't seem to finish...
>> I get:
>> org.apache.hadoop.mapred.JobClient - Running job: job_local_0001
>> INFO org.apache.hadoop.util.ProcessTree - setsid exited with exit code 0
>> 2013-01-24 08:10:12.521 [Thread-51] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@7db1be6
>> 2013-01-24 08:10:12.536 [Thread-51] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
>> 2013-01-24 08:10:12.573 [Thread-51] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
>> 2013-01-24 08:10:12.573 [Thread-51] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
>> 2013-01-24 08:10:12.599 [Thread-51] INFO org.apache.hadoop.mapred.MapTask - Starting flush of map output
>> 2013-01-24 08:10:12.608 [Thread-51] INFO org.apache.hadoop.mapred.Task - Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
>> 2013-01-24 08:10:13.348 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-1] INFO org.apache.hadoop.mapred.JobClient - map 0% reduce 0%
>> 2013-01-24 08:10:15.509 [Thread-51] INFO org.apache.hadoop.mapred.LocalJobRunner -
>> 2013-01-24 08:10:15.510 [Thread-51] INFO org.apache.hadoop.mapred.Task - Task 'attempt_local_0001_m_000000_0' done.
>> 2013-01-24 08:10:15.511 [Thread-51] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@6b02b23d
>> 2013-01-24 08:10:15.512 [Thread-51] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
>> 2013-01-24 08:10:15.549 [Thread-51] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
>> 2013-01-24 08:10:15.550 [Thread-51] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
>> 2013-01-24 08:10:15.557 [Thread-51] INFO org.apache.hadoop.mapred.MapTask - Starting flush of map output
>> 2013-01-24 08:10:15.560 [Thread-51] INFO org.apache.hadoop.mapred.Task - Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
>> 2013-01-24 08:10:16.358 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-1] INFO org.apache.hadoop.mapred.JobClient - map 100% reduce 0%
>>
>> And after that, instead of going to the Reduce phase, I keep getting map
>> attempts like:
>>
>> INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
>> 2013-01-24 08:10:21.563 [Thread-51] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
>> 2013-01-24 08:10:21.563 [Thread-51] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
>> 2013-01-24 08:10:21.570 [Thread-51] INFO org.apache.hadoop.mapred.MapTask - Starting flush of map output
>> 2013-01-24 08:10:21.573 [Thread-51] INFO org.apache.hadoop.mapred.Task - Task:attempt_local_0001_m_000003_0 is done. And is in the process of commiting
>> 2013-01-24 08:10:24.529 [Thread-51] INFO org.apache.hadoop.mapred.LocalJobRunner -
>> 2013-01-24 08:10:24.529 [Thread-51] INFO org.apache.hadoop.mapred.Task - Task 'attempt_local_0001_m_000003_0' done.
>> 2013-01-24 08:10:24.530 [Thread-51] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@42e87d99
>>
>> Any clues?
>> Thanks for the help.
>>
>> On Thu, Jan 24, 2013 at 5:12 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>>> The Job class itself has a blocking and a non-blocking submitter,
>>> similar to the JobClient.runJob() method you discovered. See
>>>
>>> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#submit()
>>> and its following method waitForCompletion(). These seem to be what
>>> you're looking for.
>>>
>>> On Thu, Jan 24, 2013 at 5:43 PM, Amit Sela <am...@infolinks.com> wrote:
>>> > Hi all,
>>> >
>>> > I want to run a MapReduce job using the Hadoop Java api from my
>>> analytics
>>> > server. It is not the master or even a data node but it has the same
>>> Hadoop
>>> > installation as all the nodes in the cluster.
>>> > I tried using JobClient.runJob() but it accepts JobConf as argument
>>> and when
>>> > using JobConf it is possible to set only mapred Mapper classes and I
>>> use
>>> > mapreduce...
>>> > I tried using JobControl and ControlledJob but it seems like it tries
>>> to run
>>> > the job locally. the map phase just keeps attempting...
>>> > Anyone tried it before ?
>>> > I'm just looking for a way to submit MapReduce jobs from Java code and
>>> be
>>> > able to monitor them.
>>> >
>>> > Thanks,
>>> >
>>> > Amit.
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>
--
Regards,
Ouch Whisper
010101010101
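[As a note on the recurring job_local_0001 IDs above: on Hadoop 1.x, the client falls back to LocalJobRunner whenever mapred.job.tracker is left at its default value of "local". A minimal sketch of validating the client-side Configuration before submitting; the resource paths below are placeholders:]

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

// Skip the bundled defaults and load the cluster's own site files.
Configuration conf = new Configuration(false);
conf.addResource(new Path("/opt/hadoop/conf/core-site.xml"));   // placeholder path
conf.addResource(new Path("/opt/hadoop/conf/mapred-site.xml")); // placeholder path

// Sanity check: "local" here means the job would run in-process
// via LocalJobRunner instead of on the cluster.
if ("local".equals(conf.get("mapred.job.tracker", "local"))) {
    throw new IllegalStateException(
        "mapred.job.tracker is not set; the job would run locally");
}
```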
>> Harsh J
>>
>
>
Re: Submitting MapReduce job from remote server using JobClient
Posted by Amit Sela <am...@infolinks.com>.
Yes, I do.
I checked that by printing Configuration.toString(), and I see only the
files I added as resources.
Moreover, in my test environment the test analytics server is also a data
node (or maybe that could cause more trouble?).
Anyway, I still get:
*org.apache.hadoop.mapred.JobClient - Running job: job_local_0001*
And I don't know what's wrong here: I create a new Configuration(false) to
avoid the default settings, I set the resources manually (addResource), and
I validate it. Anything I'm forgetting?
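For reference, here is a minimal sketch of the kind of submission code being discussed, using the mapreduce-era Job API with the blocking waitForCompletion() Harsh pointed to. The config file paths are placeholders, and the identity Mapper/Reducer stand in for real job classes — substitute your own:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RemoteSubmit {
    public static void main(String[] args) throws Exception {
        // Start from an empty Configuration and load the cluster's own
        // config files. If these resources still carry the defaults
        // (fs.default.name=file:///, mapred.job.tracker=local), the
        // LocalJobRunner is picked and the job runs as job_local_0001.
        Configuration conf = new Configuration(false);
        conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));   // placeholder path
        conf.addResource(new Path("/etc/hadoop/conf/mapred-site.xml")); // placeholder path

        Job job = new Job(conf, "remote-job");
        job.setJarByClass(RemoteSubmit.class);   // ships the containing jar to the cluster
        job.setMapperClass(Mapper.class);        // identity mapper; replace with your mapreduce.Mapper
        job.setReducerClass(Reducer.class);      // identity reducer; replace with your mapreduce.Reducer
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Blocking submit with progress reporting; use job.submit()
        // for the non-blocking variant and poll the Job for status.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

This only reaches the cluster if the loaded resources actually point the filesystem and JobTracker at it; otherwise the output looks exactly like the LocalJobRunner logs above.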
On Thu, Jan 24, 2013 at 9:49 PM, <be...@gmail.com> wrote:
> **
> Hi Amit,
>
> Apart for the hadoop jars, Do you have the same config files
> ($HADOOP_HOME/conf) that are in the cluster on your analytics server as
> well?
>
> If you are having the default config files in analytics server then your
> MR job would be running locally and not on the cluster.
> Regards
> Bejoy KS
>
> Sent from remote device, Please excuse typos
> ------------------------------
> *From: * Amit Sela <am...@infolinks.com>
> *Date: *Thu, 24 Jan 2013 18:15:49 +0200
> *To: *<us...@hadoop.apache.org>
> *ReplyTo: * user@hadoop.apache.org
> *Subject: *Re: Submitting MapReduce job from remote server using JobClient
>
> Hi Harsh,
> I'm using Job.waitForCompletion() method to run the job but I can't see it
> in the webapp and it doesn't seem to finish...
> I get:
> *org.apache.hadoop.mapred.JobClient - Running
> job: job_local_0001*
> *INFO org.apache.hadoop.util.ProcessTree -
> setsid exited with exit code 0*
> *2013-01-24 08:10:12.521 [Thread-51] INFO
> org.apache.hadoop.mapred.Task - Using
> ResourceCalculatorPlugin :
> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@7db1be6*
> *2013-01-24 08:10:12.536 [Thread-51] INFO
> org.apache.hadoop.mapred.MapTask - io.sort.mb
> = 100*
> *2013-01-24 08:10:12.573 [Thread-51] INFO
> org.apache.hadoop.mapred.MapTask - data buffer
> = 79691776/99614720*
> *2013-01-24 08:10:12.573 [Thread-51] INFO
> org.apache.hadoop.mapred.MapTask - record
> buffer = 262144/327680*
> *2013-01-24 08:10:12.599 [Thread-51] INFO
> org.apache.hadoop.mapred.MapTask - Starting
> flush of map output*
> *2013-01-24 08:10:12.608 [Thread-51] INFO
> org.apache.hadoop.mapred.Task -
> Task:attempt_local_0001_m_000000_0 is done. And is in the process of
> commiting*
> *2013-01-24 08:10:13.348
> [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-1]
> INFO org.apache.hadoop.mapred.JobClient - map
> 0% reduce 0%*
> *2013-01-24 08:10:15.509 [Thread-51] INFO
> org.apache.hadoop.mapred.LocalJobRunner - *
> *2013-01-24 08:10:15.510 [Thread-51] INFO
> org.apache.hadoop.mapred.Task - Task
> 'attempt_local_0001_m_000000_0' done.*
> *2013-01-24 08:10:15.511 [Thread-51] INFO
> org.apache.hadoop.mapred.Task - Using
> ResourceCalculatorPlugin :
> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@6b02b23d*
> *2013-01-24 08:10:15.512 [Thread-51] INFO
> org.apache.hadoop.mapred.MapTask - io.sort.mb
> = 100*
> *2013-01-24 08:10:15.549 [Thread-51] INFO
> org.apache.hadoop.mapred.MapTask - data buffer
> = 79691776/99614720*
> *2013-01-24 08:10:15.550 [Thread-51] INFO
> org.apache.hadoop.mapred.MapTask - record
> buffer = 262144/327680*
> *2013-01-24 08:10:15.557 [Thread-51] INFO
> org.apache.hadoop.mapred.MapTask - Starting
> flush of map output*
> *2013-01-24 08:10:15.560 [Thread-51] INFO
> org.apache.hadoop.mapred.Task -
> Task:attempt_local_0001_m_000001_0 is done. And is in the process of
> commiting*
> *2013-01-24 08:10:16.358
> [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-1]
> INFO org.apache.hadoop.mapred.JobClient - map
> 100% reduce 0%*
>
> And after that, instead of going to Reduce phase I keep getting map
> attempts like:
>
> *INFO org.apache.hadoop.mapred.MapTask -
> io.sort.mb = 100*
> *2013-01-24 08:10:21.563 [Thread-51] INFO
> org.apache.hadoop.mapred.MapTask - data buffer
> = 79691776/99614720*
> *2013-01-24 08:10:21.563 [Thread-51] INFO
> org.apache.hadoop.mapred.MapTask - record
> buffer = 262144/327680*
> *2013-01-24 08:10:21.570 [Thread-51] INFO
> org.apache.hadoop.mapred.MapTask - Starting
> flush of map output*
> *2013-01-24 08:10:21.573 [Thread-51] INFO
> org.apache.hadoop.mapred.Task -
> Task:attempt_local_0001_m_000003_0 is done. And is in the process of
> commiting*
> *2013-01-24 08:10:24.529 [Thread-51] INFO
> org.apache.hadoop.mapred.LocalJobRunner - *
> *2013-01-24 08:10:24.529 [Thread-51] INFO
> org.apache.hadoop.mapred.Task - Task
> 'attempt_local_0001_m_000003_0' done.*
> *2013-01-24 08:10:24.530 [Thread-51] INFO
> org.apache.hadoop.mapred.Task - Using
> ResourceCalculatorPlugin :
> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@42e87d99*
> *
> *
> Any clues ?
> Thanks for the help.
>
> On Thu, Jan 24, 2013 at 5:12 PM, Harsh J <ha...@cloudera.com> wrote:
>
>> The Job class itself has a blocking and non-blocking submitter that is
>> similar to JobConf's runJob method you discovered. See
>>
>> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#submit()
>> and its following method waitForCompletion(). These seem to be what
>> you're looking for.
>>
>> On Thu, Jan 24, 2013 at 5:43 PM, Amit Sela <am...@infolinks.com> wrote:
>> > Hi all,
>> >
>> > I want to run a MapReduce job using the Hadoop Java api from my
>> analytics
>> > server. It is not the master or even a data node but it has the same
>> Hadoop
>> > installation as all the nodes in the cluster.
>> > I tried using JobClient.runJob() but it accepts JobConf as argument and
>> when
>> > using JobConf it is possible to set only mapred Mapper classes and I use
>> > mapreduce...
>> > I tried using JobControl and ControlledJob but it seems like it tries
>> to run
>> > the job locally. the map phase just keeps attempting...
>> > Anyone tried it before ?
>> > I'm just looking for a way to submit MapReduce jobs from Java code and
>> be
>> > able to monitor them.
>> >
>> > Thanks,
>> >
>> > Amit.
>>
>>
>>
>> --
>> Harsh J
>>
>
>
Re: Submitting MapReduce job from remote server using JobClient
Posted by be...@gmail.com.
Hi Amit,
Apart from the Hadoop jars, do you have the same config files ($HADOOP_HOME/conf) on your analytics server as on the cluster?
If the analytics server has the default config files, your MR job would run locally and not on the cluster.
Regards,
Bejoy KS
Sent from a remote device, please excuse typos
-----Original Message-----
From: Amit Sela <am...@infolinks.com>
Date: Thu, 24 Jan 2013 18:15:49
To: <us...@hadoop.apache.org>
Reply-To: user@hadoop.apache.org
Subject: Re: Submitting MapReduce job from remote server using JobClient
Hi Harsh,
I'm using Job.waitForCompletion() method to run the job but I can't see it
in the webapp and it doesn't seem to finish...
I get:
*org.apache.hadoop.mapred.JobClient - Running
job: job_local_0001*
*INFO org.apache.hadoop.util.ProcessTree -
setsid exited with exit code 0*
*2013-01-24 08:10:12.521 [Thread-51] INFO
org.apache.hadoop.mapred.Task - Using
ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@7db1be6*
*2013-01-24 08:10:12.536 [Thread-51] INFO
org.apache.hadoop.mapred.MapTask - io.sort.mb
= 100*
*2013-01-24 08:10:12.573 [Thread-51] INFO
org.apache.hadoop.mapred.MapTask - data buffer
= 79691776/99614720*
*2013-01-24 08:10:12.573 [Thread-51] INFO
org.apache.hadoop.mapred.MapTask - record
buffer = 262144/327680*
*2013-01-24 08:10:12.599 [Thread-51] INFO
org.apache.hadoop.mapred.MapTask - Starting
flush of map output*
*2013-01-24 08:10:12.608 [Thread-51] INFO
org.apache.hadoop.mapred.Task -
Task:attempt_local_0001_m_000000_0 is done. And is in the process of
commiting*
*2013-01-24 08:10:13.348
[org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-1]
INFO org.apache.hadoop.mapred.JobClient - map
0% reduce 0%*
*2013-01-24 08:10:15.509 [Thread-51] INFO
org.apache.hadoop.mapred.LocalJobRunner - *
*2013-01-24 08:10:15.510 [Thread-51] INFO
org.apache.hadoop.mapred.Task - Task
'attempt_local_0001_m_000000_0' done.*
*2013-01-24 08:10:15.511 [Thread-51] INFO
org.apache.hadoop.mapred.Task - Using
ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@6b02b23d*
*2013-01-24 08:10:15.512 [Thread-51] INFO
org.apache.hadoop.mapred.MapTask - io.sort.mb
= 100*
*2013-01-24 08:10:15.549 [Thread-51] INFO
org.apache.hadoop.mapred.MapTask - data buffer
= 79691776/99614720*
*2013-01-24 08:10:15.550 [Thread-51] INFO
org.apache.hadoop.mapred.MapTask - record
buffer = 262144/327680*
*2013-01-24 08:10:15.557 [Thread-51] INFO
org.apache.hadoop.mapred.MapTask - Starting
flush of map output*
*2013-01-24 08:10:15.560 [Thread-51] INFO
org.apache.hadoop.mapred.Task -
Task:attempt_local_0001_m_000001_0 is done. And is in the process of
commiting*
*2013-01-24 08:10:16.358
[org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-1]
INFO org.apache.hadoop.mapred.JobClient - map
100% reduce 0%*
And after that, instead of going to Reduce phase I keep getting map
attempts like:
*INFO org.apache.hadoop.mapred.MapTask -
io.sort.mb = 100*
*2013-01-24 08:10:21.563 [Thread-51] INFO
org.apache.hadoop.mapred.MapTask - data buffer
= 79691776/99614720*
*2013-01-24 08:10:21.563 [Thread-51] INFO
org.apache.hadoop.mapred.MapTask - record
buffer = 262144/327680*
*2013-01-24 08:10:21.570 [Thread-51] INFO
org.apache.hadoop.mapred.MapTask - Starting
flush of map output*
*2013-01-24 08:10:21.573 [Thread-51] INFO
org.apache.hadoop.mapred.Task -
Task:attempt_local_0001_m_000003_0 is done. And is in the process of
commiting*
*2013-01-24 08:10:24.529 [Thread-51] INFO
org.apache.hadoop.mapred.LocalJobRunner - *
*2013-01-24 08:10:24.529 [Thread-51] INFO
org.apache.hadoop.mapred.Task - Task
'attempt_local_0001_m_000003_0' done.*
*2013-01-24 08:10:24.530 [Thread-51] INFO
org.apache.hadoop.mapred.Task - Using
ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@42e87d99*
*
*
Any clues ?
Thanks for the help.
On Thu, Jan 24, 2013 at 5:12 PM, Harsh J <ha...@cloudera.com> wrote:
> The Job class itself has a blocking and non-blocking submitter that is
> similar to JobConf's runJob method you discovered. See
>
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#submit()
> and its following method waitForCompletion(). These seem to be what
> you're looking for.
>
> On Thu, Jan 24, 2013 at 5:43 PM, Amit Sela <am...@infolinks.com> wrote:
> > Hi all,
> >
> > I want to run a MapReduce job using the Hadoop Java api from my analytics
> > server. It is not the master or even a data node but it has the same
> Hadoop
> > installation as all the nodes in the cluster.
> > I tried using JobClient.runJob() but it accepts JobConf as argument and
> when
> > using JobConf it is possible to set only mapred Mapper classes and I use
> > mapreduce...
> > I tried using JobControl and ControlledJob but it seems like it tries to
> run
> > the job locally. the map phase just keeps attempting...
> > Anyone tried it before ?
> > I'm just looking for a way to submit MapReduce jobs from Java code and be
> > able to monitor them.
> >
> > Thanks,
> >
> > Amit.
>
>
>
> --
> Harsh J
>
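Concretely, the two properties that decide local versus cluster execution in Hadoop 1.x config files look like this; the hostnames and ports below are placeholders, not values from this thread:

```xml
<!-- core-site.xml: default filesystem; "file:///" (the default) keeps I/O local -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode.example.com:9000</value>
</property>

<!-- mapred-site.xml: "local" (the default) selects the LocalJobRunner -->
<property>
  <name>mapred.job.tracker</name>
  <value>jobtracker.example.com:9001</value>
</property>
```

With the defaults in place, JobClient reports job_local_0001 exactly as in the logs above.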
Task:attempt_local_0001_m_000003_0 is done. And is in the process of
commiting*
*2013-01-24 08:10:24.529 [Thread-51] INFO
org.apache.hadoop.mapred.LocalJobRunner - *
*2013-01-24 08:10:24.529 [Thread-51] INFO
org.apache.hadoop.mapred.Task - Task
'attempt_local_0001_m_000003_0' done.*
*2013-01-24 08:10:24.530 [Thread-51] INFO
org.apache.hadoop.mapred.Task - Using
ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@42e87d99*
*
*
Any clues ?
Thanks for the help.
On Thu, Jan 24, 2013 at 5:12 PM, Harsh J <ha...@cloudera.com> wrote:
> The Job class itself has a blocking and non-blocking submitter that is
> similar to JobConf's runJob method you discovered. See
>
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#submit()
> and its following method waitForCompletion(). These seem to be what
> you're looking for.
>
> On Thu, Jan 24, 2013 at 5:43 PM, Amit Sela <am...@infolinks.com> wrote:
> > Hi all,
> >
> > I want to run a MapReduce job using the Hadoop Java api from my analytics
> > server. It is not the master or even a data node but it has the same
> Hadoop
> > installation as all the nodes in the cluster.
> > I tried using JobClient.runJob() but it accepts JobConf as argument and
> when
> > using JobConf it is possible to set only mapred Mapper classes and I use
> > mapreduce...
> > I tried using JobControl and ControlledJob but it seems like it tries to
> run
> > the job locally. the map phase just keeps attempting...
> > Anyone tried it before ?
> > I'm just looking for a way to submit MapReduce jobs from Java code and be
> > able to monitor them.
> >
> > Thanks,
> >
> > Amit.
>
>
>
> --
> Harsh J
>
Re: Submitting MapReduce job from remote server using JobClient
Posted by Amit Sela <am...@infolinks.com>.
Hi Harsh,
I'm using the Job.waitForCompletion() method to run the job, but I can't see it
in the web UI and it doesn't seem to finish...
I get:
*org.apache.hadoop.mapred.JobClient - Running
job: job_local_0001*
*INFO org.apache.hadoop.util.ProcessTree -
setsid exited with exit code 0*
*2013-01-24 08:10:12.521 [Thread-51] INFO
org.apache.hadoop.mapred.Task - Using
ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@7db1be6*
*2013-01-24 08:10:12.536 [Thread-51] INFO
org.apache.hadoop.mapred.MapTask - io.sort.mb
= 100*
*2013-01-24 08:10:12.573 [Thread-51] INFO
org.apache.hadoop.mapred.MapTask - data buffer
= 79691776/99614720*
*2013-01-24 08:10:12.573 [Thread-51] INFO
org.apache.hadoop.mapred.MapTask - record
buffer = 262144/327680*
*2013-01-24 08:10:12.599 [Thread-51] INFO
org.apache.hadoop.mapred.MapTask - Starting
flush of map output*
*2013-01-24 08:10:12.608 [Thread-51] INFO
org.apache.hadoop.mapred.Task -
Task:attempt_local_0001_m_000000_0 is done. And is in the process of
commiting*
*2013-01-24 08:10:13.348
[org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-1]
INFO org.apache.hadoop.mapred.JobClient - map
0% reduce 0%*
*2013-01-24 08:10:15.509 [Thread-51] INFO
org.apache.hadoop.mapred.LocalJobRunner - *
*2013-01-24 08:10:15.510 [Thread-51] INFO
org.apache.hadoop.mapred.Task - Task
'attempt_local_0001_m_000000_0' done.*
*2013-01-24 08:10:15.511 [Thread-51] INFO
org.apache.hadoop.mapred.Task - Using
ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@6b02b23d*
*2013-01-24 08:10:15.512 [Thread-51] INFO
org.apache.hadoop.mapred.MapTask - io.sort.mb
= 100*
*2013-01-24 08:10:15.549 [Thread-51] INFO
org.apache.hadoop.mapred.MapTask - data buffer
= 79691776/99614720*
*2013-01-24 08:10:15.550 [Thread-51] INFO
org.apache.hadoop.mapred.MapTask - record
buffer = 262144/327680*
*2013-01-24 08:10:15.557 [Thread-51] INFO
org.apache.hadoop.mapred.MapTask - Starting
flush of map output*
*2013-01-24 08:10:15.560 [Thread-51] INFO
org.apache.hadoop.mapred.Task -
Task:attempt_local_0001_m_000001_0 is done. And is in the process of
commiting*
*2013-01-24 08:10:16.358
[org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-1]
INFO org.apache.hadoop.mapred.JobClient - map
100% reduce 0%*
And after that, instead of going to Reduce phase I keep getting map
attempts like:
*INFO org.apache.hadoop.mapred.MapTask -
io.sort.mb = 100*
*2013-01-24 08:10:21.563 [Thread-51] INFO
org.apache.hadoop.mapred.MapTask - data buffer
= 79691776/99614720*
*2013-01-24 08:10:21.563 [Thread-51] INFO
org.apache.hadoop.mapred.MapTask - record
buffer = 262144/327680*
*2013-01-24 08:10:21.570 [Thread-51] INFO
org.apache.hadoop.mapred.MapTask - Starting
flush of map output*
*2013-01-24 08:10:21.573 [Thread-51] INFO
org.apache.hadoop.mapred.Task -
Task:attempt_local_0001_m_000003_0 is done. And is in the process of
commiting*
*2013-01-24 08:10:24.529 [Thread-51] INFO
org.apache.hadoop.mapred.LocalJobRunner - *
*2013-01-24 08:10:24.529 [Thread-51] INFO
org.apache.hadoop.mapred.Task - Task
'attempt_local_0001_m_000003_0' done.*
*2013-01-24 08:10:24.530 [Thread-51] INFO
org.apache.hadoop.mapred.Task - Using
ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@42e87d99*
*
*
Any clues?
Thanks for the help.
On Thu, Jan 24, 2013 at 5:12 PM, Harsh J <ha...@cloudera.com> wrote:
> The Job class itself has a blocking and non-blocking submitter that is
> similar to JobConf's runJob method you discovered. See
>
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#submit()
> and its following method waitForCompletion(). These seem to be what
> you're looking for.
>
> On Thu, Jan 24, 2013 at 5:43 PM, Amit Sela <am...@infolinks.com> wrote:
> > Hi all,
> >
> > I want to run a MapReduce job using the Hadoop Java api from my analytics
> > server. It is not the master or even a data node but it has the same
> Hadoop
> > installation as all the nodes in the cluster.
> > I tried using JobClient.runJob() but it accepts JobConf as argument and
> when
> > using JobConf it is possible to set only mapred Mapper classes and I use
> > mapreduce...
> > I tried using JobControl and ControlledJob but it seems like it tries to
> run
> > the job locally. the map phase just keeps attempting...
> > Anyone tried it before ?
> > I'm just looking for a way to submit MapReduce jobs from Java code and be
> > able to monitor them.
> >
> > Thanks,
> >
> > Amit.
>
>
>
> --
> Harsh J
>
Re: Submitting MapReduce job from remote server using JobClient
Posted by Harsh J <ha...@cloudera.com>.
The Job class itself has a blocking and non-blocking submitter that is
similar to JobConf's runJob method you discovered. See
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#submit()
and its following method waitForCompletion(). These seem to be what
you're looking for.
On Thu, Jan 24, 2013 at 5:43 PM, Amit Sela <am...@infolinks.com> wrote:
> Hi all,
>
> I want to run a MapReduce job using the Hadoop Java api from my analytics
> server. It is not the master or even a data node but it has the same Hadoop
> installation as all the nodes in the cluster.
> I tried using JobClient.runJob() but it accepts JobConf as argument and when
> using JobConf it is possible to set only mapred Mapper classes and I use
> mapreduce...
> I tried using JobControl and ControlledJob but it seems like it tries to run
> the job locally. the map phase just keeps attempting...
> Anyone tried it before ?
> I'm just looking for a way to submit MapReduce jobs from Java code and be
> able to monitor them.
>
> Thanks,
>
> Amit.
--
Harsh J
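[Putting Harsh's and Bejoy's points together, a minimal new-API (org.apache.hadoop.mapreduce) driver might look like the sketch below. The hostnames, ports, and the word-count mapper/reducer are illustrative placeholders, not from this thread; the key point is that fs.default.name and mapred.job.tracker (or equivalently the cluster's own config files on the classpath) must point at the cluster, otherwise the client falls back to LocalJobRunner and the job appears as job_local_0001.]

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RemoteJobDriver {

    // Hypothetical word-count mapper, just to make the sketch complete.
    public static class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            for (String word : value.toString().split("\\s+")) {
                ctx.write(new Text(word), ONE);
            }
        }
    }

    // Hypothetical summing reducer.
    public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> vals, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : vals) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the cluster; without these (or the cluster's
        // config files on the classpath) the job runs under LocalJobRunner.
        conf.set("fs.default.name", "hdfs://namenode-host:8020");  // placeholder host/port
        conf.set("mapred.job.tracker", "jobtracker-host:8021");    // placeholder host/port

        Job job = new Job(conf, "remote-job");
        job.setJarByClass(RemoteJobDriver.class);  // ships the job jar to the cluster
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Blocking submit with progress logging; use job.submit() plus
        // job.isComplete()/job.mapProgress() for non-blocking monitoring.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

[A cluster-side job ID such as job_201301240810_0001 in the client log, rather than job_local_0001, is the quickest sign that submission actually reached the JobTracker.]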