Posted to hdfs-user@hadoop.apache.org by Amit Sela <am...@infolinks.com> on 2013/01/24 13:13:50 UTC

Submitting MapReduce job from remote server using JobClient

Hi all,

I want to run a MapReduce job using the Hadoop Java API from my analytics
server. It is not the master or even a data node, but it has the same Hadoop
installation as all the nodes in the cluster.
I tried using JobClient.runJob(), but it accepts a JobConf as its argument,
and with JobConf it is only possible to set the old mapred Mapper classes,
while I use the mapreduce API...
I tried using JobControl and ControlledJob, but it seems to run the job
locally; the map phase just keeps attempting...
Has anyone tried this before?
I'm just looking for a way to submit MapReduce jobs from Java code and be
able to monitor them.
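
Roughly, this is the shape of what I'm after, sketched with the new mapreduce
API (the paths are placeholders, and the identity Mapper/Reducer base classes
stand in for my real classes):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SubmitFromJava {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration(); // cluster config resources go here
            Job job = new Job(conf, "analytics-job");
            job.setJarByClass(SubmitFromJava.class);
            job.setMapperClass(Mapper.class);   // identity mapper, stand-in for my own
            job.setReducerClass(Reducer.class); // identity reducer, stand-in for my own
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            job.submit();                       // non-blocking submission
            while (!job.isComplete()) {         // poll the cluster to monitor progress
                System.out.printf("map %.0f%% reduce %.0f%%%n",
                        job.mapProgress() * 100, job.reduceProgress() * 100);
                Thread.sleep(5000);
            }
            System.out.println(job.isSuccessful() ? "done" : "failed");
        }
    }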

Thanks,

Amit.

Re: Submitting MapReduce job from remote server using JobClient

Posted by Panshul Whisper <ou...@gmail.com>.
Hello Amit,

I tried the same scenario, submitting MapReduce jobs from a system that is
outside the Hadoop cluster, and I used Spring Hadoop to do it. It worked
wonderfully. Spring has made a lot of things easier...
You can try it. Here is a reference on how to do it:

http://www.petrikainulainen.net/programming/apache-hadoop/creating-hadoop-mapreduce-job-with-spring-data-apache-hadoop/
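
In the tutorial, the Hadoop configuration and the job are declared in the
Spring context XML, and the Java side just bootstraps that context. A minimal
sketch, assuming a context file named applicationContext.xml on the classpath:

    import org.springframework.context.support.ClassPathXmlApplicationContext;

    public class SpringHadoopLauncher {
        public static void main(String[] args) {
            // applicationContext.xml declares the cluster configuration and
            // the MapReduce job, plus a job-runner set to run at startup,
            // as shown in the tutorial linked above.
            new ClassPathXmlApplicationContext("applicationContext.xml");
        }
    }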

Hope this helps,
Regards,



On Sun, Jan 27, 2013 at 12:43 PM, Amit Sela <am...@infolinks.com> wrote:

> Yes I do.
> I checked that by printing out Configuration.toString(), and I see only the
> files I add as resources.
> Moreover, in my test environment, the test analytics server is also a data
> node (or maybe that could cause more trouble?).
> Anyway, I still get
>
> org.apache.hadoop.mapred.JobClient - Running job: job_local_0001
>
> and I don't know what's wrong here. I create a new Configuration(false) to
> avoid default settings, I set the resources manually (addResource), and I
> validate it. Anything I'm forgetting?
>
>
> On Thu, Jan 24, 2013 at 9:49 PM, <be...@gmail.com> wrote:
>
>> Hi Amit,
>>
>> Apart from the Hadoop jars, do you have the same config files
>> ($HADOOP_HOME/conf) that are in the cluster on your analytics server as
>> well?
>>
>> If you have the default config files on the analytics server, then your
>> MR job will run locally and not on the cluster.
>> Regards
>> Bejoy KS
>>
>> Sent from a remote device. Please excuse typos.
>> ------------------------------
>> From: Amit Sela <am...@infolinks.com>
>> Date: Thu, 24 Jan 2013 18:15:49 +0200
>> To: <us...@hadoop.apache.org>
>> Reply-To: user@hadoop.apache.org
>> Subject: Re: Submitting MapReduce job from remote server using JobClient
>>
>> Hi Harsh,
>> I'm using the Job.waitForCompletion() method to run the job, but I can't
>> see it in the webapp and it doesn't seem to finish...
>> I get:
>> org.apache.hadoop.mapred.JobClient - Running job: job_local_0001
>> INFO  org.apache.hadoop.util.ProcessTree - setsid exited with exit code 0
>> 2013-01-24 08:10:12.521 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@7db1be6
>> 2013-01-24 08:10:12.536 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
>> 2013-01-24 08:10:12.573 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
>> 2013-01-24 08:10:12.573 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
>> 2013-01-24 08:10:12.599 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - Starting flush of map output
>> 2013-01-24 08:10:12.608 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
>> 2013-01-24 08:10:13.348 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-1] INFO  org.apache.hadoop.mapred.JobClient - map 0% reduce 0%
>> 2013-01-24 08:10:15.509 [Thread-51] INFO  org.apache.hadoop.mapred.LocalJobRunner -
>> 2013-01-24 08:10:15.510 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Task 'attempt_local_0001_m_000000_0' done.
>> 2013-01-24 08:10:15.511 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@6b02b23d
>> 2013-01-24 08:10:15.512 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
>> 2013-01-24 08:10:15.549 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
>> 2013-01-24 08:10:15.550 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
>> 2013-01-24 08:10:15.557 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - Starting flush of map output
>> 2013-01-24 08:10:15.560 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
>> 2013-01-24 08:10:16.358 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-1] INFO  org.apache.hadoop.mapred.JobClient - map 100% reduce 0%
>>
>> And after that, instead of going to the reduce phase, I keep getting map
>> attempts like:
>>
>> INFO  org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
>> 2013-01-24 08:10:21.563 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
>> 2013-01-24 08:10:21.563 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
>> 2013-01-24 08:10:21.570 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - Starting flush of map output
>> 2013-01-24 08:10:21.573 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Task:attempt_local_0001_m_000003_0 is done. And is in the process of commiting
>> 2013-01-24 08:10:24.529 [Thread-51] INFO  org.apache.hadoop.mapred.LocalJobRunner -
>> 2013-01-24 08:10:24.529 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Task 'attempt_local_0001_m_000003_0' done.
>> 2013-01-24 08:10:24.530 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@42e87d99
>> Any clues?
>> Thanks for the help.
>>
>> On Thu, Jan 24, 2013 at 5:12 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>>> The Job class itself has blocking and non-blocking submitters that are
>>> similar to the JobConf.runJob() method you discovered. See
>>>
>>> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#submit()
>>> and its following method waitForCompletion(). These seem to be what
>>> you're looking for.
>>>
>>> On Thu, Jan 24, 2013 at 5:43 PM, Amit Sela <am...@infolinks.com> wrote:
>>> > Hi all,
>>> >
>>> > I want to run a MapReduce job using the Hadoop Java API from my
>>> > analytics server. It is not the master or even a data node, but it has
>>> > the same Hadoop installation as all the nodes in the cluster.
>>> > I tried using JobClient.runJob(), but it accepts a JobConf as its
>>> > argument, and with JobConf it is only possible to set the old mapred
>>> > Mapper classes, while I use the mapreduce API...
>>> > I tried using JobControl and ControlledJob, but it seems to run the
>>> > job locally; the map phase just keeps attempting...
>>> > Has anyone tried this before?
>>> > I'm just looking for a way to submit MapReduce jobs from Java code
>>> > and be able to monitor them.
>>> >
>>> > Thanks,
>>> >
>>> > Amit.
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>


-- 
Regards,
Ouch Whisper
010101010101

Re: Submitting MapReduce job from remote server using JobClient

Posted by Amit Sela <am...@infolinks.com>.
Yes I do.
I checked that by printing out Configuration.toString(), and I see only the
files I add as resources.
Moreover, in my test environment, the test analytics server is also a data
node (or maybe that could cause more trouble?).
Anyway, I still get

org.apache.hadoop.mapred.JobClient - Running job: job_local_0001

and I don't know what's wrong here. I create a new Configuration(false) to
avoid default settings, I set the resources manually (addResource), and I
validate it. Anything I'm forgetting?
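
For reference, this is roughly what my setup code looks like (a sketch; the
resource paths are placeholders for my real ones):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    public class ConfCheck {
        public static void main(String[] args) {
            Configuration conf = new Configuration(false); // skip bundled defaults
            conf.addResource(new Path("/opt/hadoop/conf/core-site.xml"));   // placeholder
            conf.addResource(new Path("/opt/hadoop/conf/hdfs-site.xml"));   // placeholder
            conf.addResource(new Path("/opt/hadoop/conf/mapred-site.xml")); // placeholder

            // If mapred.job.tracker comes back as "local" (the default), the
            // LocalJobRunner is used, which would match the job_local_0001
            // IDs in the log above.
            System.out.println("mapred.job.tracker = " + conf.get("mapred.job.tracker"));
            System.out.println("fs.default.name = " + conf.get("fs.default.name"));
        }
    }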


On Thu, Jan 24, 2013 at 9:49 PM, <be...@gmail.com> wrote:

> Hi Amit,
>
> Apart from the Hadoop jars, do you have the same config files
> ($HADOOP_HOME/conf) that are in the cluster on your analytics server as
> well?
>
> If you have the default config files on the analytics server, then your
> MR job will run locally and not on the cluster.
> Regards
> Bejoy KS
>
> Sent from a remote device. Please excuse typos.
> ------------------------------
> From: Amit Sela <am...@infolinks.com>
> Date: Thu, 24 Jan 2013 18:15:49 +0200
> To: <us...@hadoop.apache.org>
> Reply-To: user@hadoop.apache.org
> Subject: Re: Submitting MapReduce job from remote server using JobClient
>
> Hi Harsh,
> I'm using the Job.waitForCompletion() method to run the job, but I can't
> see it in the webapp and it doesn't seem to finish...
> I get:
> org.apache.hadoop.mapred.JobClient - Running job: job_local_0001
> INFO  org.apache.hadoop.util.ProcessTree - setsid exited with exit code 0
> 2013-01-24 08:10:12.521 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@7db1be6
> 2013-01-24 08:10:12.536 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
> 2013-01-24 08:10:12.573 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
> 2013-01-24 08:10:12.573 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
> 2013-01-24 08:10:12.599 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - Starting flush of map output
> 2013-01-24 08:10:12.608 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
> 2013-01-24 08:10:13.348 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-1] INFO  org.apache.hadoop.mapred.JobClient - map 0% reduce 0%
> 2013-01-24 08:10:15.509 [Thread-51] INFO  org.apache.hadoop.mapred.LocalJobRunner -
> 2013-01-24 08:10:15.510 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Task 'attempt_local_0001_m_000000_0' done.
> 2013-01-24 08:10:15.511 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@6b02b23d
> 2013-01-24 08:10:15.512 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
> 2013-01-24 08:10:15.549 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
> 2013-01-24 08:10:15.550 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
> 2013-01-24 08:10:15.557 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - Starting flush of map output
> 2013-01-24 08:10:15.560 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
> 2013-01-24 08:10:16.358 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-1] INFO  org.apache.hadoop.mapred.JobClient - map 100% reduce 0%
>
> And after that, instead of going to the reduce phase, I keep getting map
> attempts like:
>
> INFO  org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
> 2013-01-24 08:10:21.563 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
> 2013-01-24 08:10:21.563 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
> 2013-01-24 08:10:21.570 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - Starting flush of map output
> 2013-01-24 08:10:21.573 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Task:attempt_local_0001_m_000003_0 is done. And is in the process of commiting
> 2013-01-24 08:10:24.529 [Thread-51] INFO  org.apache.hadoop.mapred.LocalJobRunner -
> 2013-01-24 08:10:24.529 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Task 'attempt_local_0001_m_000003_0' done.
> 2013-01-24 08:10:24.530 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@42e87d99
> Any clues?
> Thanks for the help.
>
> On Thu, Jan 24, 2013 at 5:12 PM, Harsh J <ha...@cloudera.com> wrote:
>
>> The Job class itself has blocking and non-blocking submitters that are
>> similar to the JobConf.runJob() method you discovered. See
>>
>> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#submit()
>> and its following method waitForCompletion(). These seem to be what
>> you're looking for.
>>
>> On Thu, Jan 24, 2013 at 5:43 PM, Amit Sela <am...@infolinks.com> wrote:
>> > Hi all,
>> >
>> > I want to run a MapReduce job using the Hadoop Java API from my
>> > analytics server. It is not the master or even a data node, but it has
>> > the same Hadoop installation as all the nodes in the cluster.
>> > I tried using JobClient.runJob(), but it accepts a JobConf as its
>> > argument, and with JobConf it is only possible to set the old mapred
>> > Mapper classes, while I use the mapreduce API...
>> > I tried using JobControl and ControlledJob, but it seems to run the
>> > job locally; the map phase just keeps attempting...
>> > Has anyone tried this before?
>> > I'm just looking for a way to submit MapReduce jobs from Java code
>> > and be able to monitor them.
>> >
>> > Thanks,
>> >
>> > Amit.
>>
>>
>>
>> --
>> Harsh J
>>
>
>

Re: Submitting MapReduce job from remote server using JobClient

Posted by Amit Sela <am...@infolinks.com>.
Yes I do.
I checked that by printing out Configuration.toString() and I see only the
files I add as resources.
Moreover, in my test environment, the test Analytics server is also a data
node (or maybe that could cause more trouble ?).
Anyway, I still get
*org.apache.hadoop.mapred.JobClient                           - Running
job: job_local_0001*
*
*
And I don't know what's wrong here, I create a new Configuration(false) to
avoid default settings. I set the resources manually (addResource). I
validate it. Anything I'm forgetting ?


On Thu, Jan 24, 2013 at 9:49 PM, <be...@gmail.com> wrote:

> **
> Hi Amit,
>
> Apart for the hadoop jars, Do you have the same config files
> ($HADOOP_HOME/conf) that are in the cluster on your analytics server as
> well?
>
> If you are having the default config files in analytics server then your
> MR job would be running locally and not on the cluster.
> Regards
> Bejoy KS
>
> Sent from remote device, Please excuse typos
> ------------------------------
> *From: * Amit Sela <am...@infolinks.com>
> *Date: *Thu, 24 Jan 2013 18:15:49 +0200
> *To: *<us...@hadoop.apache.org>
> *ReplyTo: * user@hadoop.apache.org
> *Subject: *Re: Submitting MapReduce job from remote server using JobClient
>
> Hi Harsh,
> I'm using Job.waitForCompletion() method to run the job but I can't see it
> in the webapp and it doesn't seem to finish...
> I get:
>  *org.apache.hadoop.mapred.JobClient                           - Running
> job: job_local_0001*
> *INFO  org.apache.hadoop.util.ProcessTree                           -
> setsid exited with exit code 0*
> *2013-01-24 08:10:12.521 [Thread-51]                INFO
>  org.apache.hadoop.mapred.Task                                -  Using
> ResourceCalculatorPlugin :
> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@7db1be6*
> *2013-01-24 08:10:12.536 [Thread-51]                INFO
>  org.apache.hadoop.mapred.MapTask                             - io.sort.mb
> = 100*
> *2013-01-24 08:10:12.573 [Thread-51]                INFO
>  org.apache.hadoop.mapred.MapTask                             - data buffer
> = 79691776/99614720*
> *2013-01-24 08:10:12.573 [Thread-51]                INFO
>  org.apache.hadoop.mapred.MapTask                             - record
> buffer = 262144/327680*
> *2013-01-24 08:10:12.599 [Thread-51]                INFO
>  org.apache.hadoop.mapred.MapTask                             - Starting
> flush of map output*
> *2013-01-24 08:10:12.608 [Thread-51]                INFO
>  org.apache.hadoop.mapred.Task                                -
> Task:attempt_local_0001_m_000000_0 is done. And is in the process of
> commiting*
> *2013-01-24 08:10:13.348
> [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-1]
> INFO  org.apache.hadoop.mapred.JobClient                           -  map
> 0% reduce 0%*
> *2013-01-24 08:10:15.509 [Thread-51]                INFO
>  org.apache.hadoop.mapred.LocalJobRunner                      - *
> *2013-01-24 08:10:15.510 [Thread-51]                INFO
>  org.apache.hadoop.mapred.Task                                - Task
> 'attempt_local_0001_m_000000_0' done.*
> *2013-01-24 08:10:15.511 [Thread-51]                INFO
>  org.apache.hadoop.mapred.Task                                -  Using
> ResourceCalculatorPlugin :
> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@6b02b23d*
> *2013-01-24 08:10:15.512 [Thread-51]                INFO
>  org.apache.hadoop.mapred.MapTask                             - io.sort.mb
> = 100*
> *2013-01-24 08:10:15.549 [Thread-51]                INFO
>  org.apache.hadoop.mapred.MapTask                             - data buffer
> = 79691776/99614720*
> *2013-01-24 08:10:15.550 [Thread-51]                INFO
>  org.apache.hadoop.mapred.MapTask                             - record
> buffer = 262144/327680*
> *2013-01-24 08:10:15.557 [Thread-51]                INFO
>  org.apache.hadoop.mapred.MapTask                             - Starting
> flush of map output*
> *2013-01-24 08:10:15.560 [Thread-51]                INFO
>  org.apache.hadoop.mapred.Task                                -
> Task:attempt_local_0001_m_000001_0 is done. And is in the process of
> commiting*
> *2013-01-24 08:10:16.358
> [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-1]
> INFO  org.apache.hadoop.mapred.JobClient                           -  map
> 100% reduce 0%*
>
> And after that, instead of going to Reduce phase I keep getting map
> attempts like:
>
> *INFO  org.apache.hadoop.mapred.MapTask                             -
> io.sort.mb = 100*
> *2013-01-24 08:10:21.563 [Thread-51]                INFO
>  org.apache.hadoop.mapred.MapTask                             - data buffer
> = 79691776/99614720*
> *2013-01-24 08:10:21.563 [Thread-51]                INFO
>  org.apache.hadoop.mapred.MapTask                             - record
> buffer = 262144/327680*
> *2013-01-24 08:10:21.570 [Thread-51]                INFO
>  org.apache.hadoop.mapred.MapTask                             - Starting
> flush of map output*
> *2013-01-24 08:10:21.573 [Thread-51]                INFO
>  org.apache.hadoop.mapred.Task                                -
> Task:attempt_local_0001_m_000003_0 is done. And is in the process of
> commiting*
> *2013-01-24 08:10:24.529 [Thread-51]                INFO
>  org.apache.hadoop.mapred.LocalJobRunner                      - *
> *2013-01-24 08:10:24.529 [Thread-51]                INFO
>  org.apache.hadoop.mapred.Task                                - Task
> 'attempt_local_0001_m_000003_0' done.*
> *2013-01-24 08:10:24.530 [Thread-51]                INFO
>  org.apache.hadoop.mapred.Task                                -  Using
> ResourceCalculatorPlugin :
> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@42e87d99*
> *
> *
> Any clues ?
> Thanks for the help.
>
> On Thu, Jan 24, 2013 at 5:12 PM, Harsh J <ha...@cloudera.com> wrote:
>
>> The Job class itself has a blocking and non-blocking submitter that is
>> similar to JobConf's runJob method you discovered. See
>>
>> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#submit()
>> and its following method waitForCompletion(). These seem to be what
>> you're looking for.
>>
>> On Thu, Jan 24, 2013 at 5:43 PM, Amit Sela <am...@infolinks.com> wrote:
>> > Hi all,
>> >
>> > I want to run a MapReduce job using the Hadoop Java api from my
>> analytics
>> > server. It is not the master or even a data node but it has the same
>> Hadoop
>> > installation as all the nodes in the cluster.
>> > I tried using JobClient.runJob() but it accepts JobConf as argument and
>> when
>> > using JobConf it is possible to set only mapred Mapper classes and I use
>> > mapreduce...
>> > I tried using JobControl and ControlledJob but it seems like it tries
>> to run
>> > the job locally. the map phase just keeps attempting...
>> > Anyone tried it before ?
>> > I'm just looking for a way to submit MapReduce jobs from Java code and
>> be
>> > able to monitor them.
>> >
>> > Thanks,
>> >
>> > Amit.
>>
>>
>>
>> --
>> Harsh J
>>
>
>

Re: Submitting MapReduce job from remote server using JobClient

Posted by Amit Sela <am...@infolinks.com>.
Yes I do.
I checked that by printing out Configuration.toString() and I see only the
files I add as resources.
Moreover, in my test environment, the test Analytics server is also a data
node (or maybe that could cause more trouble ?).
Anyway, I still get
*org.apache.hadoop.mapred.JobClient                           - Running
job: job_local_0001*
*
*
And I don't know what's wrong here, I create a new Configuration(false) to
avoid default settings. I set the resources manually (addResource). I
validate it. Anything I'm forgetting ?


On Thu, Jan 24, 2013 at 9:49 PM, <be...@gmail.com> wrote:

> **
> Hi Amit,
>
> Apart for the hadoop jars, Do you have the same config files
> ($HADOOP_HOME/conf) that are in the cluster on your analytics server as
> well?
>
> If you are having the default config files in analytics server then your
> MR job would be running locally and not on the cluster.
> Regards
> Bejoy KS
>
> Sent from remote device, Please excuse typos
> ------------------------------
> *From: * Amit Sela <am...@infolinks.com>
> *Date: *Thu, 24 Jan 2013 18:15:49 +0200
> *To: *<us...@hadoop.apache.org>
> *ReplyTo: * user@hadoop.apache.org
> *Subject: *Re: Submitting MapReduce job from remote server using JobClient
>
> Hi Harsh,
> I'm using Job.waitForCompletion() method to run the job but I can't see it
> in the webapp and it doesn't seem to finish...
> I get:
>  *org.apache.hadoop.mapred.JobClient                           - Running
> job: job_local_0001*
> *INFO  org.apache.hadoop.util.ProcessTree                           -
> setsid exited with exit code 0*
> *2013-01-24 08:10:12.521 [Thread-51]                INFO
>  org.apache.hadoop.mapred.Task                                -  Using
> ResourceCalculatorPlugin :
> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@7db1be6*
> *2013-01-24 08:10:12.536 [Thread-51]                INFO
>  org.apache.hadoop.mapred.MapTask                             - io.sort.mb
> = 100*
> *2013-01-24 08:10:12.573 [Thread-51]                INFO
>  org.apache.hadoop.mapred.MapTask                             - data buffer
> = 79691776/99614720*
> *2013-01-24 08:10:12.573 [Thread-51]                INFO
>  org.apache.hadoop.mapred.MapTask                             - record
> buffer = 262144/327680*
> *2013-01-24 08:10:12.599 [Thread-51]                INFO
>  org.apache.hadoop.mapred.MapTask                             - Starting
> flush of map output*
> *2013-01-24 08:10:12.608 [Thread-51]                INFO
>  org.apache.hadoop.mapred.Task                                -
> Task:attempt_local_0001_m_000000_0 is done. And is in the process of
> commiting*
> *2013-01-24 08:10:13.348
> [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-1]
> INFO  org.apache.hadoop.mapred.JobClient                           -  map
> 0% reduce 0%*
> *2013-01-24 08:10:15.509 [Thread-51]                INFO
>  org.apache.hadoop.mapred.LocalJobRunner                      - *
> *2013-01-24 08:10:15.510 [Thread-51]                INFO
>  org.apache.hadoop.mapred.Task                                - Task
> 'attempt_local_0001_m_000000_0' done.*
> *2013-01-24 08:10:15.511 [Thread-51]                INFO
>  org.apache.hadoop.mapred.Task                                -  Using
> ResourceCalculatorPlugin :
> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@6b02b23d*
> *2013-01-24 08:10:15.512 [Thread-51]                INFO
>  org.apache.hadoop.mapred.MapTask                             - io.sort.mb
> = 100*
> *2013-01-24 08:10:15.549 [Thread-51]                INFO
>  org.apache.hadoop.mapred.MapTask                             - data buffer
> = 79691776/99614720*
> *2013-01-24 08:10:15.550 [Thread-51]                INFO
>  org.apache.hadoop.mapred.MapTask                             - record
> buffer = 262144/327680*
> *2013-01-24 08:10:15.557 [Thread-51]                INFO
>  org.apache.hadoop.mapred.MapTask                             - Starting
> flush of map output*
> *2013-01-24 08:10:15.560 [Thread-51]                INFO
>  org.apache.hadoop.mapred.Task                                -
> Task:attempt_local_0001_m_000001_0 is done. And is in the process of
> commiting*
> *2013-01-24 08:10:16.358
> [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-1]
> INFO  org.apache.hadoop.mapred.JobClient                           -  map
> 100% reduce 0%*
>
> And after that, instead of going to Reduce phase I keep getting map
> attempts like:
>
> *INFO  org.apache.hadoop.mapred.MapTask                             -
> io.sort.mb = 100*
> *2013-01-24 08:10:21.563 [Thread-51]                INFO
>  org.apache.hadoop.mapred.MapTask                             - data buffer
> = 79691776/99614720*
> *2013-01-24 08:10:21.563 [Thread-51]                INFO
>  org.apache.hadoop.mapred.MapTask                             - record
> buffer = 262144/327680*
> *2013-01-24 08:10:21.570 [Thread-51]                INFO
>  org.apache.hadoop.mapred.MapTask                             - Starting
> flush of map output*
> *2013-01-24 08:10:21.573 [Thread-51]                INFO
>  org.apache.hadoop.mapred.Task                                -
> Task:attempt_local_0001_m_000003_0 is done. And is in the process of
> commiting*
> *2013-01-24 08:10:24.529 [Thread-51]                INFO
>  org.apache.hadoop.mapred.LocalJobRunner                      - *
> *2013-01-24 08:10:24.529 [Thread-51]                INFO
>  org.apache.hadoop.mapred.Task                                - Task
> 'attempt_local_0001_m_000003_0' done.*
> *2013-01-24 08:10:24.530 [Thread-51]                INFO
>  org.apache.hadoop.mapred.Task                                -  Using
> ResourceCalculatorPlugin :
> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@42e87d99*
> *
> *
> Any clues ?
> Thanks for the help.
>
> On Thu, Jan 24, 2013 at 5:12 PM, Harsh J <ha...@cloudera.com> wrote:
>
>> The Job class itself has a blocking and non-blocking submitter that is
>> similar to JobConf's runJob method you discovered. See
>>
>> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#submit()
>> and its following method waitForCompletion(). These seem to be what
>> you're looking for.
>>
>> On Thu, Jan 24, 2013 at 5:43 PM, Amit Sela <am...@infolinks.com> wrote:
>> > Hi all,
>> >
>> > I want to run a MapReduce job using the Hadoop Java api from my
>> analytics
>> > server. It is not the master or even a data node but it has the same
>> Hadoop
>> > installation as all the nodes in the cluster.
>> > I tried using JobClient.runJob() but it accepts JobConf as argument and
>> when
>> > using JobConf it is possible to set only mapred Mapper classes and I use
>> > mapreduce...
>> > I tried using JobControl and ControlledJob but it seems like it tries
>> to run
>> > the job locally. the map phase just keeps attempting...
>> > Anyone tried it before ?
>> > I'm just looking for a way to submit MapReduce jobs from Java code and
>> be
>> > able to monitor them.
>> >
>> > Thanks,
>> >
>> > Amit.
>>
>>
>>
>> --
>> Harsh J
>>
>
>

Re: Submitting MapReduce job from remote server using JobClient

Posted by Amit Sela <am...@infolinks.com>.
Yes I do.
I checked that by printing out Configuration.toString() and I see only the
files I add as resources.
Moreover, in my test environment, the test Analytics server is also a data
node (or maybe that could cause more trouble ?).
Anyway, I still get
*org.apache.hadoop.mapred.JobClient                           - Running
job: job_local_0001*
*
*
And I don't know what's wrong here, I create a new Configuration(false) to
avoid default settings. I set the resources manually (addResource). I
validate it. Anything I'm forgetting ?


On Thu, Jan 24, 2013 at 9:49 PM, <be...@gmail.com> wrote:

> **
> Hi Amit,
>
> Apart for the hadoop jars, Do you have the same config files
> ($HADOOP_HOME/conf) that are in the cluster on your analytics server as
> well?
>
> If you are having the default config files in analytics server then your
> MR job would be running locally and not on the cluster.
> Regards
> Bejoy KS
>
> Sent from remote device, Please excuse typos
> ------------------------------
> *From: * Amit Sela <am...@infolinks.com>
> *Date: *Thu, 24 Jan 2013 18:15:49 +0200
> *To: *<us...@hadoop.apache.org>
> *ReplyTo: * user@hadoop.apache.org
> *Subject: *Re: Submitting MapReduce job from remote server using JobClient
>
> Hi Harsh,
> I'm using Job.waitForCompletion() method to run the job but I can't see it
> in the webapp and it doesn't seem to finish...
> I get:
> org.apache.hadoop.mapred.JobClient - Running job: job_local_0001
> INFO  org.apache.hadoop.util.ProcessTree - setsid exited with exit code 0
> 2013-01-24 08:10:12.521 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@7db1be6
> 2013-01-24 08:10:12.536 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
> 2013-01-24 08:10:12.573 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
> 2013-01-24 08:10:12.573 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
> 2013-01-24 08:10:12.599 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - Starting flush of map output
> 2013-01-24 08:10:12.608 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
> 2013-01-24 08:10:13.348 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-1] INFO  org.apache.hadoop.mapred.JobClient - map 0% reduce 0%
> 2013-01-24 08:10:15.509 [Thread-51] INFO  org.apache.hadoop.mapred.LocalJobRunner -
> 2013-01-24 08:10:15.510 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Task 'attempt_local_0001_m_000000_0' done.
> 2013-01-24 08:10:15.511 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@6b02b23d
> 2013-01-24 08:10:15.512 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
> 2013-01-24 08:10:15.549 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
> 2013-01-24 08:10:15.550 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
> 2013-01-24 08:10:15.557 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - Starting flush of map output
> 2013-01-24 08:10:15.560 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
> 2013-01-24 08:10:16.358 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-1] INFO  org.apache.hadoop.mapred.JobClient - map 100% reduce 0%
>
> And after that, instead of going to Reduce phase I keep getting map
> attempts like:
>
> INFO  org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
> 2013-01-24 08:10:21.563 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
> 2013-01-24 08:10:21.563 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
> 2013-01-24 08:10:21.570 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - Starting flush of map output
> 2013-01-24 08:10:21.573 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Task:attempt_local_0001_m_000003_0 is done. And is in the process of commiting
> 2013-01-24 08:10:24.529 [Thread-51] INFO  org.apache.hadoop.mapred.LocalJobRunner -
> 2013-01-24 08:10:24.529 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Task 'attempt_local_0001_m_000003_0' done.
> 2013-01-24 08:10:24.530 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@42e87d99
> Any clues?
> Thanks for the help.
>
> On Thu, Jan 24, 2013 at 5:12 PM, Harsh J <ha...@cloudera.com> wrote:
>
>> The Job class itself has a blocking and non-blocking submitter that is
>> similar to JobConf's runJob method you discovered. See
>>
>> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#submit()
>> and its following method waitForCompletion(). These seem to be what
>> you're looking for.
>>
>> On Thu, Jan 24, 2013 at 5:43 PM, Amit Sela <am...@infolinks.com> wrote:
>> > Hi all,
>> >
>> > I want to run a MapReduce job using the Hadoop Java api from my
>> analytics
>> > server. It is not the master or even a data node but it has the same
>> Hadoop
>> > installation as all the nodes in the cluster.
>> > I tried using JobClient.runJob() but it accepts JobConf as argument and
>> when
>> > using JobConf it is possible to set only mapred Mapper classes and I use
>> > mapreduce...
>> > I tried using JobControl and ControlledJob but it seems like it tries
>> to run
>> > the job locally. the map phase just keeps attempting...
>> > Anyone tried it before ?
>> > I'm just looking for a way to submit MapReduce jobs from Java code and
>> be
>> > able to monitor them.
>> >
>> > Thanks,
>> >
>> > Amit.
>>
>>
>>
>> --
>> Harsh J
>>
>
>

Re: Submitting MapReduce job from remote server using JobClient

Posted by be...@gmail.com.
Hi Amit,

Apart from the hadoop jars, do you have the same config files ($HADOOP_HOME/conf) that are in the cluster on your analytics server as well?

If you have the default config files on the analytics server, then your MR job would be running locally and not on the cluster.
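
For illustration, a minimal sketch of pointing the client at the cluster
explicitly instead of relying on whatever config files are on the classpath;
the host names are hypothetical and the property names are the classic MR1
ones:

    import org.apache.hadoop.conf.Configuration;

    Configuration conf = new Configuration();
    // With the bundled defaults, mapred.job.tracker is "local" and the job
    // runs in-process; overriding both properties targets the cluster.
    conf.set("fs.default.name", "hdfs://namenode.example.com:8020");  // hypothetical
    conf.set("mapred.job.tracker", "jobtracker.example.com:8021");    // hypothetical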

Regards 
Bejoy KS

Sent from remote device, Please excuse typos

-----Original Message-----
From: Amit Sela <am...@infolinks.com>
Date: Thu, 24 Jan 2013 18:15:49 
To: <us...@hadoop.apache.org>
Reply-To: user@hadoop.apache.org
Subject: Re: Submitting MapReduce job from remote server using JobClient

Hi Harsh,
I'm using the Job.waitForCompletion() method to run the job, but I can't
see it in the webapp and it doesn't seem to finish...
I get:
org.apache.hadoop.mapred.JobClient - Running job: job_local_0001
INFO  org.apache.hadoop.util.ProcessTree - setsid exited with exit code 0
2013-01-24 08:10:12.521 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@7db1be6
2013-01-24 08:10:12.536 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
2013-01-24 08:10:12.573 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2013-01-24 08:10:12.573 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2013-01-24 08:10:12.599 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - Starting flush of map output
2013-01-24 08:10:12.608 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
2013-01-24 08:10:13.348 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-1] INFO  org.apache.hadoop.mapred.JobClient - map 0% reduce 0%
2013-01-24 08:10:15.509 [Thread-51] INFO  org.apache.hadoop.mapred.LocalJobRunner -
2013-01-24 08:10:15.510 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Task 'attempt_local_0001_m_000000_0' done.
2013-01-24 08:10:15.511 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@6b02b23d
2013-01-24 08:10:15.512 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
2013-01-24 08:10:15.549 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2013-01-24 08:10:15.550 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2013-01-24 08:10:15.557 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - Starting flush of map output
2013-01-24 08:10:15.560 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
2013-01-24 08:10:16.358 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-1] INFO  org.apache.hadoop.mapred.JobClient - map 100% reduce 0%

And after that, instead of going to Reduce phase I keep getting map
attempts like:

INFO  org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
2013-01-24 08:10:21.563 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2013-01-24 08:10:21.563 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2013-01-24 08:10:21.570 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - Starting flush of map output
2013-01-24 08:10:21.573 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Task:attempt_local_0001_m_000003_0 is done. And is in the process of commiting
2013-01-24 08:10:24.529 [Thread-51] INFO  org.apache.hadoop.mapred.LocalJobRunner -
2013-01-24 08:10:24.529 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Task 'attempt_local_0001_m_000003_0' done.
2013-01-24 08:10:24.530 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@42e87d99
Any clues?
Thanks for the help.

On Thu, Jan 24, 2013 at 5:12 PM, Harsh J <ha...@cloudera.com> wrote:

> The Job class itself has a blocking and non-blocking submitter that is
> similar to JobConf's runJob method you discovered. See
>
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#submit()
> and its following method waitForCompletion(). These seem to be what
> you're looking for.
>
> On Thu, Jan 24, 2013 at 5:43 PM, Amit Sela <am...@infolinks.com> wrote:
> > Hi all,
> >
> > I want to run a MapReduce job using the Hadoop Java api from my analytics
> > server. It is not the master or even a data node but it has the same
> Hadoop
> > installation as all the nodes in the cluster.
> > I tried using JobClient.runJob() but it accepts JobConf as argument and
> when
> > using JobConf it is possible to set only mapred Mapper classes and I use
> > mapreduce...
> > I tried using JobControl and ControlledJob but it seems like it tries to
> run
> > the job locally. the map phase just keeps attempting...
> > Anyone tried it before ?
> > I'm just looking for a way to submit MapReduce jobs from Java code and be
> > able to monitor them.
> >
> > Thanks,
> >
> > Amit.
>
>
>
> --
> Harsh J
>


Re: Submitting MapReduce job from remote server using JobClient

Posted by Amit Sela <am...@infolinks.com>.
Hi Harsh,
I'm using the Job.waitForCompletion() method to run the job, but I can't
see it in the webapp and it doesn't seem to finish...
I get:
org.apache.hadoop.mapred.JobClient - Running job: job_local_0001
INFO  org.apache.hadoop.util.ProcessTree - setsid exited with exit code 0
2013-01-24 08:10:12.521 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@7db1be6
2013-01-24 08:10:12.536 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
2013-01-24 08:10:12.573 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2013-01-24 08:10:12.573 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2013-01-24 08:10:12.599 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - Starting flush of map output
2013-01-24 08:10:12.608 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
2013-01-24 08:10:13.348 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-1] INFO  org.apache.hadoop.mapred.JobClient - map 0% reduce 0%
2013-01-24 08:10:15.509 [Thread-51] INFO  org.apache.hadoop.mapred.LocalJobRunner -
2013-01-24 08:10:15.510 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Task 'attempt_local_0001_m_000000_0' done.
2013-01-24 08:10:15.511 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@6b02b23d
2013-01-24 08:10:15.512 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
2013-01-24 08:10:15.549 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2013-01-24 08:10:15.550 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2013-01-24 08:10:15.557 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - Starting flush of map output
2013-01-24 08:10:15.560 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
2013-01-24 08:10:16.358 [org.springframework.scheduling.quartz.SchedulerFactoryBean#0_Worker-1] INFO  org.apache.hadoop.mapred.JobClient - map 100% reduce 0%

And after that, instead of going to Reduce phase I keep getting map
attempts like:

INFO  org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
2013-01-24 08:10:21.563 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2013-01-24 08:10:21.563 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2013-01-24 08:10:21.570 [Thread-51] INFO  org.apache.hadoop.mapred.MapTask - Starting flush of map output
2013-01-24 08:10:21.573 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Task:attempt_local_0001_m_000003_0 is done. And is in the process of commiting
2013-01-24 08:10:24.529 [Thread-51] INFO  org.apache.hadoop.mapred.LocalJobRunner -
2013-01-24 08:10:24.529 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Task 'attempt_local_0001_m_000003_0' done.
2013-01-24 08:10:24.530 [Thread-51] INFO  org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@42e87d99
Any clues?
Thanks for the help.
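
A small fail-fast sketch that matches the symptom above, under stated
assumptions: classic MR1 property names and a fully configured
org.apache.hadoop.mapreduce.Job instance named job:

    // job_local_* IDs come from the LocalJobRunner, so refuse to submit
    // when the client config would keep the job in-process.
    String tracker = job.getConfiguration().get("mapred.job.tracker", "local");
    if ("local".equals(tracker)) {
        throw new IllegalStateException(
            "mapred.job.tracker resolves to 'local'; the job would run "
            + "via LocalJobRunner instead of on the cluster.");
    }
    job.submit();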

On Thu, Jan 24, 2013 at 5:12 PM, Harsh J <ha...@cloudera.com> wrote:

> The Job class itself has a blocking and non-blocking submitter that is
> similar to JobConf's runJob method you discovered. See
>
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#submit()
> and its following method waitForCompletion(). These seem to be what
> you're looking for.
>
> On Thu, Jan 24, 2013 at 5:43 PM, Amit Sela <am...@infolinks.com> wrote:
> > Hi all,
> >
> > I want to run a MapReduce job using the Hadoop Java api from my analytics
> > server. It is not the master or even a data node but it has the same
> Hadoop
> > installation as all the nodes in the cluster.
> > I tried using JobClient.runJob() but it accepts JobConf as argument and
> when
> > using JobConf it is possible to set only mapred Mapper classes and I use
> > mapreduce...
> > I tried using JobControl and ControlledJob but it seems like it tries to
> run
> > the job locally. the map phase just keeps attempting...
> > Anyone tried it before ?
> > I'm just looking for a way to submit MapReduce jobs from Java code and be
> > able to monitor them.
> >
> > Thanks,
> >
> > Amit.
>
>
>
> --
> Harsh J
>

Re: Submitting MapReduce job from remote server using JobClient

Posted by Harsh J <ha...@cloudera.com>.
The Job class itself has a blocking and a non-blocking submitter, similar
to the JobClient.runJob() method you discovered. See
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#submit()
and the waitForCompletion() method that follows it. These seem to be what
you're looking for.
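
A minimal end-to-end sketch of that suggestion, under stated assumptions:
classic MR1 (pre-YARN) APIs, hypothetical driver/mapper/reducer classes and
paths, and the cluster's config files added explicitly so the LocalJobRunner
is not picked up:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    Configuration conf = new Configuration(false);                  // skip bundled defaults
    conf.addResource(new Path("/opt/hadoop/conf/core-site.xml"));   // hypothetical paths
    conf.addResource(new Path("/opt/hadoop/conf/hdfs-site.xml"));
    conf.addResource(new Path("/opt/hadoop/conf/mapred-site.xml"));

    Job job = new Job(conf, "remote-analytics-job");
    job.setJarByClass(MyDriver.class);     // hypothetical driver class; ships the job jar
    job.setMapperClass(MyMapper.class);    // hypothetical org.apache.hadoop.mapreduce.Mapper
    job.setReducerClass(MyReducer.class);  // hypothetical org.apache.hadoop.mapreduce.Reducer
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path("/data/in"));     // hypothetical
    FileOutputFormat.setOutputPath(job, new Path("/data/out")); // hypothetical

    job.submit();                          // non-blocking; poll job.isComplete()
    // or, blocking with progress printed to the client:
    // boolean ok = job.waitForCompletion(true);

Submitted this way, the job should show up in the JobTracker web UI with a
cluster job ID rather than job_local_0001.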

On Thu, Jan 24, 2013 at 5:43 PM, Amit Sela <am...@infolinks.com> wrote:
> Hi all,
>
> I want to run a MapReduce job using the Hadoop Java api from my analytics
> server. It is not the master or even a data node but it has the same Hadoop
> installation as all the nodes in the cluster.
> I tried using JobClient.runJob() but it accepts JobConf as argument and when
> using JobConf it is possible to set only mapred Mapper classes and I use
> mapreduce...
> I tried using JobControl and ControlledJob but it seems like it tries to run
> the job locally. the map phase just keeps attempting...
> Anyone tried it before ?
> I'm just looking for a way to submit MapReduce jobs from Java code and be
> able to monitor them.
>
> Thanks,
>
> Amit.



-- 
Harsh J
