Posted to mapreduce-user@hadoop.apache.org by Ashish Jain <as...@gmail.com> on 2014/01/08 12:34:46 UTC

Distributing the code to multiple nodes

Hello All,

I have a 2 node hadoop cluster running with a replication factor of 2. I
have a file of size around 1 GB which, when copied to HDFS, is replicated to
both the nodes. Looking at the block info I can see the file has been
subdivided into 8 blocks, each of size 128 MB. I use this file as input to
run the word count program. Somehow I feel only one node is doing all the
work and the code is not distributed to the other node. How can I make sure
the code is distributed to both the nodes? Also, is there a log or GUI that
can be used to verify this?
Please note I am using the latest stable release, that is 2.2.0.

++Ashish

Re: Distributing the code to multiple nodes

Posted by Chris Mawata <ch...@gmail.com>.
...And do all three nodes appear in the NameNode and YARN web user
interfaces?
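
For reference, a quick way to check this from the command line as well (these
are standard Hadoop 2.x commands; the web UIs are normally at
namenode-host:50070 and resourcemanager-host:8088 by default):

hdfs dfsadmin -report    # lists the DataNodes the NameNode currently sees as live
yarn node -list          # lists the NodeManagers registered with the ResourceManager

All three machines should show up in both listings before work can be spread
across them.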
Chris
On Jan 9, 2014 7:46 AM, "Ashish Jain" <as...@gmail.com> wrote:

> Another point to add here: 10.12.11.210 is the host which has everything
> running, including a slave datanode. Data was also distributed to this host,
> as was the jar file. The following are running on 10.12.11.210:
>
> 7966 DataNode
> 8480 NodeManager
> 8353 ResourceManager
> 8141 SecondaryNameNode
> 7834 NameNode
>
>
>
> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <as...@gmail.com> wrote:
>
>> Logs were updated only when I copied the data. After copying the data
>> there have been no updates to the log files.
>>
>>
>> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <ch...@gmail.com>wrote:
>>
>>> Do the logs on the three nodes contain anything interesting?
>>> Chris
>>>  On Jan 9, 2014 3:47 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>>
>>>> Here is the block info for the file I distributed. As can be seen,
>>>> only 10.12.11.210 has all the data, and this is the node which is serving
>>>> all the requests. Replicas are available on 209 as well as 211.
>>>>
>>>> 1073741857:  10.12.11.210:50010  10.12.11.209:50010
>>>> 1073741858:  10.12.11.210:50010  10.12.11.211:50010
>>>> 1073741859:  10.12.11.210:50010  10.12.11.209:50010
>>>> 1073741860:  10.12.11.210:50010  10.12.11.211:50010
>>>> 1073741861:  10.12.11.210:50010  10.12.11.209:50010
>>>> 1073741862:  10.12.11.210:50010  10.12.11.209:50010
>>>> 1073741863:  10.12.11.210:50010  10.12.11.209:50010
>>>> 1073741864:  10.12.11.210:50010  10.12.11.209:50010
>>>>
>>>> --Ashish
>>>>
>>>>
>>>> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <as...@gmail.com> wrote:
>>>>
>>>>> Hello Chris,
>>>>>
>>>>> I now have a cluster with 3 nodes and a replication factor of 2. When
>>>>> I distribute a file I can see that replicas of the data are available on
>>>>> other nodes. However, when I run a map reduce job, again only one node is
>>>>> serving all the requests :(. Can you or anyone please provide some more
>>>>> input?
>>>>>
>>>>> Thanks
>>>>> Ashish
>>>>>
>>>>>
>>>>> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <ch...@gmail.com>wrote:
>>>>>
>>>>>> 2 nodes and replication factor of 2 results in a replica of each
>>>>>> block present on each node. This would allow the possibility that a single
>>>>>> node would do the work and yet be data local.  It will probably happen if
>>>>>> that single node has the needed capacity.  More nodes than the replication
>>>>>> factor are needed to force distribution of the processing.
>>>>>>  Chris
>>>>>> On Jan 8, 2014 7:35 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>>>>>
>>>>>>> Guys,
>>>>>>>
>>>>>>> I am sure that only one node is being used. I just now ran the job
>>>>>>> again and could see CPU usage going high only on one server while the other
>>>>>>> server's CPU usage remains constant, which means the other node is not
>>>>>>> being used. Can someone help me debug this issue?
>>>>>>>
>>>>>>> ++Ashish
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com>wrote:
>>>>>>>
>>>>>>>> Hello All,
>>>>>>>>
>>>>>>>> I have a 2 node hadoop cluster running with a replication factor of
>>>>>>>> 2. I have a file of size around 1 GB which when copied to HDFS is
>>>>>>>> replicated to both the nodes. Seeing the block info I can see the file has
>>>>>>>> been subdivided into 8 parts which means it has been subdivided into 8
>>>>>>>> blocks each of size 128 MB.  I use this file as input to run the word count
>>>>>>>> program. Some how I feel only one node is doing all the work and the code
>>>>>>>> is not distributed to other node. How can I make sure code is distributed
>>>>>>>> to both the nodes? Also is there a log or GUI which can be used for this?
>>>>>>>> Please note I am using the latest stable release that is 2.2.0.
>>>>>>>>
>>>>>>>> ++Ashish
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>
>

Re: Distributing the code to multiple nodes

Posted by Ashish Jain <as...@gmail.com>.
Voila!! It finally worked :). Thanks a lot for all the support from all the
folks in this forum. Here is a summary for others of what I finally did
to solve this:

1) Change the framework to yarn using mapreduce.framework.name in
mapred-site.xml
2) In yarn-site.xml, set the following properties:
<name>yarn.nodemanager.resource.memory-mb</name>
<name>yarn.scheduler.minimum-allocation-mb</name>
3) In mapred-site.xml, set the following properties:
<name>mapreduce.map.memory.mb</name>
<name>mapreduce.reduce.memory.mb</name>
<name>mapreduce.map.java.opts</name>
<name>mapreduce.reduce.java.opts</name>
4) Use the capacity scheduler. I think the fair scheduler may also work, but I
used the capacity scheduler. (A sketch of these settings is shown below.)
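
For reference, here is a minimal sketch of what these settings might look like.
The values are illustrative assumptions only (they depend on how much memory
each node has and must be tuned accordingly); the property names are the
standard Hadoop 2.2.0 ones listed above.

In yarn-site.xml (on every node):

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value> <!-- memory this NodeManager may hand out to containers -->
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>512</value> <!-- smallest container the scheduler will allocate -->
</property>
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <!-- CapacityScheduler is the 2.2.0 default, so this can also simply be left unset -->
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>

In mapred-site.xml:

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>512</value> <!-- container size requested for each map task -->
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>1024</value> <!-- container size requested for each reduce task -->
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx400m</value> <!-- JVM heap must fit inside the map container above -->
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx800m</value> <!-- JVM heap must fit inside the reduce container above -->
</property>

With per-task requests this small, each node can run several containers at once,
so the scheduler has room to place map tasks on all the nodes that hold block
replicas.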

Start the system and run the jobs; they will be distributed across all the
nodes. I could see 8 map tasks running because I had 8 data blocks, and all
the nodes were serving requests. However, I still see only 1 reduce task; I
will address that in a separate post.

--Ashish


On Wed, Jan 15, 2014 at 7:23 PM, sudhakara st <su...@gmail.com>wrote:

> Hello Ashish
>
>
> WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1-DEV05:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
>
> The resource manager is trying to allocate a 2 GB container, but only 1 GB is available on the node.
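>
> For the request to be placed, each NodeManager has to advertise at least as
> much memory as the container being asked for (here 2048 MB). A minimal sketch
> of the relevant yarn-site.xml setting, with an illustrative value and assuming
> the nodes actually have that much RAM to spare:
>
> <property>
>   <name>yarn.nodemanager.resource.memory-mb</name>
>   <value>2048</value> <!-- must be at least the largest container request -->
> </property>
>
> Alternatively, the per-task request can be shrunk below the node capacity via
> mapreduce.map.memory.mb / mapreduce.reduce.memory.mb in mapred-site.xml.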
>
>
> On Wed, Jan 15, 2014 at 7:07 PM, Ashish Jain <as...@gmail.com> wrote:
>
>> I tried that but somehow my map reduce jobs do not execute at all once I
>> set it to yarn
>>
>>
>> On Wed, Jan 15, 2014 at 7:00 PM, Nirmal Kumar <nirmal.kumar@impetus.co.in
>> > wrote:
>>
>>>  Surely you don’t have to set **mapreduce.jobtracker.address** in
>>> mapred-site.xml
>>>
>>>
>>>
>>> In mapred-site.xml you just have to mention:
>>>
>>> <property>
>>>
>>> <name>mapreduce.framework.name</name>
>>>
>>> <value>yarn</value>
>>>
>>> </property>
>>>
>>>
>>>
>>> -Nirmal
>>>
>>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>>> *Sent:* Wednesday, January 15, 2014 6:44 PM
>>>
>>> *To:* user@hadoop.apache.org
>>> *Subject:* Re: Distributing the code to multiple nodes
>>>
>>>
>>>
>>> I think this is the problem. I have not set
>>> "mapreduce.jobtracker.address" in my mapred-site.xml and by default it is
>>> set to local. Now the question is how to set it up to remote. Documentation
>>> says I need to specify the host:port of the job tracker for this. As we
>>> know hadoop 2.2.0 is completely overhauled and there is no concept of task
>>> tracker and job tracker. Instead there is now resource manager and node
>>> manager. So in this case what do I set as "mapreduce.jobtracker.address".
>>> Do I set it to resourceManagerHost:resourceManagerPort?
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Wed, Jan 15, 2014 at 4:20 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>>  Hi Sudhakar,
>>>
>>> Indeed there was a typo; the complete command is as follows, except for the
>>> main class, since my manifest has an entry for the main class.
>>> ./hadoop jar wordCount.jar  /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>> Next I killed the datanode on 10.12.11.210 and I see the following
>>> messages in the log files. It looks like the namenode is still trying to
>>> assign the complete task to one single node, and since it does not find the
>>> complete data set on one node it is complaining.
>>>
>>>
>>> 2014-01-15 16:38:26,894 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:27,348 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1dev-211:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:27,871 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-dev06:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:27,897 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:28,349 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1dev-211:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:28,874 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-dev06:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:28,900 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>>
>>>   --Ashish
>>>
>>>
>>>
>>> On Wed, Jan 15, 2014 at 3:59 PM, sudhakara st <su...@gmail.com>
>>> wrote:
>>>
>>>   Hello Ashish
>>>
>>>
>>>
>>> 2) Run the example again using the command
>>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>>   Unless it is a typo, the command should be
>>> ./hadoop jar wordCount.jar WordCount /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>> One more thing to try: just stop the datanode process on 10.12.11.210 and run
>>> the job.
>>>
>>>
>>>
>>> On Wed, Jan 15, 2014 at 2:07 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>>     Hello Sudhakara,
>>>
>>> Thanks for your suggestion. However, once I change the mapreduce
>>> framework to yarn my map reduce jobs do not get executed at all. It seems
>>> they are waiting on some thread indefinitely. Here is what I have done:
>>>
>>> 1) Set the mapreduce framework to yarn in mapred-site.xml
>>> <property>
>>>  <name>mapreduce.framework.name</name>
>>>  <value>yarn</value>
>>> </property>
>>>
>>> 2) Run the example again using the command
>>>
>>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>> The jobs are just stuck and do not move further.
>>>
>>>   I also tried the following and it complains of a FileNotFoundException
>>> and some security exception:
>>>
>>> ./hadoop dfs wordCount.jar file:///opt/ApacheHadoop/temp/worker.log
>>> file:///opt/ApacheHadoop/out/
>>>
>>> Below is the status of the job from the hadoop application console. The
>>> progress bar does not move at all.
>>>
>>>
>>>
>>> ID:               application_1389771586883_0002<http://10.12.11.210:8088/cluster/app/application_1389771586883_0002>
>>> User:             root
>>> Name:             wordcount
>>> Application Type: MAPREDUCE
>>> Queue:            default
>>> StartTime:        Wed, 15 Jan 2014 07:52:04 GMT
>>> FinishTime:       N/A
>>> State:            ACCEPTED
>>> FinalStatus:      UNDEFINED
>>> Progress:         (not shown in the plain text capture)
>>> Tracking UI:      UNASSIGNED <http://10.12.11.210:8088/cluster/apps>
>>>
>>>
>>>
>>> Please advise on what I should do.
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Tue, Jan 14, 2014 at 5:48 PM, sudhakara st <su...@gmail.com>
>>> wrote:
>>>
>>>   Hello Ashish
>>>
>>> It seems the job is running in the local job runner (LocalJobRunner), reading
>>> the local file system. Can you try giving the full URI paths of the input
>>> and output?
>>>
>>> like
>>>
>>> $hadoop jar program.jar   ProgramName -Dmapreduce.framework.name=yarn
>>> file:///home/input/  file:///home/output/
>>>
>>>
>>>
>>> On Mon, Jan 13, 2014 at 3:02 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>>   German,
>>>
>>> This does not seem to be helping. I tried using the FairScheduler as the
>>> scheduler but the behavior remains the same. I can see the
>>> fairscheduler log getting continuous heartbeats from both the other nodes,
>>> but it is still not distributing the work to the other nodes. What I did next
>>> was start 3 jobs simultaneously so that maybe some part of one of the
>>> jobs would be distributed to the other nodes. However, still only one node is
>>> being used :(((. What is going wrong? Can someone help?
>>>
>>> Sample of the fairscheduler log:
>>> 2014-01-13 15:13:54,293 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:54,953 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:54,988 HEARTBEAT       l1-DEV05
>>> 2014-01-13 15:13:55,295 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:55,956 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:55,993 HEARTBEAT       l1-DEV05
>>> 2014-01-13 15:13:56,297 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:56,960 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:56,997 HEARTBEAT       l1-DEV05
>>> 2014-01-13 15:13:57,299 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:57,964 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:58,001 HEARTBEAT       l1-DEV05
>>>
>>> My data is distributed as blocks to the other nodes. The host with IP
>>> 10.12.11.210 has all the data and is the one which is serving all the
>>> requests.
>>>
>>> Total number of blocks: 8
>>> 1073741866:  10.12.11.211:50010  10.12.11.210:50010
>>> 1073741867:  10.12.11.211:50010  10.12.11.210:50010
>>> 1073741868:  10.12.11.210:50010  10.12.11.209:50010
>>> 1073741869:  10.12.11.210:50010  10.12.11.209:50010
>>> 1073741870:  10.12.11.211:50010  10.12.11.210:50010
>>> 1073741871:  10.12.11.210:50010  10.12.11.209:50010
>>> 1073741872:  10.12.11.211:50010  10.12.11.210:50010
>>> 1073741873:  10.12.11.210:50010  10.12.11.209:50010
>>>
>>>
>>>
>>> Someone please advise on how to go about this.
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Fri, Jan 10, 2014 at 12:58 PM, Ashish Jain <as...@gmail.com>
>>> wrote:
>>>
>>>  Thanks for all these suggestions. Somehow I do not have access to the
>>> servers today; I will try the suggestions on Monday and will let you
>>> know how it goes.
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <
>>> german.fl@samsung.com> wrote:
>>>
>>>  Ashish
>>>
>>> Could this be related to the scheduler you are using and its settings?
>>>
>>>
>>>
>>> In lab environments, when running a single type of job, I often use
>>> FairScheduler (the YARN default in 2.2.0 is CapacityScheduler) and it does
>>> a good job distributing the load.
>>>
>>>
>>>
>>> You could give that a try (
>>> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
>>> )
>>>
>>>
>>>
>>> I think just changing yarn-site.xml as follows could demonstrate this
>>> theory (note that how the jobs are scheduled depends on resources such as
>>> memory on the nodes, and you would need to set up yarn-site.xml accordingly).
>>>
>>>
>>>
>>> <property>
>>>
>>>   <name>yarn.resourcemanager.scheduler.class</name>
>>>
>>>
>>> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
>>>
>>> </property>
>>>
>>>
>>>
>>> Regards
>>>
>>> ./g
>>>
>>>
>>>
>>>
>>>
>>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>>> *Sent:* Thursday, January 09, 2014 6:46 AM
>>> *To:* user@hadoop.apache.org
>>> *Subject:* Re: Distributing the code to multiple nodes
>>>
>>>
>>>
>>> Another point to add here 10.12.11.210 is the host which has everything
>>> running including a slave datanode. Data was also distributed this host as
>>> well as the jar file. Following are running on 10.12.11.210
>>>
>>> 7966 DataNode
>>> 8480 NodeManager
>>> 8353 ResourceManager
>>> 8141 SecondaryNameNode
>>> 7834 NameNode
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>> Logs were updated only when I copied the data. After copying the data
>>> there has been no updates on the log files.
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <ch...@gmail.com>
>>> wrote:
>>>
>>> Do the logs on the three nodes contain anything interesting?
>>> Chris
>>>
>>> On Jan 9, 2014 3:47 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>>
>>> Here is the block info for the record I distributed. As can be seen only
>>> 10.12.11.210 has all the data and this is the node which is serving all the
>>> request. Replicas are available with 209 as well as 210
>>>
>>> 1073741857:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741858:         10.12.11.210:50010    View Block Info
>>> 10.12.11.211:50010    View Block Info
>>> 1073741859:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741860:         10.12.11.210:50010    View Block Info
>>> 10.12.11.211:50010    View Block Info
>>> 1073741861:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741862:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741863:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741864:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>> Hello Chris,
>>>
>>> I have now a cluster with 3 nodes and replication factor being 2. When I
>>> distribute a file I could see that there are replica of data available in
>>> other nodes. However when I run a map reduce job again only one node is
>>> serving all the request :(. Can you or anyone please provide some more
>>> inputs.
>>>
>>> Thanks
>>> Ashish
>>>
>>>
>>>
>>> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <ch...@gmail.com>
>>> wrote:
>>>
>>> 2 nodes and replication factor of 2 results in a replica of each block
>>> present on each node. This would allow the possibility that a single node
>>> would do the work and yet be data local.  It will probably happen if that
>>> single node has the needed capacity.  More nodes than the replication
>>> factor are needed to force distribution of the processing.
>>> Chris
>>>
>>> On Jan 8, 2014 7:35 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>>
>>> Guys,
>>>
>>> I am sure that only one node is being used. I just know ran the job
>>> again and could see that CPU usage only for one server going high other
>>> server CPU usage remains constant and hence it means other node is not
>>> being used. Can someone help me to debug this issue?
>>>
>>> ++Ashish
>>>
>>>
>>>
>>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>> Hello All,
>>>
>>> I have a 2 node hadoop cluster running with a replication factor of 2. I
>>> have a file of size around 1 GB which when copied to HDFS is replicated to
>>> both the nodes. Seeing the block info I can see the file has been
>>> subdivided into 8 parts which means it has been subdivided into 8 blocks
>>> each of size 128 MB.  I use this file as input to run the word count
>>> program. Some how I feel only one node is doing all the work and the code
>>> is not distributed to other node. How can I make sure code is distributed
>>> to both the nodes? Also is there a log or GUI which can be used for this?
>>>
>>> Please note I am using the latest stable release that is 2.2.0.
>>>
>>> ++Ashish
>>>
>>>   --
>>>
>>>
>>> Regards,
>>> ...Sudhakara.st
>>>
>>>
>>>
>>>
>>>
>>>
>>>   --
>>>
>>>
>>> Regards,
>>> ...Sudhakara.st
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
> --
>
> Regards,
> ...Sudhakara.st
>
>

Re: Distributing the code to multiple nodes

Posted by Ashish Jain <as...@gmail.com>.
Voila!! It worked finally :). Thanks a lot for all the support from all the
folks in this forum. So here is the summary for others on what I did
finally to solve this up:

1) change the framework to yarn using mapreduce.framework.name in
mapred-site.xml
2) In yarn-site.xml add the following properties
<name>yarn.nodemanager.resource.memory-mb</name>
<name>yarn.scheduler.minimum-allocation-mb</name>
3) In mapred-site.xml add the following properties
<name>mapreduce.map.memory.mb</name>
<name>mapreduce.reduce.memory.mb</name>
<name>mapreduce.map.java.opts</name>
<name>mapreduce.reduce.java.opts</name>
4) Use capacity scheduler. I think fair scheduler may also work but I used
capacity scheduler

Start the system and run the jobs it will be distributed across all the
nodes. I could see 8 map jobs running because I had 8 data blocks and also
all the nodes serving the request. However I still see only 1 reduce job I
will address that in a separate post

--Ashish


On Wed, Jan 15, 2014 at 7:23 PM, sudhakara st <su...@gmail.com>wrote:

> Hello Ashish
>
>
> WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1-DEV05:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
>
> Resource manager trying allocate memory 2GB but it available 1GB.
>
>
> On Wed, Jan 15, 2014 at 7:07 PM, Ashish Jain <as...@gmail.com> wrote:
>
>> I tried that but somehow my map reduce jobs do not execute at all once I
>> set it to yarn
>>
>>
>> On Wed, Jan 15, 2014 at 7:00 PM, Nirmal Kumar <nirmal.kumar@impetus.co.in
>> > wrote:
>>
>>>  Surely you don’t have to set **mapreduce.jobtracker.address** in
>>> mapred-site.xml
>>>
>>>
>>>
>>> In mapred-site.xml you just have to mention:
>>>
>>> <property>
>>>
>>> <name>mapreduce.framework.name</name>
>>>
>>> <value>yarn</value>
>>>
>>> </property>
>>>
>>>
>>>
>>> -Nirmal
>>>
>>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>>> *Sent:* Wednesday, January 15, 2014 6:44 PM
>>>
>>> *To:* user@hadoop.apache.org
>>> *Subject:* Re: Distributing the code to multiple nodes
>>>
>>>
>>>
>>> I think this is the problem. I have not set
>>> "mapreduce.jobtracker.address" in my mapred-site.xml and by default it is
>>> set to local. Now the question is how to set it up to remote. Documentation
>>> says I need to specify the host:port of the job tracker for this. As we
>>> know hadoop 2.2.0 is completely overhauled and there is no concept of task
>>> tracker and job tracker. Instead there is now resource manager and node
>>> manager. So in this case what do I set as "mapreduce.jobtracker.address".
>>> Do I set is resourceMangerHost:resourceMangerPort?
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Wed, Jan 15, 2014 at 4:20 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>>  Hi Sudhakar,
>>>
>>> Indeed there was a type the complete command is as follows except the
>>> main class since my manifest has the entry for main class.
>>> /hadoop jar wordCount.jar  /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>> Next I killed the datanode in 10.12.11.210 and l see the following
>>> messages in the log files. Looks like the namenode is still trying to
>>> assign the complete task to one single node and since it does not find the
>>> complete data set in one node it is complaining.
>>>
>>>
>>> 2014-01-15 16:38:26,894 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:27,348 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1dev-211:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:27,871 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-dev06:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:27,897 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:28,349 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1dev-211:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:28,874 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-dev06:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:28,900 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>>
>>>   --Ashish
>>>
>>>
>>>
>>> On Wed, Jan 15, 2014 at 3:59 PM, sudhakara st <su...@gmail.com>
>>> wrote:
>>>
>>>   Hello Ashish
>>>
>>>
>>>
>>> 2) Run the example again using the command
>>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>>   Unless if it typo mistake the command should be
>>> ./hadoop jar wordCount.jar WordCount /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>> One more thing try , just stop datanode process in  10.12.11.210 and run
>>> the job
>>>
>>>
>>>
>>> On Wed, Jan 15, 2014 at 2:07 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>>     Hello Sudhakara,
>>>
>>> Thanks for your suggestion. However once I change the mapreduce
>>> framework to yarn my map reduce jobs does not get executed at all. It seems
>>> it is waiting on some thread indefinitely. Here is what I have done
>>>
>>> 1) Set the mapreduce framework to yarn in mapred-site.xml
>>> <property>
>>>  <name>mapreduce.framework.name</name>
>>>  <value>yarn</value>
>>> </property>
>>>
>>> 2) Run the example again using the command
>>>
>>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>> The jobs are just stuck and do not move further.
>>>
>>>   I also tried the following and it complains of filenotfound exception
>>> and some security exception
>>>
>>> ./hadoop dfs wordCount.jar file:///opt/ApacheHadoop/temp/worker.log
>>> file:///opt/ApacheHadoop/out/
>>>
>>> Below is the status of the job from hadoop application console. The
>>> progress bar does not move at all.
>>>
>>>
>>>
>>> *ID *
>>>
>>> *User *
>>>
>>> *Name *
>>>
>>> *Application Type *
>>>
>>> *Queue *
>>>
>>> *StartTime *
>>>
>>> *FinishTime *
>>>
>>> *State *
>>>
>>> *FinalStatus *
>>>
>>> *Progress *
>>>
>>> *Tracking UI *
>>>
>>> application_1389771586883_0002<http://10.12.11.210:8088/cluster/app/application_1389771586883_0002>
>>>
>>> root
>>>
>>> wordcount
>>>
>>> MAPREDUCE
>>>
>>> default
>>>
>>> Wed, 15 Jan 2014 07:52:04 GMT
>>>
>>> N/A
>>>
>>> ACCEPTED
>>>
>>> UNDEFINED
>>>
>>> UNASSIGNE <http://10.12.11.210:8088/cluster/apps>
>>>
>>>
>>>
>>> Please advice what should I do
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Tue, Jan 14, 2014 at 5:48 PM, sudhakara st <su...@gmail.com>
>>> wrote:
>>>
>>>   Hello Ashish
>>>
>>> It seems job is running in Local job runner(LocalJobRunner) by reading
>>> the Local file system. Can you try by give the full URI path of the input
>>> and output path.
>>>
>>> like
>>>
>>> $hadoop jar program.jar   ProgramName -Dmapreduce.framework.name=yarn
>>> file:///home/input/  file:///home/output/
>>>
>>>
>>>
>>> On Mon, Jan 13, 2014 at 3:02 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>>   German,
>>>
>>> This does not seem to be helping. I tried to use the Fairscheduler as my
>>> resource manger but the behavior remains same. I could see the
>>> fairscheduler log getting continuous heart beat from both the other nodes.
>>> But it is still not distributing the work to other nodes. What I did next
>>> was started 3 jobs simultaneously so that may be some part of one of the
>>> job be distributed to other nodes. However still only one node is being
>>> used :(((. What is that is going wrong can some one help?
>>>
>>> Sample of fairsheduler log:
>>> 2014-01-13 15:13:54,293 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:54,953 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:54,988 HEARTBEAT       l1-DEV05
>>> 2014-01-13 15:13:55,295 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:55,956 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:55,993 HEARTBEAT       l1-DEV05
>>> 2014-01-13 15:13:56,297 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:56,960 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:56,997 HEARTBEAT       l1-DEV05
>>> 2014-01-13 15:13:57,299 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:57,964 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:58,001 HEARTBEAT       l1-DEV05
>>>
>>> My Data distributed as blocks to other nodes. The host with IP
>>> 10.12.11.210 has all the data and this is the one which is serving all the
>>> request.
>>>
>>> Total number of blocks: 8
>>> 1073741866:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741867:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741868:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741869:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741870:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741871:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741872:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741873:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>>
>>>
>>>
>>> Someone please advice on how to go about this.
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Fri, Jan 10, 2014 at 12:58 PM, Ashish Jain <as...@gmail.com>
>>> wrote:
>>>
>>>  Thanks for all these suggestions. Somehow I do not have access to the
>>> servers today and will try the suggestions made on monday and will let you
>>> know how it goes.
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <
>>> german.fl@samsung.com> wrote:
>>>
>>>  Ashish
>>>
>>> Could this be related to the scheduler you are using and its settings?.
>>>
>>>
>>>
>>> On lab environments when running a single type of job I often use
>>> FairScheduler (the YARN default in 2.2.0 is CapacityScheduler) and it does
>>> a good job distributing the load.
>>>
>>>
>>>
>>> You could give that a try (
>>> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
>>> )
>>>
>>>
>>>
>>> I think just changing yarn-site.xml  as follows could demonstrate this
>>> theory (note that  how the jobs are scheduled depend on resources such as
>>> memory on the nodes and you would need to setup yarn-site.xml accordingly).
>>>
>>>
>>>
>>> <property>
>>>
>>>   <name>yarn.resourcemanager.scheduler.class</name>
>>>
>>>
>>> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
>>>
>>> </property>
>>>
>>>
>>>
>>> Regards
>>>
>>> ./g
>>>
>>>
>>>
>>>
>>>
>>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>>> *Sent:* Thursday, January 09, 2014 6:46 AM
>>> *To:* user@hadoop.apache.org
>>> *Subject:* Re: Distributing the code to multiple nodes
>>>
>>>
>>>
>>> Another point to add here 10.12.11.210 is the host which has everything
>>> running including a slave datanode. Data was also distributed this host as
>>> well as the jar file. Following are running on 10.12.11.210
>>>
>>> 7966 DataNode
>>> 8480 NodeManager
>>> 8353 ResourceManager
>>> 8141 SecondaryNameNode
>>> 7834 NameNode
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>> Logs were updated only when I copied the data. After copying the data
>>> there has been no updates on the log files.
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <ch...@gmail.com>
>>> wrote:
>>>
>>> Do the logs on the three nodes contain anything interesting?
>>> Chris
>>>
>>> On Jan 9, 2014 3:47 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>>
>>> Here is the block info for the record I distributed. As can be seen only
>>> 10.12.11.210 has all the data and this is the node which is serving all the
>>> request. Replicas are available with 209 as well as 210
>>>
>>> 1073741857:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741858:         10.12.11.210:50010    View Block Info
>>> 10.12.11.211:50010    View Block Info
>>> 1073741859:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741860:         10.12.11.210:50010    View Block Info
>>> 10.12.11.211:50010    View Block Info
>>> 1073741861:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741862:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741863:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741864:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>> Hello Chris,
>>>
>>> I have now a cluster with 3 nodes and replication factor being 2. When I
>>> distribute a file I could see that there are replica of data available in
>>> other nodes. However when I run a map reduce job again only one node is
>>> serving all the request :(. Can you or anyone please provide some more
>>> inputs.
>>>
>>> Thanks
>>> Ashish
>>>
>>>
>>>
>>> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <ch...@gmail.com>
>>> wrote:
>>>
>>> 2 nodes and replication factor of 2 results in a replica of each block
>>> present on each node. This would allow the possibility that a single node
>>> would do the work and yet be data local.  It will probably happen if that
>>> single node has the needed capacity.  More nodes than the replication
>>> factor are needed to force distribution of the processing.
>>> Chris
>>>
>>> On Jan 8, 2014 7:35 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>>
>>> Guys,
>>>
>>> I am sure that only one node is being used. I just know ran the job
>>> again and could see that CPU usage only for one server going high other
>>> server CPU usage remains constant and hence it means other node is not
>>> being used. Can someone help me to debug this issue?
>>>
>>> ++Ashish
>>>
>>>
>>>
>>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>> Hello All,
>>>
>>> I have a 2 node hadoop cluster running with a replication factor of 2. I
>>> have a file of size around 1 GB which when copied to HDFS is replicated to
>>> both the nodes. Seeing the block info I can see the file has been
>>> subdivided into 8 parts which means it has been subdivided into 8 blocks
>>> each of size 128 MB.  I use this file as input to run the word count
>>> program. Some how I feel only one node is doing all the work and the code
>>> is not distributed to other node. How can I make sure code is distributed
>>> to both the nodes? Also is there a log or GUI which can be used for this?
>>>
>>> Please note I am using the latest stable release that is 2.2.0.
>>>
>>> ++Ashish
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>   --
>>>
>>>
>>> Regards,
>>> ...Sudhakara.st
>>>
>>>
>>>
>>>
>>>
>>>
>>>   --
>>>
>>>
>>> Regards,
>>> ...Sudhakara.st
>>>
>>>
>>>
>>>
>>>
>>>
>>> ------------------------------
>>>
>>>
>>>
>>>
>>>
>>>
>>> NOTE: This message may contain information that is confidential,
>>> proprietary, privileged or otherwise protected by law. The message is
>>> intended solely for the named addressee. If received in error, please
>>> destroy and notify the sender. Any use of this email is prohibited when
>>> received in error. Impetus does not represent, warrant and/or guarantee,
>>> that the integrity of this communication has been maintained nor that the
>>> communication is free of errors, virus, interception or interference.
>>>
>>
>>
>
>
> --
>
> Regards,
> ...Sudhakara.st
>
>

Re: Distributing the code to multiple nodes

Posted by Ashish Jain <as...@gmail.com>.
Voila!! It worked finally :). Thanks a lot for all the support from all the
folks in this forum. So here is the summary for others on what I did
finally to solve this up:

1) change the framework to yarn using mapreduce.framework.name in
mapred-site.xml
2) In yarn-site.xml add the following properties
<name>yarn.nodemanager.resource.memory-mb</name>
<name>yarn.scheduler.minimum-allocation-mb</name>
3) In mapred-site.xml add the following properties
<name>mapreduce.map.memory.mb</name>
<name>mapreduce.reduce.memory.mb</name>
<name>mapreduce.map.java.opts</name>
<name>mapreduce.reduce.java.opts</name>
4) Use capacity scheduler. I think fair scheduler may also work but I used
capacity scheduler

Start the system and run the jobs it will be distributed across all the
nodes. I could see 8 map jobs running because I had 8 data blocks and also
all the nodes serving the request. However I still see only 1 reduce job I
will address that in a separate post

--Ashish


On Wed, Jan 15, 2014 at 7:23 PM, sudhakara st <su...@gmail.com>wrote:

> Hello Ashish
>
>
> WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1-DEV05:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
>
> Resource manager trying allocate memory 2GB but it available 1GB.
>
>
> On Wed, Jan 15, 2014 at 7:07 PM, Ashish Jain <as...@gmail.com> wrote:
>
>> I tried that but somehow my map reduce jobs do not execute at all once I
>> set it to yarn
>>
>>
>> On Wed, Jan 15, 2014 at 7:00 PM, Nirmal Kumar <nirmal.kumar@impetus.co.in
>> > wrote:
>>
>>>  Surely you don’t have to set **mapreduce.jobtracker.address** in
>>> mapred-site.xml
>>>
>>>
>>>
>>> In mapred-site.xml you just have to mention:
>>>
>>> <property>
>>>
>>> <name>mapreduce.framework.name</name>
>>>
>>> <value>yarn</value>
>>>
>>> </property>
>>>
>>>
>>>
>>> -Nirmal
>>>
>>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>>> *Sent:* Wednesday, January 15, 2014 6:44 PM
>>>
>>> *To:* user@hadoop.apache.org
>>> *Subject:* Re: Distributing the code to multiple nodes
>>>
>>>
>>>
>>> I think this is the problem. I have not set
>>> "mapreduce.jobtracker.address" in my mapred-site.xml and by default it is
>>> set to local. Now the question is how to set it up to remote. Documentation
>>> says I need to specify the host:port of the job tracker for this. As we
>>> know hadoop 2.2.0 is completely overhauled and there is no concept of task
>>> tracker and job tracker. Instead there is now resource manager and node
>>> manager. So in this case what do I set as "mapreduce.jobtracker.address".
>>> Do I set is resourceMangerHost:resourceMangerPort?
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Wed, Jan 15, 2014 at 4:20 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>>  Hi Sudhakar,
>>>
>>> Indeed there was a type the complete command is as follows except the
>>> main class since my manifest has the entry for main class.
>>> /hadoop jar wordCount.jar  /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>> Next I killed the datanode in 10.12.11.210 and l see the following
>>> messages in the log files. Looks like the namenode is still trying to
>>> assign the complete task to one single node and since it does not find the
>>> complete data set in one node it is complaining.
>>>
>>>
>>> 2014-01-15 16:38:26,894 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:27,348 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1dev-211:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:27,871 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-dev06:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:27,897 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:28,349 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1dev-211:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:28,874 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-dev06:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:28,900 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>>
>>>   --Ashish
>>>
>>>
>>>
>>> On Wed, Jan 15, 2014 at 3:59 PM, sudhakara st <su...@gmail.com>
>>> wrote:
>>>
>>>   Hello Ashish
>>>
>>>
>>>
>>> 2) Run the example again using the command
>>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>>   Unless if it typo mistake the command should be
>>> ./hadoop jar wordCount.jar WordCount /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>> One more thing try , just stop datanode process in  10.12.11.210 and run
>>> the job
>>>
>>>
>>>
>>> On Wed, Jan 15, 2014 at 2:07 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>>     Hello Sudhakara,
>>>
>>> Thanks for your suggestion. However once I change the mapreduce
>>> framework to yarn my map reduce jobs does not get executed at all. It seems
>>> it is waiting on some thread indefinitely. Here is what I have done
>>>
>>> 1) Set the mapreduce framework to yarn in mapred-site.xml
>>> <property>
>>>  <name>mapreduce.framework.name</name>
>>>  <value>yarn</value>
>>> </property>
>>>
>>> 2) Run the example again using the command
>>>
>>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>> The jobs are just stuck and do not move further.
>>>
>>>   I also tried the following and it complains of filenotfound exception
>>> and some security exception
>>>
>>> ./hadoop dfs wordCount.jar file:///opt/ApacheHadoop/temp/worker.log
>>> file:///opt/ApacheHadoop/out/
>>>
>>> Below is the status of the job from hadoop application console. The
>>> progress bar does not move at all.
>>>
>>>
>>>
>>> *ID *
>>>
>>> *User *
>>>
>>> *Name *
>>>
>>> *Application Type *
>>>
>>> *Queue *
>>>
>>> *StartTime *
>>>
>>> *FinishTime *
>>>
>>> *State *
>>>
>>> *FinalStatus *
>>>
>>> *Progress *
>>>
>>> *Tracking UI *
>>>
>>> application_1389771586883_0002<http://10.12.11.210:8088/cluster/app/application_1389771586883_0002>
>>>
>>> root
>>>
>>> wordcount
>>>
>>> MAPREDUCE
>>>
>>> default
>>>
>>> Wed, 15 Jan 2014 07:52:04 GMT
>>>
>>> N/A
>>>
>>> ACCEPTED
>>>
>>> UNDEFINED
>>>
>>> UNASSIGNE <http://10.12.11.210:8088/cluster/apps>
>>>
>>>
>>>
>>> Please advice what should I do
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Tue, Jan 14, 2014 at 5:48 PM, sudhakara st <su...@gmail.com>
>>> wrote:
>>>
>>>   Hello Ashish
>>>
>>> It seems job is running in Local job runner(LocalJobRunner) by reading
>>> the Local file system. Can you try by give the full URI path of the input
>>> and output path.
>>>
>>> like
>>>
>>> $hadoop jar program.jar   ProgramName -Dmapreduce.framework.name=yarn
>>> file:///home/input/  file:///home/output/
>>>
>>>
>>>
>>> On Mon, Jan 13, 2014 at 3:02 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>>   German,
>>>
>>> This does not seem to be helping. I tried to use the Fairscheduler as my
>>> resource manger but the behavior remains same. I could see the
>>> fairscheduler log getting continuous heart beat from both the other nodes.
>>> But it is still not distributing the work to other nodes. What I did next
>>> was started 3 jobs simultaneously so that may be some part of one of the
>>> job be distributed to other nodes. However still only one node is being
>>> used :(((. What is that is going wrong can some one help?
>>>
>>> Sample of fairsheduler log:
>>> 2014-01-13 15:13:54,293 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:54,953 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:54,988 HEARTBEAT       l1-DEV05
>>> 2014-01-13 15:13:55,295 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:55,956 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:55,993 HEARTBEAT       l1-DEV05
>>> 2014-01-13 15:13:56,297 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:56,960 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:56,997 HEARTBEAT       l1-DEV05
>>> 2014-01-13 15:13:57,299 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:57,964 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:58,001 HEARTBEAT       l1-DEV05
>>>
>>> My Data distributed as blocks to other nodes. The host with IP
>>> 10.12.11.210 has all the data and this is the one which is serving all the
>>> request.
>>>
>>> Total number of blocks: 8
>>> 1073741866:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741867:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741868:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741869:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741870:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741871:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741872:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741873:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>>
>>>
>>>
>>> Someone please advice on how to go about this.
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Fri, Jan 10, 2014 at 12:58 PM, Ashish Jain <as...@gmail.com>
>>> wrote:
>>>
>>>  Thanks for all these suggestions. Somehow I do not have access to the
>>> servers today and will try the suggestions made on monday and will let you
>>> know how it goes.
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <
>>> german.fl@samsung.com> wrote:
>>>
>>>  Ashish
>>>
>>> Could this be related to the scheduler you are using and its settings?.
>>>
>>>
>>>
>>> On lab environments when running a single type of job I often use
>>> FairScheduler (the YARN default in 2.2.0 is CapacityScheduler) and it does
>>> a good job distributing the load.
>>>
>>>
>>>
>>> You could give that a try (
>>> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
>>> )
>>>
>>>
>>>
>>> I think just changing yarn-site.xml  as follows could demonstrate this
>>> theory (note that  how the jobs are scheduled depend on resources such as
>>> memory on the nodes and you would need to setup yarn-site.xml accordingly).
>>>
>>>
>>>
>>> <property>
>>>
>>>   <name>yarn.resourcemanager.scheduler.class</name>
>>>
>>>
>>> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
>>>
>>> </property>
>>>
>>>
>>>
>>> Regards
>>>
>>> ./g
>>>
>>>
>>>
>>>
>>>
>>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>>> *Sent:* Thursday, January 09, 2014 6:46 AM
>>> *To:* user@hadoop.apache.org
>>> *Subject:* Re: Distributing the code to multiple nodes
>>>
>>>
>>>
>>> Another point to add here 10.12.11.210 is the host which has everything
>>> running including a slave datanode. Data was also distributed this host as
>>> well as the jar file. Following are running on 10.12.11.210
>>>
>>> 7966 DataNode
>>> 8480 NodeManager
>>> 8353 ResourceManager
>>> 8141 SecondaryNameNode
>>> 7834 NameNode
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>> Logs were updated only when I copied the data. After copying the data
>>> there has been no updates on the log files.
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <ch...@gmail.com>
>>> wrote:
>>>
>>> Do the logs on the three nodes contain anything interesting?
>>> Chris
>>>
>>> On Jan 9, 2014 3:47 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>>
>>> Here is the block info for the record I distributed. As can be seen only
>>> 10.12.11.210 has all the data and this is the node which is serving all the
>>> request. Replicas are available with 209 as well as 210
>>>
>>> 1073741857:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741858:         10.12.11.210:50010    View Block Info
>>> 10.12.11.211:50010    View Block Info
>>> 1073741859:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741860:         10.12.11.210:50010    View Block Info
>>> 10.12.11.211:50010    View Block Info
>>> 1073741861:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741862:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741863:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741864:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>> Hello Chris,
>>>
>>> I have now a cluster with 3 nodes and replication factor being 2. When I
>>> distribute a file I could see that there are replica of data available in
>>> other nodes. However when I run a map reduce job again only one node is
>>> serving all the request :(. Can you or anyone please provide some more
>>> inputs.
>>>
>>> Thanks
>>> Ashish
>>>
>>>
>>>
>>> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <ch...@gmail.com>
>>> wrote:
>>>
>>> 2 nodes and replication factor of 2 results in a replica of each block
>>> present on each node. This would allow the possibility that a single node
>>> would do the work and yet be data local.  It will probably happen if that
>>> single node has the needed capacity.  More nodes than the replication
>>> factor are needed to force distribution of the processing.
>>> Chris
>>>
>>> On Jan 8, 2014 7:35 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>>
>>> Guys,
>>>
>>> I am sure that only one node is being used. I just know ran the job
>>> again and could see that CPU usage only for one server going high other
>>> server CPU usage remains constant and hence it means other node is not
>>> being used. Can someone help me to debug this issue?
>>>
>>> ++Ashish
>>>
>>>
>>>
>>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>> Hello All,
>>>
>>> I have a 2 node hadoop cluster running with a replication factor of 2. I
>>> have a file of size around 1 GB which when copied to HDFS is replicated to
>>> both the nodes. Seeing the block info I can see the file has been
>>> subdivided into 8 parts which means it has been subdivided into 8 blocks
>>> each of size 128 MB.  I use this file as input to run the word count
>>> program. Some how I feel only one node is doing all the work and the code
>>> is not distributed to other node. How can I make sure code is distributed
>>> to both the nodes? Also is there a log or GUI which can be used for this?
>>>
>>> Please note I am using the latest stable release that is 2.2.0.
>>>
>>> ++Ashish
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>   --
>>>
>>>
>>> Regards,
>>> ...Sudhakara.st
>>>
>>>
>>>
>>>
>>>
>>>
>>>   --
>>>
>>>
>>> Regards,
>>> ...Sudhakara.st
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
> --
>
> Regards,
> ...Sudhakara.st
>
>

Re: Distributing the code to multiple nodes

Posted by Ashish Jain <as...@gmail.com>.
Voila!! It finally worked :). Thanks a lot for all the support from the
folks in this forum. Here is a summary, for others, of what I finally did to
solve it:

1) Change the framework to yarn by setting mapreduce.framework.name in
mapred-site.xml
2) In yarn-site.xml add the following properties (example values are sketched
below):
<name>yarn.nodemanager.resource.memory-mb</name>
<name>yarn.scheduler.minimum-allocation-mb</name>
3) In mapred-site.xml add the following properties (again, see the sketch
below):
<name>mapreduce.map.memory.mb</name>
<name>mapreduce.reduce.memory.mb</name>
<name>mapreduce.map.java.opts</name>
<name>mapreduce.reduce.java.opts</name>
4) Use the capacity scheduler. I think the fair scheduler may also work, but I
used the capacity scheduler.
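
For anyone wanting a concrete starting point, here is a minimal sketch of the
kind of values involved in steps 2) and 3). The numbers are only illustrative
assumptions that have to fit the actual RAM of your nodes; they are not
necessarily the exact values I used.

yarn-site.xml:
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value> <!-- assumed: memory the NodeManager may hand out -->
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>512</value> <!-- assumed: smallest container the RM will grant -->
</property>

mapred-site.xml:
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx819m</value> <!-- roughly 80% of the container size -->
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx819m</value>
</property>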

Start the system and run the jobs; the work will be distributed across all the
nodes. I could see 8 map tasks running because I had 8 data blocks, and
all the nodes were serving requests. However, I still see only 1 reduce task; I
will address that in a separate post.
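
In case it helps others: a single reduce task is just the MapReduce default
(mapreduce.job.reduces defaults to 1), not a scheduling problem. A minimal
sketch of raising it, with the value 4 chosen purely for illustration (it can
also be set per job in the driver via Job.setNumReduceTasks()):

<property>
  <name>mapreduce.job.reduces</name>
  <value>4</value> <!-- illustrative; choose a count that suits your cluster -->
</property>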

--Ashish


On Wed, Jan 15, 2014 at 7:23 PM, sudhakara st <su...@gmail.com>wrote:

> Hello Ashish
>
>
> WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1-DEV05:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
>
> The resource manager is trying to allocate a 2 GB container, but only 1 GB is available on the node.
>
>
> On Wed, Jan 15, 2014 at 7:07 PM, Ashish Jain <as...@gmail.com> wrote:
>
>> I tried that but somehow my map reduce jobs do not execute at all once I
>> set it to yarn
>>
>>
>> On Wed, Jan 15, 2014 at 7:00 PM, Nirmal Kumar <nirmal.kumar@impetus.co.in
>> > wrote:
>>
>>>  Surely you don’t have to set **mapreduce.jobtracker.address** in
>>> mapred-site.xml
>>>
>>>
>>>
>>> In mapred-site.xml you just have to mention:
>>>
>>> <property>
>>>
>>> <name>mapreduce.framework.name</name>
>>>
>>> <value>yarn</value>
>>>
>>> </property>
>>>
>>>
>>>
>>> -Nirmal
>>>
>>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>>> *Sent:* Wednesday, January 15, 2014 6:44 PM
>>>
>>> *To:* user@hadoop.apache.org
>>> *Subject:* Re: Distributing the code to multiple nodes
>>>
>>>
>>>
>>> I think this is the problem. I have not set
>>> "mapreduce.jobtracker.address" in my mapred-site.xml, and by default it is
>>> set to local. Now the question is how to set it to remote. The documentation
>>> says I need to specify the host:port of the job tracker for this. As we
>>> know, hadoop 2.2.0 is completely overhauled and there is no concept of task
>>> tracker and job tracker; instead there is now a resource manager and node
>>> manager. So in this case what do I set as "mapreduce.jobtracker.address"?
>>> Do I set it to resourceManagerHost:resourceManagerPort?
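
(A note for readers following the thread: in YARN there is no job tracker at
all, so mapreduce.jobtracker.address can simply be left at its default. The
ResourceManager's location is configured in yarn-site.xml instead; a sketch,
with the host name rm-host.example.com assumed purely for illustration:

<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>rm-host.example.com</value>
</property>

The individual yarn.resourcemanager.*.address properties then default to ports
derived from this hostname.)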
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Wed, Jan 15, 2014 at 4:20 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>>  Hi Sudhakar,
>>>
>>> Indeed there was a typo; the complete command is as follows (minus the
>>> main class, since my manifest has the entry for the main class):
>>> /hadoop jar wordCount.jar  /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>> Next I killed the datanode on 10.12.11.210 and I see the following
>>> messages in the log files. It looks like the namenode is still trying to
>>> assign the complete task to one single node, and since it does not find the
>>> complete data set on one node it is complaining.
>>>
>>>
>>> 2014-01-15 16:38:26,894 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:27,348 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1dev-211:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:27,871 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-dev06:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:27,897 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:28,349 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1dev-211:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:28,874 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-dev06:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:28,900 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>>
>>>   --Ashish
>>>
>>>
>>>
>>> On Wed, Jan 15, 2014 at 3:59 PM, sudhakara st <su...@gmail.com>
>>> wrote:
>>>
>>>   Hello Ashish
>>>
>>>
>>>
>>> 2) Run the example again using the command
>>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>>   Unless it is a typo, the command should be
>>> ./hadoop jar wordCount.jar WordCount /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>> One more thing to try: just stop the datanode process on 10.12.11.210 and run
>>> the job.
>>>
>>>
>>>
>>> On Wed, Jan 15, 2014 at 2:07 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>>     Hello Sudhakara,
>>>
>>> Thanks for your suggestion. However, once I change the mapreduce
>>> framework to yarn my map reduce jobs do not get executed at all. It seems
>>> they are waiting on some thread indefinitely. Here is what I have done:
>>>
>>> 1) Set the mapreduce framework to yarn in mapred-site.xml
>>> <property>
>>>  <name>mapreduce.framework.name</name>
>>>  <value>yarn</value>
>>> </property>
>>>
>>> 2) Run the example again using the command
>>>
>>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>> The jobs are just stuck and do not move further.
>>>
>>>   I also tried the following, and it complains of a FileNotFoundException
>>> and some security exception:
>>>
>>> ./hadoop dfs wordCount.jar file:///opt/ApacheHadoop/temp/worker.log
>>> file:///opt/ApacheHadoop/out/
>>>
>>> Below is the status of the job from the hadoop application console. The
>>> progress bar does not move at all.
>>>
>>>
>>>
>>> ID:                application_1389771586883_0002 <http://10.12.11.210:8088/cluster/app/application_1389771586883_0002>
>>> User:              root
>>> Name:              wordcount
>>> Application Type:  MAPREDUCE
>>> Queue:             default
>>> StartTime:         Wed, 15 Jan 2014 07:52:04 GMT
>>> FinishTime:        N/A
>>> State:             ACCEPTED
>>> FinalStatus:       UNDEFINED
>>> Progress:          (not shown in plain text)
>>> Tracking UI:       UNASSIGNE <http://10.12.11.210:8088/cluster/apps>
>>>
>>>
>>>
>>> Please advise on what I should do.
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Tue, Jan 14, 2014 at 5:48 PM, sudhakara st <su...@gmail.com>
>>> wrote:
>>>
>>>   Hello Ashish
>>>
>>> It seems the job is running in the local job runner (LocalJobRunner), reading
>>> the local file system. Can you try giving the full URI paths of the input
>>> and output paths?
>>>
>>> like
>>>
>>> $hadoop jar program.jar   ProgramName -Dmapreduce.framework.name=yarn
>>> file:///home/input/  file:///home/output/
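
(If the data has already been copied into HDFS, the equivalent is to pass HDFS
paths rather than file:// URIs; a sketch, with the /user/root/... paths assumed
purely for illustration:

hdfs dfs -mkdir -p /user/root/input
hdfs dfs -put /opt/ApacheHadoop/temp/worker.log /user/root/input/
hadoop jar wordCount.jar /user/root/input/worker.log /user/root/out

Also note that the -D generic option shown above is only picked up if the
driver runs through ToolRunner/GenericOptionsParser.)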
>>>
>>>
>>>
>>> On Mon, Jan 13, 2014 at 3:02 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>>   German,
>>>
>>> This does not seem to be helping. I tried to use the FairScheduler as my
>>> scheduler but the behavior remains the same. I could see the
>>> fairscheduler log getting continuous heartbeats from both the other nodes,
>>> but it is still not distributing the work to the other nodes. What I did next
>>> was start 3 jobs simultaneously, hoping that some part of one of the
>>> jobs might be distributed to the other nodes. However, still only one node is
>>> being used :(((. What is it that is going wrong? Can someone help?
>>>
>>> Sample of fairsheduler log:
>>> 2014-01-13 15:13:54,293 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:54,953 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:54,988 HEARTBEAT       l1-DEV05
>>> 2014-01-13 15:13:55,295 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:55,956 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:55,993 HEARTBEAT       l1-DEV05
>>> 2014-01-13 15:13:56,297 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:56,960 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:56,997 HEARTBEAT       l1-DEV05
>>> 2014-01-13 15:13:57,299 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:57,964 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:58,001 HEARTBEAT       l1-DEV05
>>>
>>> My data is distributed as blocks to the other nodes. The host with IP
>>> 10.12.11.210 has all the data, and this is the one which is serving all the
>>> requests.
>>>
>>> Total number of blocks: 8
>>> 1073741866:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741867:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741868:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741869:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741870:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741871:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741872:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741873:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>>
>>>
>>>
>>> Someone please advise on how to go about this.
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Fri, Jan 10, 2014 at 12:58 PM, Ashish Jain <as...@gmail.com>
>>> wrote:
>>>
>>>  Thanks for all these suggestions. Somehow I do not have access to the
>>> servers today; I will try the suggestions on Monday and will let you
>>> know how it goes.
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <
>>> german.fl@samsung.com> wrote:
>>>
>>>  Ashish
>>>
>>> Could this be related to the scheduler you are using and its settings?
>>>
>>>
>>>
>>> On lab environments when running a single type of job I often use
>>> FairScheduler (the YARN default in 2.2.0 is CapacityScheduler) and it does
>>> a good job distributing the load.
>>>
>>>
>>>
>>> You could give that a try (
>>> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
>>> )
>>>
>>>
>>>
>>> I think just changing yarn-site.xml as follows could demonstrate this
>>> theory (note that how the jobs are scheduled depends on resources such as
>>> memory on the nodes, and you would need to set up yarn-site.xml accordingly).
>>>
>>>
>>>
>>> <property>
>>>
>>>   <name>yarn.resourcemanager.scheduler.class</name>
>>>
>>>
>>> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
>>>
>>> </property>
>>>
>>>
>>>
>>> Regards
>>>
>>> ./g
>>>
>>>
>>>
>>>
>>>
>>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>>> *Sent:* Thursday, January 09, 2014 6:46 AM
>>> *To:* user@hadoop.apache.org
>>> *Subject:* Re: Distributing the code to multiple nodes
>>>
>>>
>>>
>>> Another point to add here 10.12.11.210 is the host which has everything
>>> running including a slave datanode. Data was also distributed this host as
>>> well as the jar file. Following are running on 10.12.11.210
>>>
>>> 7966 DataNode
>>> 8480 NodeManager
>>> 8353 ResourceManager
>>> 8141 SecondaryNameNode
>>> 7834 NameNode
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>> Logs were updated only when I copied the data. After copying the data
>>> there have been no updates to the log files.
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <ch...@gmail.com>
>>> wrote:
>>>
>>> Do the logs on the three nodes contain anything interesting?
>>> Chris
>>>
>>> On Jan 9, 2014 3:47 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>>
>>> Here is the block info for the record I distributed. As can be seen, only
>>> 10.12.11.210 has all the data, and this is the node which is serving all the
>>> requests. Replicas are also available on 209 and 211.
>>>
>>> 1073741857:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741858:         10.12.11.210:50010    View Block Info
>>> 10.12.11.211:50010    View Block Info
>>> 1073741859:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741860:         10.12.11.210:50010    View Block Info
>>> 10.12.11.211:50010    View Block Info
>>> 1073741861:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741862:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741863:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741864:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>> Hello Chris,
>>>
>>> I have now a cluster with 3 nodes and replication factor being 2. When I
>>> distribute a file I could see that there are replica of data available in
>>> other nodes. However when I run a map reduce job again only one node is
>>> serving all the request :(. Can you or anyone please provide some more
>>> inputs.
>>>
>>> Thanks
>>> Ashish
>>>
>>>
>>>
>>> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <ch...@gmail.com>
>>> wrote:
>>>
>>> 2 nodes and replication factor of 2 results in a replica of each block
>>> present on each node. This would allow the possibility that a single node
>>> would do the work and yet be data local.  It will probably happen if that
>>> single node has the needed capacity.  More nodes than the replication
>>> factor are needed to force distribution of the processing.
>>> Chris
>>>
>>> On Jan 8, 2014 7:35 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>>
>>> Guys,
>>>
>>> I am sure that only one node is being used. I just know ran the job
>>> again and could see that CPU usage only for one server going high other
>>> server CPU usage remains constant and hence it means other node is not
>>> being used. Can someone help me to debug this issue?
>>>
>>> ++Ashish
>>>
>>>
>>>
>>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>> Hello All,
>>>
>>> I have a 2 node hadoop cluster running with a replication factor of 2. I
>>> have a file of size around 1 GB which when copied to HDFS is replicated to
>>> both the nodes. Seeing the block info I can see the file has been
>>> subdivided into 8 parts which means it has been subdivided into 8 blocks
>>> each of size 128 MB.  I use this file as input to run the word count
>>> program. Some how I feel only one node is doing all the work and the code
>>> is not distributed to other node. How can I make sure code is distributed
>>> to both the nodes? Also is there a log or GUI which can be used for this?
>>>
>>> Please note I am using the latest stable release that is 2.2.0.
>>>
>>> ++Ashish
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>   --
>>>
>>>
>>> Regards,
>>> ...Sudhakara.st
>>>
>>>
>>>
>>>
>>>
>>>
>>>   --
>>>
>>>
>>> Regards,
>>> ...Sudhakara.st
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
> --
>
> Regards,
> ...Sudhakara.st
>
>

Re: Distributing the code to multiple nodes

Posted by sudhakara st <su...@gmail.com>.
Hello Ashish

WARN
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Node : l1-DEV05:1004 does not have sufficient resource for request :
{Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
Location: *, Relax Locality: true} node total capability : <memory:1024,
vCores:8>

The resource manager is trying to allocate a 2 GB container, but only 1 GB is available on the node.
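
To make the arithmetic concrete (the 2048 and 1024 figures come from the log
above; the suggested fix is only a sketch): a task asking for
mapreduce.map.memory.mb = 2048 needs a container of at least 2048 MB, but a
node advertising yarn.nodemanager.resource.memory-mb = 1024 can never grant
one, so the request is skipped on every node heartbeat and the job stays in
ACCEPTED. Either raise the node capability above the container size (if the
machine has the RAM) or shrink the container request to fit, e.g. in
mapred-site.xml:

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value> <!-- must fit within yarn.nodemanager.resource.memory-mb -->
</property>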


On Wed, Jan 15, 2014 at 7:07 PM, Ashish Jain <as...@gmail.com> wrote:

> I tried that but somehow my map reduce jobs do not execute at all once I
> set it to yarn
>
>
> On Wed, Jan 15, 2014 at 7:00 PM, Nirmal Kumar <ni...@impetus.co.in>wrote:
>
>>  Surely you don’t have to set **mapreduce.jobtracker.address** in
>> mapred-site.xml
>>
>>
>>
>> In mapred-site.xml you just have to mention:
>>
>> <property>
>>
>> <name>mapreduce.framework.name</name>
>>
>> <value>yarn</value>
>>
>> </property>
>>
>>
>>
>> -Nirmal
>>
>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>> *Sent:* Wednesday, January 15, 2014 6:44 PM
>>
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: Distributing the code to multiple nodes
>>
>>
>>
>> I think this is the problem. I have not set
>> "mapreduce.jobtracker.address" in my mapred-site.xml and by default it is
>> set to local. Now the question is how to set it up to remote. Documentation
>> says I need to specify the host:port of the job tracker for this. As we
>> know hadoop 2.2.0 is completely overhauled and there is no concept of task
>> tracker and job tracker. Instead there is now resource manager and node
>> manager. So in this case what do I set as "mapreduce.jobtracker.address".
>> Do I set is resourceMangerHost:resourceMangerPort?
>>
>> --Ashish
>>
>>
>>
>> On Wed, Jan 15, 2014 at 4:20 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>>  Hi Sudhakar,
>>
>> Indeed there was a type the complete command is as follows except the
>> main class since my manifest has the entry for main class.
>> /hadoop jar wordCount.jar  /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>> Next I killed the datanode in 10.12.11.210 and l see the following
>> messages in the log files. Looks like the namenode is still trying to
>> assign the complete task to one single node and since it does not find the
>> complete data set in one node it is complaining.
>>
>>
>> 2014-01-15 16:38:26,894 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:27,348 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1dev-211:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:27,871 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-dev06:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:27,897 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:28,349 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1dev-211:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:28,874 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-dev06:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:28,900 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>>
>>   --Ashish
>>
>>
>>
>> On Wed, Jan 15, 2014 at 3:59 PM, sudhakara st <su...@gmail.com>
>> wrote:
>>
>>   Hello Ashish
>>
>>
>>
>> 2) Run the example again using the command
>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>>   Unless if it typo mistake the command should be
>> ./hadoop jar wordCount.jar WordCount /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>> One more thing try , just stop datanode process in  10.12.11.210 and run
>> the job
>>
>>
>>
>> On Wed, Jan 15, 2014 at 2:07 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>>     Hello Sudhakara,
>>
>> Thanks for your suggestion. However once I change the mapreduce framework
>> to yarn my map reduce jobs does not get executed at all. It seems it is
>> waiting on some thread indefinitely. Here is what I have done
>>
>> 1) Set the mapreduce framework to yarn in mapred-site.xml
>> <property>
>>  <name>mapreduce.framework.name</name>
>>  <value>yarn</value>
>> </property>
>>
>> 2) Run the example again using the command
>>
>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>> The jobs are just stuck and do not move further.
>>
>>   I also tried the following and it complains of filenotfound exception
>> and some security exception
>>
>> ./hadoop dfs wordCount.jar file:///opt/ApacheHadoop/temp/worker.log
>> file:///opt/ApacheHadoop/out/
>>
>> Below is the status of the job from hadoop application console. The
>> progress bar does not move at all.
>>
>>
>>
>> ID:                application_1389771586883_0002 <http://10.12.11.210:8088/cluster/app/application_1389771586883_0002>
>> User:              root
>> Name:              wordcount
>> Application Type:  MAPREDUCE
>> Queue:             default
>> StartTime:         Wed, 15 Jan 2014 07:52:04 GMT
>> FinishTime:        N/A
>> State:             ACCEPTED
>> FinalStatus:       UNDEFINED
>> Progress:          (not shown in plain text)
>> Tracking UI:       UNASSIGNE <http://10.12.11.210:8088/cluster/apps>
>>
>>
>>
>> Please advice what should I do
>>
>> --Ashish
>>
>>
>>
>> On Tue, Jan 14, 2014 at 5:48 PM, sudhakara st <su...@gmail.com>
>> wrote:
>>
>>   Hello Ashish
>>
>> It seems job is running in Local job runner(LocalJobRunner) by reading
>> the Local file system. Can you try by give the full URI path of the input
>> and output path.
>>
>> like
>>
>> $hadoop jar program.jar   ProgramName -Dmapreduce.framework.name=yarn
>> file:///home/input/  file:///home/output/
>>
>>
>>
>> On Mon, Jan 13, 2014 at 3:02 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>>   German,
>>
>> This does not seem to be helping. I tried to use the Fairscheduler as my
>> resource manger but the behavior remains same. I could see the
>> fairscheduler log getting continuous heart beat from both the other nodes.
>> But it is still not distributing the work to other nodes. What I did next
>> was started 3 jobs simultaneously so that may be some part of one of the
>> job be distributed to other nodes. However still only one node is being
>> used :(((. What is that is going wrong can some one help?
>>
>> Sample of fairsheduler log:
>> 2014-01-13 15:13:54,293 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:54,953 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:54,988 HEARTBEAT       l1-DEV05
>> 2014-01-13 15:13:55,295 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:55,956 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:55,993 HEARTBEAT       l1-DEV05
>> 2014-01-13 15:13:56,297 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:56,960 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:56,997 HEARTBEAT       l1-DEV05
>> 2014-01-13 15:13:57,299 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:57,964 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:58,001 HEARTBEAT       l1-DEV05
>>
>> My Data distributed as blocks to other nodes. The host with IP
>> 10.12.11.210 has all the data and this is the one which is serving all the
>> request.
>>
>> Total number of blocks: 8
>> 1073741866:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741867:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741868:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741869:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741870:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741871:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741872:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741873:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>>
>>
>>
>> Someone please advice on how to go about this.
>>
>> --Ashish
>>
>>
>>
>> On Fri, Jan 10, 2014 at 12:58 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>>  Thanks for all these suggestions. Somehow I do not have access to the
>> servers today and will try the suggestions made on monday and will let you
>> know how it goes.
>>
>> --Ashish
>>
>>
>>
>> On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <
>> german.fl@samsung.com> wrote:
>>
>>  Ashish
>>
>> Could this be related to the scheduler you are using and its settings?.
>>
>>
>>
>> On lab environments when running a single type of job I often use
>> FairScheduler (the YARN default in 2.2.0 is CapacityScheduler) and it does
>> a good job distributing the load.
>>
>>
>>
>> You could give that a try (
>> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
>> )
>>
>>
>>
>> I think just changing yarn-site.xml  as follows could demonstrate this
>> theory (note that  how the jobs are scheduled depend on resources such as
>> memory on the nodes and you would need to setup yarn-site.xml accordingly).
>>
>>
>>
>> <property>
>>
>>   <name>yarn.resourcemanager.scheduler.class</name>
>>
>>
>> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
>>
>> </property>
>>
>>
>>
>> Regards
>>
>> ./g
>>
>>
>>
>>
>>
>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>> *Sent:* Thursday, January 09, 2014 6:46 AM
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: Distributing the code to multiple nodes
>>
>>
>>
>> Another point to add here 10.12.11.210 is the host which has everything
>> running including a slave datanode. Data was also distributed this host as
>> well as the jar file. Following are running on 10.12.11.210
>>
>> 7966 DataNode
>> 8480 NodeManager
>> 8353 ResourceManager
>> 8141 SecondaryNameNode
>> 7834 NameNode
>>
>>
>>
>> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>> Logs were updated only when I copied the data. After copying the data
>> there has been no updates on the log files.
>>
>>
>>
>> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <ch...@gmail.com>
>> wrote:
>>
>> Do the logs on the three nodes contain anything interesting?
>> Chris
>>
>> On Jan 9, 2014 3:47 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>
>> Here is the block info for the record I distributed. As can be seen only
>> 10.12.11.210 has all the data and this is the node which is serving all the
>> request. Replicas are available with 209 as well as 210
>>
>> 1073741857:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741858:         10.12.11.210:50010    View Block Info
>> 10.12.11.211:50010    View Block Info
>> 1073741859:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741860:         10.12.11.210:50010    View Block Info
>> 10.12.11.211:50010    View Block Info
>> 1073741861:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741862:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741863:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741864:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>>
>> --Ashish
>>
>>
>>
>> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>> Hello Chris,
>>
>> I have now a cluster with 3 nodes and replication factor being 2. When I
>> distribute a file I could see that there are replica of data available in
>> other nodes. However when I run a map reduce job again only one node is
>> serving all the request :(. Can you or anyone please provide some more
>> inputs.
>>
>> Thanks
>> Ashish
>>
>>
>>
>> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <ch...@gmail.com>
>> wrote:
>>
>> 2 nodes and replication factor of 2 results in a replica of each block
>> present on each node. This would allow the possibility that a single node
>> would do the work and yet be data local.  It will probably happen if that
>> single node has the needed capacity.  More nodes than the replication
>> factor are needed to force distribution of the processing.
>> Chris
>>
>> On Jan 8, 2014 7:35 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>
>> Guys,
>>
>> I am sure that only one node is being used. I just know ran the job again
>> and could see that CPU usage only for one server going high other server
>> CPU usage remains constant and hence it means other node is not being used.
>> Can someone help me to debug this issue?
>>
>> ++Ashish
>>
>>
>>
>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>> Hello All,
>>
>> I have a 2 node hadoop cluster running with a replication factor of 2. I
>> have a file of size around 1 GB which when copied to HDFS is replicated to
>> both the nodes. Seeing the block info I can see the file has been
>> subdivided into 8 parts which means it has been subdivided into 8 blocks
>> each of size 128 MB.  I use this file as input to run the word count
>> program. Some how I feel only one node is doing all the work and the code
>> is not distributed to other node. How can I make sure code is distributed
>> to both the nodes? Also is there a log or GUI which can be used for this?
>>
>> Please note I am using the latest stable release that is 2.2.0.
>>
>> ++Ashish
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>   --
>>
>>
>> Regards,
>> ...Sudhakara.st
>>
>>
>>
>>
>>
>>
>>   --
>>
>>
>> Regards,
>> ...Sudhakara.st
>>
>>
>>
>>
>>
>>
>>
>
>


-- 

Regards,
...Sudhakara.st

Re: Distributing the code to multiple nodes

Posted by sudhakara st <su...@gmail.com>.
Hello Ashish

WARN
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Node : l1-DEV05:1004 does not have sufficient resource for request :
{Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
Location: *, Relax Locality: true} node total capability : <memory:1024,
vCores:8>

Resource manager trying allocate memory 2GB but it available 1GB.


On Wed, Jan 15, 2014 at 7:07 PM, Ashish Jain <as...@gmail.com> wrote:

> I tried that but somehow my map reduce jobs do not execute at all once I
> set it to yarn
>
>
> On Wed, Jan 15, 2014 at 7:00 PM, Nirmal Kumar <ni...@impetus.co.in>wrote:
>
>>  Surely you don’t have to set **mapreduce.jobtracker.address** in
>> mapred-site.xml
>>
>>
>>
>> In mapred-site.xml you just have to mention:
>>
>> <property>
>>
>> <name>mapreduce.framework.name</name>
>>
>> <value>yarn</value>
>>
>> </property>
>>
>>
>>
>> -Nirmal
>>
>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>> *Sent:* Wednesday, January 15, 2014 6:44 PM
>>
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: Distributing the code to multiple nodes
>>
>>
>>
>> I think this is the problem. I have not set
>> "mapreduce.jobtracker.address" in my mapred-site.xml and by default it is
>> set to local. Now the question is how to set it up to remote. Documentation
>> says I need to specify the host:port of the job tracker for this. As we
>> know hadoop 2.2.0 is completely overhauled and there is no concept of task
>> tracker and job tracker. Instead there is now resource manager and node
>> manager. So in this case what do I set as "mapreduce.jobtracker.address".
>> Do I set is resourceMangerHost:resourceMangerPort?
>>
>> --Ashish
>>
>>
>>
>> On Wed, Jan 15, 2014 at 4:20 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>>  Hi Sudhakar,
>>
>> Indeed there was a type the complete command is as follows except the
>> main class since my manifest has the entry for main class.
>> /hadoop jar wordCount.jar  /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>> Next I killed the datanode in 10.12.11.210 and l see the following
>> messages in the log files. Looks like the namenode is still trying to
>> assign the complete task to one single node and since it does not find the
>> complete data set in one node it is complaining.
>>
>>
>> 2014-01-15 16:38:26,894 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:27,348 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1dev-211:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:27,871 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-dev06:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:27,897 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:28,349 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1dev-211:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:28,874 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-dev06:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:28,900 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>>
>>   --Ashish
>>
>>
>>
>> On Wed, Jan 15, 2014 at 3:59 PM, sudhakara st <su...@gmail.com>
>> wrote:
>>
>>   Hello Ashish
>>
>>
>>
>> 2) Run the example again using the command
>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>>   Unless if it typo mistake the command should be
>> ./hadoop jar wordCount.jar WordCount /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>> One more thing try , just stop datanode process in  10.12.11.210 and run
>> the job
>>
>>
>>
>> On Wed, Jan 15, 2014 at 2:07 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>>     Hello Sudhakara,
>>
>> Thanks for your suggestion. However once I change the mapreduce framework
>> to yarn my map reduce jobs does not get executed at all. It seems it is
>> waiting on some thread indefinitely. Here is what I have done
>>
>> 1) Set the mapreduce framework to yarn in mapred-site.xml
>> <property>
>>  <name>mapreduce.framework.name</name>
>>  <value>yarn</value>
>> </property>
>>
>> 2) Run the example again using the command
>>
>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>> The jobs are just stuck and do not move further.
>>
>>   I also tried the following and it complains of filenotfound exception
>> and some security exception
>>
>> ./hadoop dfs wordCount.jar file:///opt/ApacheHadoop/temp/worker.log
>> file:///opt/ApacheHadoop/out/
>>
>> Below is the status of the job from hadoop application console. The
>> progress bar does not move at all.
>>
>>
>>
>> *ID *
>>
>> *User *
>>
>> *Name *
>>
>> *Application Type *
>>
>> *Queue *
>>
>> *StartTime *
>>
>> *FinishTime *
>>
>> *State *
>>
>> *FinalStatus *
>>
>> *Progress *
>>
>> *Tracking UI *
>>
>> application_1389771586883_0002<http://10.12.11.210:8088/cluster/app/application_1389771586883_0002>
>>
>> root
>>
>> wordcount
>>
>> MAPREDUCE
>>
>> default
>>
>> Wed, 15 Jan 2014 07:52:04 GMT
>>
>> N/A
>>
>> ACCEPTED
>>
>> UNDEFINED
>>
>> UNASSIGNE <http://10.12.11.210:8088/cluster/apps>
>>
>>
>>
>> Please advice what should I do
>>
>> --Ashish
>>
>>
>>
>> On Tue, Jan 14, 2014 at 5:48 PM, sudhakara st <su...@gmail.com>
>> wrote:
>>
>>   Hello Ashish
>>
>> It seems job is running in Local job runner(LocalJobRunner) by reading
>> the Local file system. Can you try by give the full URI path of the input
>> and output path.
>>
>> like
>>
>> $hadoop jar program.jar   ProgramName -Dmapreduce.framework.name=yarn
>> file:///home/input/  file:///home/output/
>>
>>
>>
>> On Mon, Jan 13, 2014 at 3:02 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>>   German,
>>
>> This does not seem to be helping. I tried to use the Fairscheduler as my
>> resource manger but the behavior remains same. I could see the
>> fairscheduler log getting continuous heart beat from both the other nodes.
>> But it is still not distributing the work to other nodes. What I did next
>> was started 3 jobs simultaneously so that may be some part of one of the
>> job be distributed to other nodes. However still only one node is being
>> used :(((. What is that is going wrong can some one help?
>>
>> Sample of fairsheduler log:
>> 2014-01-13 15:13:54,293 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:54,953 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:54,988 HEARTBEAT       l1-DEV05
>> 2014-01-13 15:13:55,295 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:55,956 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:55,993 HEARTBEAT       l1-DEV05
>> 2014-01-13 15:13:56,297 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:56,960 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:56,997 HEARTBEAT       l1-DEV05
>> 2014-01-13 15:13:57,299 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:57,964 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:58,001 HEARTBEAT       l1-DEV05
>>
>> My Data distributed as blocks to other nodes. The host with IP
>> 10.12.11.210 has all the data and this is the one which is serving all the
>> request.
>>
>> Total number of blocks: 8
>> 1073741866:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741867:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741868:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741869:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741870:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741871:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741872:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741873:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>>
>>
>>
>> Someone please advice on how to go about this.
>>
>> --Ashish
>>
>>
>>
>> On Fri, Jan 10, 2014 at 12:58 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>>  Thanks for all these suggestions. Somehow I do not have access to the
>> servers today and will try the suggestions made on monday and will let you
>> know how it goes.
>>
>> --Ashish
>>
>>
>>
>> On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <
>> german.fl@samsung.com> wrote:
>>
>>  Ashish
>>
>> Could this be related to the scheduler you are using and its settings?.
>>
>>
>>
>> On lab environments when running a single type of job I often use
>> FairScheduler (the YARN default in 2.2.0 is CapacityScheduler) and it does
>> a good job distributing the load.
>>
>>
>>
>> You could give that a try (
>> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
>> )
>>
>>
>>
>> I think just changing yarn-site.xml  as follows could demonstrate this
>> theory (note that  how the jobs are scheduled depend on resources such as
>> memory on the nodes and you would need to setup yarn-site.xml accordingly).
>>
>>
>>
>> <property>
>>
>>   <name>yarn.resourcemanager.scheduler.class</name>
>>
>>
>> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
>>
>> </property>
>>
>>
>>
>> Regards
>>
>> ./g
>>
>>
>>
>>
>>
>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>> *Sent:* Thursday, January 09, 2014 6:46 AM
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: Distributing the code to multiple nodes
>>
>>
>>
>> Another point to add here 10.12.11.210 is the host which has everything
>> running including a slave datanode. Data was also distributed this host as
>> well as the jar file. Following are running on 10.12.11.210
>>
>> 7966 DataNode
>> 8480 NodeManager
>> 8353 ResourceManager
>> 8141 SecondaryNameNode
>> 7834 NameNode
>>
>>
>>
>> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>> Logs were updated only when I copied the data. After copying the data
>> there has been no updates on the log files.
>>
>>
>>
>> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <ch...@gmail.com>
>> wrote:
>>
>> Do the logs on the three nodes contain anything interesting?
>> Chris
>>
>> On Jan 9, 2014 3:47 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>
>> Here is the block info for the record I distributed. As can be seen only
>> 10.12.11.210 has all the data and this is the node which is serving all the
>> request. Replicas are available with 209 as well as 210
>>
>> 1073741857:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741858:         10.12.11.210:50010    View Block Info
>> 10.12.11.211:50010    View Block Info
>> 1073741859:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741860:         10.12.11.210:50010    View Block Info
>> 10.12.11.211:50010    View Block Info
>> 1073741861:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741862:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741863:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741864:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>>
>> --Ashish
>>
>>
>>
>> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>> Hello Chris,
>>
>> I have now a cluster with 3 nodes and replication factor being 2. When I
>> distribute a file I could see that there are replica of data available in
>> other nodes. However when I run a map reduce job again only one node is
>> serving all the request :(. Can you or anyone please provide some more
>> inputs.
>>
>> Thanks
>> Ashish
>>
>>
>>
>> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <ch...@gmail.com>
>> wrote:
>>
>> 2 nodes and replication factor of 2 results in a replica of each block
>> present on each node. This would allow the possibility that a single node
>> would do the work and yet be data local.  It will probably happen if that
>> single node has the needed capacity.  More nodes than the replication
>> factor are needed to force distribution of the processing.
>> Chris
>>
>> On Jan 8, 2014 7:35 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>
>> Guys,
>>
>> I am sure that only one node is being used. I just know ran the job again
>> and could see that CPU usage only for one server going high other server
>> CPU usage remains constant and hence it means other node is not being used.
>> Can someone help me to debug this issue?
>>
>> ++Ashish
>>
>>
>>
>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>> Hello All,
>>
>> I have a 2 node hadoop cluster running with a replication factor of 2. I
>> have a file of size around 1 GB which when copied to HDFS is replicated to
>> both the nodes. Seeing the block info I can see the file has been
>> subdivided into 8 parts which means it has been subdivided into 8 blocks
>> each of size 128 MB.  I use this file as input to run the word count
>> program. Some how I feel only one node is doing all the work and the code
>> is not distributed to other node. How can I make sure code is distributed
>> to both the nodes? Also is there a log or GUI which can be used for this?
>>
>> Please note I am using the latest stable release that is 2.2.0.
>>
>> ++Ashish
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>   --
>>
>>
>> Regards,
>> ...Sudhakara.st
>>
>>
>>
>>
>>
>>
>>   --
>>
>>
>> Regards,
>> ...Sudhakara.st
>>
>>
>>
>>
>>
>>
>> ------------------------------
>>
>>
>>
>>
>>
>>
>> NOTE: This message may contain information that is confidential,
>> proprietary, privileged or otherwise protected by law. The message is
>> intended solely for the named addressee. If received in error, please
>> destroy and notify the sender. Any use of this email is prohibited when
>> received in error. Impetus does not represent, warrant and/or guarantee,
>> that the integrity of this communication has been maintained nor that the
>> communication is free of errors, virus, interception or interference.
>>
>
>


-- 

Regards,
...Sudhakara.st

Re: Distributing the code to multiple nodes

Posted by sudhakara st <su...@gmail.com>.
Hello Ashish

WARN
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Node : l1-DEV05:1004 does not have sufficient resource for request :
{Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
Location: *, Relax Locality: true} node total capability : <memory:1024,
vCores:8>

Resource manager trying allocate memory 2GB but it available 1GB.


On Wed, Jan 15, 2014 at 7:07 PM, Ashish Jain <as...@gmail.com> wrote:

> I tried that but somehow my map reduce jobs do not execute at all once I
> set it to yarn
>
>
> On Wed, Jan 15, 2014 at 7:00 PM, Nirmal Kumar <ni...@impetus.co.in>wrote:
>
>>  Surely you don’t have to set **mapreduce.jobtracker.address** in
>> mapred-site.xml
>>
>>
>>
>> In mapred-site.xml you just have to mention:
>>
>> <property>
>>
>> <name>mapreduce.framework.name</name>
>>
>> <value>yarn</value>
>>
>> </property>
>>
>>
>>
>> -Nirmal
>>
>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>> *Sent:* Wednesday, January 15, 2014 6:44 PM
>>
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: Distributing the code to multiple nodes
>>
>>
>>
>> I think this is the problem. I have not set
>> "mapreduce.jobtracker.address" in my mapred-site.xml and by default it is
>> set to local. Now the question is how to set it up to remote. Documentation
>> says I need to specify the host:port of the job tracker for this. As we
>> know hadoop 2.2.0 is completely overhauled and there is no concept of task
>> tracker and job tracker. Instead there is now resource manager and node
>> manager. So in this case what do I set as "mapreduce.jobtracker.address".
>> Do I set is resourceMangerHost:resourceMangerPort?
>>
>> --Ashish
>>
>>
>>
>> On Wed, Jan 15, 2014 at 4:20 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>>  Hi Sudhakar,
>>
>> Indeed there was a type the complete command is as follows except the
>> main class since my manifest has the entry for main class.
>> /hadoop jar wordCount.jar  /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>> Next I killed the datanode in 10.12.11.210 and l see the following
>> messages in the log files. Looks like the namenode is still trying to
>> assign the complete task to one single node and since it does not find the
>> complete data set in one node it is complaining.
>>
>>
>> 2014-01-15 16:38:26,894 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:27,348 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1dev-211:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:27,871 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-dev06:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:27,897 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:28,349 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1dev-211:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:28,874 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-dev06:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:28,900 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>>
>>   --Ashish
>>
>>
>>
>> On Wed, Jan 15, 2014 at 3:59 PM, sudhakara st <su...@gmail.com>
>> wrote:
>>
>>   Hello Ashish
>>
>>
>>
>> 2) Run the example again using the command
>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>>   Unless if it typo mistake the command should be
>> ./hadoop jar wordCount.jar WordCount /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>> One more thing try , just stop datanode process in  10.12.11.210 and run
>> the job
>>
>>
>>
>> On Wed, Jan 15, 2014 at 2:07 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>>     Hello Sudhakara,
>>
>> Thanks for your suggestion. However once I change the mapreduce framework
>> to yarn my map reduce jobs does not get executed at all. It seems it is
>> waiting on some thread indefinitely. Here is what I have done
>>
>> 1) Set the mapreduce framework to yarn in mapred-site.xml
>> <property>
>>  <name>mapreduce.framework.name</name>
>>  <value>yarn</value>
>> </property>
>>
>> 2) Run the example again using the command
>>
>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>> The jobs are just stuck and do not move further.
>>
>>   I also tried the following and it complains of filenotfound exception
>> and some security exception
>>
>> ./hadoop dfs wordCount.jar file:///opt/ApacheHadoop/temp/worker.log
>> file:///opt/ApacheHadoop/out/
>>
>> Below is the status of the job from hadoop application console. The
>> progress bar does not move at all.
>>
>>
>>
>> *ID *
>>
>> *User *
>>
>> *Name *
>>
>> *Application Type *
>>
>> *Queue *
>>
>> *StartTime *
>>
>> *FinishTime *
>>
>> *State *
>>
>> *FinalStatus *
>>
>> *Progress *
>>
>> *Tracking UI *
>>
>> application_1389771586883_0002<http://10.12.11.210:8088/cluster/app/application_1389771586883_0002>
>>
>> root
>>
>> wordcount
>>
>> MAPREDUCE
>>
>> default
>>
>> Wed, 15 Jan 2014 07:52:04 GMT
>>
>> N/A
>>
>> ACCEPTED
>>
>> UNDEFINED
>>
>> UNASSIGNE <http://10.12.11.210:8088/cluster/apps>
>>
>>
>>
>> Please advice what should I do
>>
>> --Ashish
>>
>>
>>
>> On Tue, Jan 14, 2014 at 5:48 PM, sudhakara st <su...@gmail.com>
>> wrote:
>>
>>   Hello Ashish
>>
>> It seems job is running in Local job runner(LocalJobRunner) by reading
>> the Local file system. Can you try by give the full URI path of the input
>> and output path.
>>
>> like
>>
>> $hadoop jar program.jar   ProgramName -Dmapreduce.framework.name=yarn
>> file:///home/input/  file:///home/output/
>>
>>
>>
>> On Mon, Jan 13, 2014 at 3:02 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>>   German,
>>
>> This does not seem to be helping. I tried to use the FairScheduler as my
>> resource manager's scheduler, but the behavior remains the same. I could see
>> the fairscheduler log getting continuous heartbeats from both of the other
>> nodes, but it is still not distributing the work to them. What I did next was
>> to start 3 jobs simultaneously so that maybe some part of one of the jobs
>> would be distributed to the other nodes. However, still only one node is
>> being used :(((. What is going wrong? Can someone help?
>>
>> Sample of fairsheduler log:
>> 2014-01-13 15:13:54,293 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:54,953 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:54,988 HEARTBEAT       l1-DEV05
>> 2014-01-13 15:13:55,295 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:55,956 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:55,993 HEARTBEAT       l1-DEV05
>> 2014-01-13 15:13:56,297 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:56,960 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:56,997 HEARTBEAT       l1-DEV05
>> 2014-01-13 15:13:57,299 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:57,964 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:58,001 HEARTBEAT       l1-DEV05
>>
>> My Data distributed as blocks to other nodes. The host with IP
>> 10.12.11.210 has all the data and this is the one which is serving all the
>> request.
>>
>> Total number of blocks: 8
>> 1073741866:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741867:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741868:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741869:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741870:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741871:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741872:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741873:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>>
>>
>>
>> Someone please advice on how to go about this.
>>
>> --Ashish
>>
>>
>>
>> On Fri, Jan 10, 2014 at 12:58 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>>  Thanks for all these suggestions. Somehow I do not have access to the
>> servers today and will try the suggestions made on monday and will let you
>> know how it goes.
>>
>> --Ashish
>>
>>
>>
>> On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <
>> german.fl@samsung.com> wrote:
>>
>>  Ashish
>>
>> Could this be related to the scheduler you are using and its settings?
>>
>>
>>
>> On lab environments when running a single type of job I often use
>> FairScheduler (the YARN default in 2.2.0 is CapacityScheduler) and it does
>> a good job distributing the load.
>>
>>
>>
>> You could give that a try (
>> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
>> )
>>
>>
>>
>> I think just changing yarn-site.xml  as follows could demonstrate this
>> theory (note that  how the jobs are scheduled depend on resources such as
>> memory on the nodes and you would need to setup yarn-site.xml accordingly).
>>
>>
>>
>> <property>
>>
>>   <name>yarn.resourcemanager.scheduler.class</name>
>>
>>
>> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
>>
>> </property>
>>
>>
>>
>> Regards
>>
>> ./g
>>
>>
>>
>>
>>
>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>> *Sent:* Thursday, January 09, 2014 6:46 AM
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: Distributing the code to multiple nodes
>>
>>
>>
>> Another point to add here 10.12.11.210 is the host which has everything
>> running including a slave datanode. Data was also distributed this host as
>> well as the jar file. Following are running on 10.12.11.210
>>
>> 7966 DataNode
>> 8480 NodeManager
>> 8353 ResourceManager
>> 8141 SecondaryNameNode
>> 7834 NameNode
>>
>>
>>
>> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>> Logs were updated only when I copied the data. After copying the data
>> there has been no updates on the log files.
>>
>>
>>
>> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <ch...@gmail.com>
>> wrote:
>>
>> Do the logs on the three nodes contain anything interesting?
>> Chris
>>
>> On Jan 9, 2014 3:47 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>
>> Here is the block info for the record I distributed. As can be seen only
>> 10.12.11.210 has all the data and this is the node which is serving all the
>> request. Replicas are available with 209 as well as 210
>>
>> 1073741857:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741858:         10.12.11.210:50010    View Block Info
>> 10.12.11.211:50010    View Block Info
>> 1073741859:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741860:         10.12.11.210:50010    View Block Info
>> 10.12.11.211:50010    View Block Info
>> 1073741861:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741862:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741863:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741864:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>>
>> --Ashish
>>
>>
>>
>> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>> Hello Chris,
>>
>> I have now a cluster with 3 nodes and replication factor being 2. When I
>> distribute a file I could see that there are replica of data available in
>> other nodes. However when I run a map reduce job again only one node is
>> serving all the request :(. Can you or anyone please provide some more
>> inputs.
>>
>> Thanks
>> Ashish
>>
>>
>>
>> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <ch...@gmail.com>
>> wrote:
>>
>> 2 nodes and replication factor of 2 results in a replica of each block
>> present on each node. This would allow the possibility that a single node
>> would do the work and yet be data local.  It will probably happen if that
>> single node has the needed capacity.  More nodes than the replication
>> factor are needed to force distribution of the processing.
>> Chris
>>
>> On Jan 8, 2014 7:35 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>
>> Guys,
>>
>> I am sure that only one node is being used. I just now ran the job again
>> and could see the CPU usage going high on only one server while the other
>> server's CPU usage remains constant, which means the other node is not being
>> used. Can someone help me debug this issue?
>>
>> ++Ashish
>>
>>
>>
>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>> Hello All,
>>
>> I have a 2 node hadoop cluster running with a replication factor of 2. I
>> have a file of size around 1 GB which when copied to HDFS is replicated to
>> both the nodes. Seeing the block info I can see the file has been
>> subdivided into 8 parts which means it has been subdivided into 8 blocks
>> each of size 128 MB.  I use this file as input to run the word count
>> program. Some how I feel only one node is doing all the work and the code
>> is not distributed to other node. How can I make sure code is distributed
>> to both the nodes? Also is there a log or GUI which can be used for this?
>>
>> Please note I am using the latest stable release that is 2.2.0.
>>
>> ++Ashish
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>   --
>>
>>
>> Regards,
>> ...Sudhakara.st
>>
>>
>>
>>
>>
>>
>>   --
>>
>>
>> Regards,
>> ...Sudhakara.st
>>
>>
>>
>>
>>
>>
>> ------------------------------
>>
>>
>>
>>
>>
>>
>> NOTE: This message may contain information that is confidential,
>> proprietary, privileged or otherwise protected by law. The message is
>> intended solely for the named addressee. If received in error, please
>> destroy and notify the sender. Any use of this email is prohibited when
>> received in error. Impetus does not represent, warrant and/or guarantee,
>> that the integrity of this communication has been maintained nor that the
>> communication is free of errors, virus, interception or interference.
>>
>
>


-- 

Regards,
...Sudhakara.st

Re: Distributing the code to multiple nodes

Posted by Ashish Jain <as...@gmail.com>.
My execution is stuck at this position indefinitely:

[root@l1-dev06 bin]# ./hadoop jar /opt/ApacheHadoop/wordCount.jar
/opt/ApacheHadoop/temp/worker.log /opt/ApacheHadoop/OUT56
14/01/15 19:35:12 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
14/01/15 19:35:13 INFO client.RMProxy: Connecting to ResourceManager at /
10.12.11.210:1003
14/01/15 19:35:13 INFO client.RMProxy: Connecting to ResourceManager at /
10.12.11.210:1003
14/01/15 19:35:13 WARN mapreduce.JobSubmitter: Hadoop command-line option
parsing not performed. Implement the Tool interface and execute your
application with ToolRunner to remedy this.
14/01/15 19:35:14 INFO mapred.FileInputFormat: Total input paths to process
: 1
14/01/15 19:35:14 INFO mapreduce.JobSubmitter: number of splits:8
14/01/15 19:35:14 INFO Configuration.deprecation: user.name is deprecated.
Instead, use mapreduce.job.user.name
14/01/15 19:35:14 INFO Configuration.deprecation: mapred.jar is deprecated.
Instead, use mapreduce.job.jar
14/01/15 19:35:14 INFO Configuration.deprecation: mapred.output.value.class
is deprecated. Instead, use mapreduce.job.output.value.class
14/01/15 19:35:14 INFO Configuration.deprecation: mapred.job.name is
deprecated. Instead, use mapreduce.job.name
14/01/15 19:35:14 INFO Configuration.deprecation: mapred.input.dir is
deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/01/15 19:35:14 INFO Configuration.deprecation: mapred.output.dir is
deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/01/15 19:35:14 INFO Configuration.deprecation: mapred.map.tasks is
deprecated. Instead, use mapreduce.job.maps
14/01/15 19:35:14 INFO Configuration.deprecation: mapred.output.key.class
is deprecated. Instead, use mapreduce.job.output.key.class
14/01/15 19:35:14 INFO Configuration.deprecation: mapred.working.dir is
deprecated. Instead, use mapreduce.job.working.dir
14/01/15 19:35:14 INFO mapreduce.JobSubmitter: Submitting tokens for job:
job_1389794591210_0001
14/01/15 19:35:15 INFO impl.YarnClientImpl: Submitted application
application_1389794591210_0001 to ResourceManager at /10.12.11.210:1003
14/01/15 19:35:15 INFO mapreduce.Job: The url to track the job:
http://l1-dev06:8088/proxy/application_1389794591210_0001/
14/01/15 19:35:15 INFO mapreduce.Job: Running job: job_1389794591210_0001
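For reference, the job sitting at ACCEPTED with no progress is consistent
with the LeafQueue warnings quoted below: the MapReduce ApplicationMaster
asks for a <memory:2048> container, but every NodeManager only advertises
<memory:1024>, so the ResourceManager can never place even the first
container. A minimal yarn-site.xml sketch that raises the per-node
capability (assuming the machines have the physical RAM to spare; the
property names are standard in 2.2.0, the values are only illustrative):

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <!-- total MB this NodeManager may hand out to containers; must be at
       least as large as the biggest single request (2048 MB here) -->
  <value>4096</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <!-- smallest container the scheduler grants; requests are rounded up
       to a multiple of this value -->
  <value>512</value>
</property>

The change needs to be applied on every node, and the NodeManagers
restarted, before the ResourceManager reports the new capability.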



On Wed, Jan 15, 2014 at 7:20 PM, Ashish Jain <as...@gmail.com> wrote:

> I just now tried it again and I see the following messages popping up in
> the log file:
>
> 2014-01-15 19:37:38,221 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1dev-211:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
> 2014-01-15 19:37:38,621 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1-dev06:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
>
> Do I need to increase the RAM allocation to slave nodes??
>
>
>
> On Wed, Jan 15, 2014 at 7:07 PM, Ashish Jain <as...@gmail.com> wrote:
>
>> I tried that but somehow my map reduce jobs do not execute at all once I
>> set it to yarn
>>
>>
>> On Wed, Jan 15, 2014 at 7:00 PM, Nirmal Kumar <nirmal.kumar@impetus.co.in
>> > wrote:
>>
>>>  Surely you don’t have to set **mapreduce.jobtracker.address** in
>>> mapred-site.xml
>>>
>>>
>>>
>>> In mapred-site.xml you just have to mention:
>>>
>>> <property>
>>>
>>> <name>mapreduce.framework.name</name>
>>>
>>> <value>yarn</value>
>>>
>>> </property>
>>>
>>>
>>>
>>> -Nirmal
>>>
>>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>>> *Sent:* Wednesday, January 15, 2014 6:44 PM
>>>
>>> *To:* user@hadoop.apache.org
>>> *Subject:* Re: Distributing the code to multiple nodes
>>>
>>>
>>>
>>> I think this is the problem. I have not set
>>> "mapreduce.jobtracker.address" in my mapred-site.xml and by default it is
>>> set to local. Now the question is how to set it up to remote. Documentation
>>> says I need to specify the host:port of the job tracker for this. As we
>>> know hadoop 2.2.0 is completely overhauled and there is no concept of task
>>> tracker and job tracker. Instead there is now resource manager and node
>>> manager. So in this case what do I set as "mapreduce.jobtracker.address"?
>>> Do I set it to resourceManagerHost:resourceManagerPort?
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Wed, Jan 15, 2014 at 4:20 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>>  Hi Sudhakar,
>>>
>>> Indeed there was a typo; the complete command is as follows (without the
>>> main class, since my manifest has the entry for the main class):
>>> /hadoop jar wordCount.jar  /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>> Next I killed the datanode on 10.12.11.210 and I see the following
>>> messages in the log files. It looks like the namenode is still trying to
>>> assign the complete task to one single node, and since it does not find the
>>> complete data set on one node it is complaining.
>>>
>>>
>>> 2014-01-15 16:38:26,894 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:27,348 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1dev-211:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:27,871 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-dev06:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:27,897 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:28,349 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1dev-211:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:28,874 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-dev06:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:28,900 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>>
>>>   --Ashish
>>>
>>>
>>>
>>> On Wed, Jan 15, 2014 at 3:59 PM, sudhakara st <su...@gmail.com>
>>> wrote:
>>>
>>>   Hello Ashish
>>>
>>>
>>>
>>> 2) Run the example again using the command
>>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>>   Unless it is a typo, the command should be
>>> ./hadoop jar wordCount.jar WordCount /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>> One more thing try , just stop datanode process in  10.12.11.210 and run
>>> the job
>>>
>>>
>>>
>>> On Wed, Jan 15, 2014 at 2:07 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>>     Hello Sudhakara,
>>>
>>> Thanks for your suggestion. However, once I change the mapreduce
>>> framework to yarn, my map reduce jobs do not get executed at all. It seems
>>> the job is waiting on some thread indefinitely. Here is what I have done:
>>>
>>> 1) Set the mapreduce framework to yarn in mapred-site.xml
>>> <property>
>>>  <name>mapreduce.framework.name</name>
>>>  <value>yarn</value>
>>> </property>
>>>
>>> 2) Run the example again using the command
>>>
>>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>> The jobs are just stuck and do not move further.
>>>
>>>   I also tried the following, and it complains of a FileNotFoundException
>>> and a security exception:
>>>
>>> ./hadoop dfs wordCount.jar file:///opt/ApacheHadoop/temp/worker.log
>>> file:///opt/ApacheHadoop/out/
>>>
>>> Below is the status of the job from hadoop application console. The
>>> progress bar does not move at all.
>>>
>>>
>>>
>>> ID: application_1389771586883_0002 <http://10.12.11.210:8088/cluster/app/application_1389771586883_0002>
>>> User: root
>>> Name: wordcount
>>> Application Type: MAPREDUCE
>>> Queue: default
>>> StartTime: Wed, 15 Jan 2014 07:52:04 GMT
>>> FinishTime: N/A
>>> State: ACCEPTED
>>> FinalStatus: UNDEFINED
>>> Tracking UI: UNASSIGNED <http://10.12.11.210:8088/cluster/apps>
>>>
>>>
>>>
>>> Please advise what I should do.
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Tue, Jan 14, 2014 at 5:48 PM, sudhakara st <su...@gmail.com>
>>> wrote:
>>>
>>>   Hello Ashish
>>>
>>> It seems the job is running in the local job runner (LocalJobRunner) and
>>> reading the local file system. Can you try giving the full URI paths of the
>>> input and output?
>>>
>>> like
>>>
>>> $hadoop jar program.jar   ProgramName -Dmapreduce.framework.name=yarn
>>> file:///home/input/  file:///home/output/
>>>
>>>
>>>
>>> On Mon, Jan 13, 2014 at 3:02 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>>   German,
>>>
>>> This does not seem to be helping. I tried to use the FairScheduler as my
>>> resource manager's scheduler, but the behavior remains the same. I could see
>>> the fairscheduler log getting continuous heartbeats from both of the other
>>> nodes, but it is still not distributing the work to them. What I did next was
>>> to start 3 jobs simultaneously so that maybe some part of one of the jobs
>>> would be distributed to the other nodes. However, still only one node is
>>> being used :(((. What is going wrong? Can someone help?
>>>
>>> Sample of fairsheduler log:
>>> 2014-01-13 15:13:54,293 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:54,953 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:54,988 HEARTBEAT       l1-DEV05
>>> 2014-01-13 15:13:55,295 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:55,956 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:55,993 HEARTBEAT       l1-DEV05
>>> 2014-01-13 15:13:56,297 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:56,960 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:56,997 HEARTBEAT       l1-DEV05
>>> 2014-01-13 15:13:57,299 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:57,964 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:58,001 HEARTBEAT       l1-DEV05
>>>
>>> My Data distributed as blocks to other nodes. The host with IP
>>> 10.12.11.210 has all the data and this is the one which is serving all the
>>> request.
>>>
>>> Total number of blocks: 8
>>> 1073741866:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741867:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741868:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741869:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741870:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741871:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741872:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741873:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>>
>>>
>>>
>>> Someone please advice on how to go about this.
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Fri, Jan 10, 2014 at 12:58 PM, Ashish Jain <as...@gmail.com>
>>> wrote:
>>>
>>>  Thanks for all these suggestions. Somehow I do not have access to the
>>> servers today and will try the suggestions made on monday and will let you
>>> know how it goes.
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <
>>> german.fl@samsung.com> wrote:
>>>
>>>  Ashish
>>>
>>> Could this be related to the scheduler you are using and its settings?
>>>
>>>
>>>
>>> On lab environments when running a single type of job I often use
>>> FairScheduler (the YARN default in 2.2.0 is CapacityScheduler) and it does
>>> a good job distributing the load.
>>>
>>>
>>>
>>> You could give that a try (
>>> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
>>> )
>>>
>>>
>>>
>>> I think just changing yarn-site.xml  as follows could demonstrate this
>>> theory (note that  how the jobs are scheduled depend on resources such as
>>> memory on the nodes and you would need to setup yarn-site.xml accordingly).
>>>
>>>
>>>
>>> <property>
>>>
>>>   <name>yarn.resourcemanager.scheduler.class</name>
>>>
>>>
>>> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
>>>
>>> </property>
>>>
>>>
>>>
>>> Regards
>>>
>>> ./g
>>>
>>>
>>>
>>>
>>>
>>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>>> *Sent:* Thursday, January 09, 2014 6:46 AM
>>> *To:* user@hadoop.apache.org
>>> *Subject:* Re: Distributing the code to multiple nodes
>>>
>>>
>>>
>>> Another point to add here 10.12.11.210 is the host which has everything
>>> running including a slave datanode. Data was also distributed this host as
>>> well as the jar file. Following are running on 10.12.11.210
>>>
>>> 7966 DataNode
>>> 8480 NodeManager
>>> 8353 ResourceManager
>>> 8141 SecondaryNameNode
>>> 7834 NameNode
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>> Logs were updated only when I copied the data. After copying the data
>>> there has been no updates on the log files.
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <ch...@gmail.com>
>>> wrote:
>>>
>>> Do the logs on the three nodes contain anything interesting?
>>> Chris
>>>
>>> On Jan 9, 2014 3:47 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>>
>>> Here is the block info for the record I distributed. As can be seen only
>>> 10.12.11.210 has all the data and this is the node which is serving all the
>>> request. Replicas are available with 209 as well as 210
>>>
>>> 1073741857:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741858:         10.12.11.210:50010    View Block Info
>>> 10.12.11.211:50010    View Block Info
>>> 1073741859:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741860:         10.12.11.210:50010    View Block Info
>>> 10.12.11.211:50010    View Block Info
>>> 1073741861:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741862:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741863:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741864:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>> Hello Chris,
>>>
>>> I have now a cluster with 3 nodes and replication factor being 2. When I
>>> distribute a file I could see that there are replica of data available in
>>> other nodes. However when I run a map reduce job again only one node is
>>> serving all the request :(. Can you or anyone please provide some more
>>> inputs.
>>>
>>> Thanks
>>> Ashish
>>>
>>>
>>>
>>> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <ch...@gmail.com>
>>> wrote:
>>>
>>> 2 nodes and replication factor of 2 results in a replica of each block
>>> present on each node. This would allow the possibility that a single node
>>> would do the work and yet be data local.  It will probably happen if that
>>> single node has the needed capacity.  More nodes than the replication
>>> factor are needed to force distribution of the processing.
>>> Chris
>>>
>>> On Jan 8, 2014 7:35 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>>
>>> Guys,
>>>
>>> I am sure that only one node is being used. I just now ran the job again
>>> and could see the CPU usage going high on only one server while the other
>>> server's CPU usage remains constant, which means the other node is not
>>> being used. Can someone help me debug this issue?
>>>
>>> ++Ashish
>>>
>>>
>>>
>>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>> Hello All,
>>>
>>> I have a 2 node hadoop cluster running with a replication factor of 2. I
>>> have a file of size around 1 GB which when copied to HDFS is replicated to
>>> both the nodes. Seeing the block info I can see the file has been
>>> subdivided into 8 parts which means it has been subdivided into 8 blocks
>>> each of size 128 MB.  I use this file as input to run the word count
>>> program. Some how I feel only one node is doing all the work and the code
>>> is not distributed to other node. How can I make sure code is distributed
>>> to both the nodes? Also is there a log or GUI which can be used for this?
>>>
>>> Please note I am using the latest stable release that is 2.2.0.
>>>
>>> ++Ashish
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>   --
>>>
>>>
>>> Regards,
>>> ...Sudhakara.st
>>>
>>>
>>>
>>>
>>>
>>>
>>>   --
>>>
>>>
>>> Regards,
>>> ...Sudhakara.st
>>>
>>>
>>>
>>>
>>>
>>>
>>> ------------------------------
>>>
>>>
>>>
>>>
>>>
>>>
>>> NOTE: This message may contain information that is confidential,
>>> proprietary, privileged or otherwise protected by law. The message is
>>> intended solely for the named addressee. If received in error, please
>>> destroy and notify the sender. Any use of this email is prohibited when
>>> received in error. Impetus does not represent, warrant and/or guarantee,
>>> that the integrity of this communication has been maintained nor that the
>>> communication is free of errors, virus, interception or interference.
>>>
>>
>>
>
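If the nodes really do have only about 1 GB to give to YARN, the
alternative to raising yarn.nodemanager.resource.memory-mb (sketched
earlier in this message) is to shrink the container requests so that they
fit under <memory:1024>. A hedged mapred-site.xml sketch; the property
names exist in 2.2.0, but the values are only illustrative for a small
word-count job:

<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <!-- memory requested for the MapReduce ApplicationMaster container -->
  <value>1024</value>
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>512</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>512</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <!-- keep the JVM heap below the container size above -->
  <value>-Xmx400m</value>
</property>

Paired with yarn.scheduler.minimum-allocation-mb set to 512 in
yarn-site.xml, so that the requests are not rounded back up past the node
capability, containers of this size can be satisfied by a 1024 MB
NodeManager and the scheduler has a chance to spread work across all
three nodes.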

Re: Distributing the code to multiple nodes

Posted by Ashish Jain <as...@gmail.com>.
My execution is stuck at this position indefinitely:

[root@l1-dev06 bin]# ./hadoop jar /opt/ApacheHadoop/wordCount.jar
/opt/ApacheHadoop/temp/worker.log /opt/ApacheHadoop/OUT56
14/01/15 19:35:12 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
14/01/15 19:35:13 INFO client.RMProxy: Connecting to ResourceManager at /
10.12.11.210:1003
14/01/15 19:35:13 INFO client.RMProxy: Connecting to ResourceManager at /
10.12.11.210:1003
14/01/15 19:35:13 WARN mapreduce.JobSubmitter: Hadoop command-line option
parsing not performed. Implement the Tool interface and execute your
application with ToolRunner to remedy this.
14/01/15 19:35:14 INFO mapred.FileInputFormat: Total input paths to process
: 1
14/01/15 19:35:14 INFO mapreduce.JobSubmitter: number of splits:8
14/01/15 19:35:14 INFO Configuration.deprecation: user.name is deprecated.
Instead, use mapreduce.job.user.name
14/01/15 19:35:14 INFO Configuration.deprecation: mapred.jar is deprecated.
Instead, use mapreduce.job.jar
14/01/15 19:35:14 INFO Configuration.deprecation: mapred.output.value.class
is deprecated. Instead, use mapreduce.job.output.value.class
14/01/15 19:35:14 INFO Configuration.deprecation: mapred.job.name is
deprecated. Instead, use mapreduce.job.name
14/01/15 19:35:14 INFO Configuration.deprecation: mapred.input.dir is
deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/01/15 19:35:14 INFO Configuration.deprecation: mapred.output.dir is
deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/01/15 19:35:14 INFO Configuration.deprecation: mapred.map.tasks is
deprecated. Instead, use mapreduce.job.maps
14/01/15 19:35:14 INFO Configuration.deprecation: mapred.output.key.class
is deprecated. Instead, use mapreduce.job.output.key.class
14/01/15 19:35:14 INFO Configuration.deprecation: mapred.working.dir is
deprecated. Instead, use mapreduce.job.working.dir
14/01/15 19:35:14 INFO mapreduce.JobSubmitter: Submitting tokens for job:
job_1389794591210_0001
14/01/15 19:35:15 INFO impl.YarnClientImpl: Submitted application
application_1389794591210_0001 to ResourceManager at /10.12.11.210:1003
14/01/15 19:35:15 INFO mapreduce.Job: The url to track the job:
http://l1-dev06:8088/proxy/application_1389794591210_0001/
14/01/15 19:35:15 INFO mapreduce.Job: Running job: job_1389794591210_0001



On Wed, Jan 15, 2014 at 7:20 PM, Ashish Jain <as...@gmail.com> wrote:

> I just now tried it again and I see following messages popping up in the
> log file:
>
> 2014-01-15 19:37:38,221 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1dev-211:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
> 2014-01-15 19:37:38,621 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1-dev06:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
>
> Do I need to increase the RAM allocation to slave nodes??
>
>
>
> On Wed, Jan 15, 2014 at 7:07 PM, Ashish Jain <as...@gmail.com> wrote:
>
>> I tried that but somehow my map reduce jobs do not execute at all once I
>> set it to yarn
>>
>>
>> On Wed, Jan 15, 2014 at 7:00 PM, Nirmal Kumar <nirmal.kumar@impetus.co.in
>> > wrote:
>>
>>>  Surely you don’t have to set **mapreduce.jobtracker.address** in
>>> mapred-site.xml
>>>
>>>
>>>
>>> In mapred-site.xml you just have to mention:
>>>
>>> <property>
>>>
>>> <name>mapreduce.framework.name</name>
>>>
>>> <value>yarn</value>
>>>
>>> </property>
>>>
>>>
>>>
>>> -Nirmal
>>>
>>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>>> *Sent:* Wednesday, January 15, 2014 6:44 PM
>>>
>>> *To:* user@hadoop.apache.org
>>> *Subject:* Re: Distributing the code to multiple nodes
>>>
>>>
>>>
>>> I think this is the problem. I have not set
>>> "mapreduce.jobtracker.address" in my mapred-site.xml and by default it is
>>> set to local. Now the question is how to set it up to remote. Documentation
>>> says I need to specify the host:port of the job tracker for this. As we
>>> know hadoop 2.2.0 is completely overhauled and there is no concept of task
>>> tracker and job tracker. Instead there is now resource manager and node
>>> manager. So in this case what do I set as "mapreduce.jobtracker.address".
>>> Do I set is resourceMangerHost:resourceMangerPort?
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Wed, Jan 15, 2014 at 4:20 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>>  Hi Sudhakar,
>>>
>>> Indeed there was a type the complete command is as follows except the
>>> main class since my manifest has the entry for main class.
>>> /hadoop jar wordCount.jar  /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>> Next I killed the datanode in 10.12.11.210 and l see the following
>>> messages in the log files. Looks like the namenode is still trying to
>>> assign the complete task to one single node and since it does not find the
>>> complete data set in one node it is complaining.
>>>
>>>
>>> 2014-01-15 16:38:26,894 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:27,348 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1dev-211:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:27,871 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-dev06:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:27,897 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:28,349 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1dev-211:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:28,874 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-dev06:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:28,900 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>>
>>>   --Ashish
>>>
>>>
>>>
>>> On Wed, Jan 15, 2014 at 3:59 PM, sudhakara st <su...@gmail.com>
>>> wrote:
>>>
>>>   Hello Ashish
>>>
>>>
>>>
>>> 2) Run the example again using the command
>>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>>   Unless if it typo mistake the command should be
>>> ./hadoop jar wordCount.jar WordCount /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>> One more thing try , just stop datanode process in  10.12.11.210 and run
>>> the job
>>>
>>>
>>>
>>> On Wed, Jan 15, 2014 at 2:07 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>>     Hello Sudhakara,
>>>
>>> Thanks for your suggestion. However once I change the mapreduce
>>> framework to yarn my map reduce jobs does not get executed at all. It seems
>>> it is waiting on some thread indefinitely. Here is what I have done
>>>
>>> 1) Set the mapreduce framework to yarn in mapred-site.xml
>>> <property>
>>>  <name>mapreduce.framework.name</name>
>>>  <value>yarn</value>
>>> </property>
>>>
>>> 2) Run the example again using the command
>>>
>>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>> The jobs are just stuck and do not move further.
>>>
>>>   I also tried the following and it complains of filenotfound exception
>>> and some security exception
>>>
>>> ./hadoop dfs wordCount.jar file:///opt/ApacheHadoop/temp/worker.log
>>> file:///opt/ApacheHadoop/out/
>>>
>>> Below is the status of the job from hadoop application console. The
>>> progress bar does not move at all.
>>>
>>>
>>>
>>> *ID *
>>>
>>> *User *
>>>
>>> *Name *
>>>
>>> *Application Type *
>>>
>>> *Queue *
>>>
>>> *StartTime *
>>>
>>> *FinishTime *
>>>
>>> *State *
>>>
>>> *FinalStatus *
>>>
>>> *Progress *
>>>
>>> *Tracking UI *
>>>
>>> application_1389771586883_0002<http://10.12.11.210:8088/cluster/app/application_1389771586883_0002>
>>>
>>> root
>>>
>>> wordcount
>>>
>>> MAPREDUCE
>>>
>>> default
>>>
>>> Wed, 15 Jan 2014 07:52:04 GMT
>>>
>>> N/A
>>>
>>> ACCEPTED
>>>
>>> UNDEFINED
>>>
>>> UNASSIGNE <http://10.12.11.210:8088/cluster/apps>
>>>
>>>
>>>
>>> Please advice what should I do
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Tue, Jan 14, 2014 at 5:48 PM, sudhakara st <su...@gmail.com>
>>> wrote:
>>>
>>>   Hello Ashish
>>>
>>> It seems the job is running in the local job runner (LocalJobRunner) and
>>> reading the local file system. Can you try giving the full URI paths of the
>>> input and output?
>>>
>>> like
>>>
>>> $hadoop jar program.jar   ProgramName -Dmapreduce.framework.name=yarn
>>> file:///home/input/  file:///home/output/
>>>
>>>
>>>
>>> On Mon, Jan 13, 2014 at 3:02 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>>   German,
>>>
>>> This does not seem to be helping. I tried to use the FairScheduler as my
>>> scheduler but the behavior remains the same. I could see the
>>> FairScheduler log getting continuous heartbeats from both of the other nodes,
>>> but it is still not distributing the work to them. What I did next
>>> was start 3 jobs simultaneously so that maybe some part of one of the
>>> jobs would be distributed to the other nodes. However, still only one node is
>>> being used :(((. What is going wrong? Can someone help?
>>>
>>> Sample of fairsheduler log:
>>> 2014-01-13 15:13:54,293 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:54,953 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:54,988 HEARTBEAT       l1-DEV05
>>> 2014-01-13 15:13:55,295 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:55,956 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:55,993 HEARTBEAT       l1-DEV05
>>> 2014-01-13 15:13:56,297 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:56,960 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:56,997 HEARTBEAT       l1-DEV05
>>> 2014-01-13 15:13:57,299 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:57,964 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:58,001 HEARTBEAT       l1-DEV05
>>>
>>> My data is distributed as blocks to the other nodes. The host with IP
>>> 10.12.11.210 has all the data and this is the one which is serving all the
>>> requests.
>>>
>>> Total number of blocks: 8
>>> 1073741866:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741867:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741868:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741869:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741870:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741871:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741872:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741873:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>>
>>>
>>>
>>> Someone please advice on how to go about this.
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Fri, Jan 10, 2014 at 12:58 PM, Ashish Jain <as...@gmail.com>
>>> wrote:
>>>
>>>  Thanks for all these suggestions. Somehow I do not have access to the
>>> servers today and will try the suggestions made on monday and will let you
>>> know how it goes.
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <
>>> german.fl@samsung.com> wrote:
>>>
>>>  Ashish
>>>
>>> Could this be related to the scheduler you are using and its settings?
>>>
>>>
>>>
>>> On lab environments when running a single type of job I often use
>>> FairScheduler (the YARN default in 2.2.0 is CapacityScheduler) and it does
>>> a good job distributing the load.
>>>
>>>
>>>
>>> You could give that a try (
>>> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
>>> )
>>>
>>>
>>>
>>> I think just changing yarn-site.xml as follows could demonstrate this
>>> theory (note that how the jobs are scheduled depends on resources such as
>>> memory on the nodes, and you would need to set up yarn-site.xml accordingly).
>>>
>>>
>>>
>>> <property>
>>>
>>>   <name>yarn.resourcemanager.scheduler.class</name>
>>>
>>>
>>> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
>>>
>>> </property>
>>>
>>>
>>>
>>> Regards
>>>
>>> ./g
>>>
>>>
>>>
>>>
>>>
>>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>>> *Sent:* Thursday, January 09, 2014 6:46 AM
>>> *To:* user@hadoop.apache.org
>>> *Subject:* Re: Distributing the code to multiple nodes
>>>
>>>
>>>
>>> Another point to add here 10.12.11.210 is the host which has everything
>>> running including a slave datanode. Data was also distributed this host as
>>> well as the jar file. Following are running on 10.12.11.210
>>>
>>> 7966 DataNode
>>> 8480 NodeManager
>>> 8353 ResourceManager
>>> 8141 SecondaryNameNode
>>> 7834 NameNode
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>> Logs were updated only when I copied the data. After copying the data
>>> there has been no updates on the log files.
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <ch...@gmail.com>
>>> wrote:
>>>
>>> Do the logs on the three nodes contain anything interesting?
>>> Chris
>>>
>>> On Jan 9, 2014 3:47 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>>
>>> Here is the block info for the record I distributed. As can be seen only
>>> 10.12.11.210 has all the data and this is the node which is serving all the
>>> request. Replicas are available with 209 as well as 210
>>>
>>> 1073741857:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741858:         10.12.11.210:50010    View Block Info
>>> 10.12.11.211:50010    View Block Info
>>> 1073741859:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741860:         10.12.11.210:50010    View Block Info
>>> 10.12.11.211:50010    View Block Info
>>> 1073741861:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741862:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741863:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741864:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>> Hello Chris,
>>>
>>> I have now a cluster with 3 nodes and replication factor being 2. When I
>>> distribute a file I could see that there are replica of data available in
>>> other nodes. However when I run a map reduce job again only one node is
>>> serving all the request :(. Can you or anyone please provide some more
>>> inputs.
>>>
>>> Thanks
>>> Ashish
>>>
>>>
>>>
>>> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <ch...@gmail.com>
>>> wrote:
>>>
>>> 2 nodes and replication factor of 2 results in a replica of each block
>>> present on each node. This would allow the possibility that a single node
>>> would do the work and yet be data local.  It will probably happen if that
>>> single node has the needed capacity.  More nodes than the replication
>>> factor are needed to force distribution of the processing.
>>> Chris
>>>
>>> On Jan 8, 2014 7:35 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>>
>>> Guys,
>>>
>>> I am sure that only one node is being used. I just now ran the job
>>> again and could see the CPU usage going high only for one server while the
>>> other server's CPU usage remains constant, which means the other node is not
>>> being used. Can someone help me to debug this issue?
>>>
>>> ++Ashish
>>>
>>>
>>>
>>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>> Hello All,
>>>
>>> I have a 2 node hadoop cluster running with a replication factor of 2. I
>>> have a file of size around 1 GB which when copied to HDFS is replicated to
>>> both the nodes. Seeing the block info I can see the file has been
>>> subdivided into 8 parts which means it has been subdivided into 8 blocks
>>> each of size 128 MB.  I use this file as input to run the word count
>>> program. Some how I feel only one node is doing all the work and the code
>>> is not distributed to other node. How can I make sure code is distributed
>>> to both the nodes? Also is there a log or GUI which can be used for this?
>>>
>>> Please note I am using the latest stable release that is 2.2.0.
>>>
>>> ++Ashish
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>   --
>>>
>>>
>>> Regards,
>>> ...Sudhakara.st
>>>
>>>
>>>
>>>
>>>
>>>
>>>   --
>>>
>>>
>>> Regards,
>>> ...Sudhakara.st
>>>
>>>
>>>
>>>
>>>
>>>
>>> ------------------------------
>>>
>>>
>>>
>>>
>>>
>>>
>>> NOTE: This message may contain information that is confidential,
>>> proprietary, privileged or otherwise protected by law. The message is
>>> intended solely for the named addressee. If received in error, please
>>> destroy and notify the sender. Any use of this email is prohibited when
>>> received in error. Impetus does not represent, warrant and/or guarantee,
>>> that the integrity of this communication has been maintained nor that the
>>> communication is free of errors, virus, interception or interference.
>>>
>>
>>
>

Re: Distributing the code to multiple nodes

Posted by Ashish Jain <as...@gmail.com>.
My execution is stuck at this position indefinitely:

[root@l1-dev06 bin]# ./hadoop jar /opt/ApacheHadoop/wordCount.jar
/opt/ApacheHadoop/temp/worker.log /opt/ApacheHadoop/OUT56
14/01/15 19:35:12 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
14/01/15 19:35:13 INFO client.RMProxy: Connecting to ResourceManager at /
10.12.11.210:1003
14/01/15 19:35:13 INFO client.RMProxy: Connecting to ResourceManager at /
10.12.11.210:1003
14/01/15 19:35:13 WARN mapreduce.JobSubmitter: Hadoop command-line option
parsing not performed. Implement the Tool interface and execute your
application with ToolRunner to remedy this.
14/01/15 19:35:14 INFO mapred.FileInputFormat: Total input paths to process
: 1
14/01/15 19:35:14 INFO mapreduce.JobSubmitter: number of splits:8
14/01/15 19:35:14 INFO Configuration.deprecation: user.name is deprecated.
Instead, use mapreduce.job.user.name
14/01/15 19:35:14 INFO Configuration.deprecation: mapred.jar is deprecated.
Instead, use mapreduce.job.jar
14/01/15 19:35:14 INFO Configuration.deprecation: mapred.output.value.class
is deprecated. Instead, use mapreduce.job.output.value.class
14/01/15 19:35:14 INFO Configuration.deprecation: mapred.job.name is
deprecated. Instead, use mapreduce.job.name
14/01/15 19:35:14 INFO Configuration.deprecation: mapred.input.dir is
deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/01/15 19:35:14 INFO Configuration.deprecation: mapred.output.dir is
deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/01/15 19:35:14 INFO Configuration.deprecation: mapred.map.tasks is
deprecated. Instead, use mapreduce.job.maps
14/01/15 19:35:14 INFO Configuration.deprecation: mapred.output.key.class
is deprecated. Instead, use mapreduce.job.output.key.class
14/01/15 19:35:14 INFO Configuration.deprecation: mapred.working.dir is
deprecated. Instead, use mapreduce.job.working.dir
14/01/15 19:35:14 INFO mapreduce.JobSubmitter: Submitting tokens for job:
job_1389794591210_0001
14/01/15 19:35:15 INFO impl.YarnClientImpl: Submitted application
application_1389794591210_0001 to ResourceManager at /10.12.11.210:1003
14/01/15 19:35:15 INFO mapreduce.Job: The url to track the job:
http://l1-dev06:8088/proxy/application_1389794591210_0001/
14/01/15 19:35:15 INFO mapreduce.Job: Running job: job_1389794591210_0001
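
The JobSubmitter warning above ("Implement the Tool interface and execute your
application with ToolRunner to remedy this") is worth acting on: a driver that is
launched through ToolRunner picks up generic options such as
-D mapreduce.framework.name=yarn directly from the command line. The sketch below
is only an illustration of such a driver against the org.apache.hadoop.mapreduce
API; it is not the wordCount.jar used in this thread, and the class names
(WordCount, TokenizerMapper, IntSumReducer) are invented for the example.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Illustrative word-count driver that implements Tool so ToolRunner can parse
// generic options (-D, -fs, -jt, ...) before the job is configured.
public class WordCount extends Configured implements Tool {

  public static class TokenizerMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      // Split each line on whitespace and emit (token, 1).
      for (String token : value.toString().split("\\s+")) {
        if (!token.isEmpty()) {
          word.set(token);
          context.write(word, ONE);
        }
      }
    }
  }

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  @Override
  public int run(String[] args) throws Exception {
    // getConf() already contains any -D options that ToolRunner parsed.
    Job job = Job.getInstance(getConf(), "wordcount");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Configuration(), new WordCount(), args));
  }
}

Assuming the jar manifest points at this class, the job could then be submitted
with, for example:

./hadoop jar wordCount.jar -D mapreduce.framework.name=yarn
/opt/ApacheHadoop/temp/worker.log /opt/ApacheHadoop/out/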



On Wed, Jan 15, 2014 at 7:20 PM, Ashish Jain <as...@gmail.com> wrote:

> I just now tried it again and I see the following messages popping up in the
> log file:
>
> 2014-01-15 19:37:38,221 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1dev-211:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
> 2014-01-15 19:37:38,621 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1-dev06:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
>
> Do I need to increase the RAM allocation to the slave nodes?
>
>
>
> On Wed, Jan 15, 2014 at 7:07 PM, Ashish Jain <as...@gmail.com> wrote:
>
>> I tried that but somehow my map reduce jobs do not execute at all once I
>> set it to yarn
>>
>>
>> On Wed, Jan 15, 2014 at 7:00 PM, Nirmal Kumar <nirmal.kumar@impetus.co.in
>> > wrote:
>>
>>>  Surely you don’t have to set **mapreduce.jobtracker.address** in
>>> mapred-site.xml
>>>
>>>
>>>
>>> In mapred-site.xml you just have to mention:
>>>
>>> <property>
>>>
>>> <name>mapreduce.framework.name</name>
>>>
>>> <value>yarn</value>
>>>
>>> </property>
>>>
>>>
>>>
>>> -Nirmal
>>>
>>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>>> *Sent:* Wednesday, January 15, 2014 6:44 PM
>>>
>>> *To:* user@hadoop.apache.org
>>> *Subject:* Re: Distributing the code to multiple nodes
>>>
>>>
>>>
>>> I think this is the problem. I have not set
>>> "mapreduce.jobtracker.address" in my mapred-site.xml and by default it is
>>> set to local. Now the question is how to set it to remote. The documentation
>>> says I need to specify the host:port of the job tracker for this. As we
>>> know, hadoop 2.2.0 is completely overhauled and there is no concept of task
>>> tracker and job tracker. Instead there is now a resource manager and node
>>> manager. So in this case what do I set as "mapreduce.jobtracker.address"?
>>> Do I set it to resourceManagerHost:resourceManagerPort?
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Wed, Jan 15, 2014 at 4:20 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>>  Hi Sudhakar,
>>>
>>> Indeed there was a typo; the complete command is as follows (except for the
>>> main class, since my manifest has the entry for the main class):
>>> ./hadoop jar wordCount.jar /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>> Next I killed the datanode on 10.12.11.210 and I see the following
>>> messages in the log files. It looks like the resource manager is still trying
>>> to assign the complete task to one single node, and since it does not find the
>>> complete data set on one node it is complaining.
>>>
>>>
>>> 2014-01-15 16:38:26,894 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:27,348 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1dev-211:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:27,871 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-dev06:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:27,897 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:28,349 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1dev-211:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:28,874 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-dev06:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:28,900 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>>
>>>   --Ashish
>>>
>>>
>>>
>>> On Wed, Jan 15, 2014 at 3:59 PM, sudhakara st <su...@gmail.com>
>>> wrote:
>>>
>>>   Hello Ashish
>>>
>>>
>>>
>>> 2) Run the example again using the command
>>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>>   Unless that is a typo, the command should be
>>> ./hadoop jar wordCount.jar WordCount /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>> One more thing to try: just stop the datanode process on 10.12.11.210 and run
>>> the job.
>>>
>>>
>>>
>>> On Wed, Jan 15, 2014 at 2:07 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>>     Hello Sudhakara,
>>>
>>> Thanks for your suggestion. However, once I change the mapreduce
>>> framework to yarn my map reduce jobs do not get executed at all. It seems
>>> the job is waiting on some thread indefinitely. Here is what I have done:
>>>
>>> 1) Set the mapreduce framework to yarn in mapred-site.xml
>>> <property>
>>>  <name>mapreduce.framework.name</name>
>>>  <value>yarn</value>
>>> </property>
>>>
>>> 2) Run the example again using the command
>>>
>>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>> The jobs are just stuck and do not move further.
>>>
>>>   I also tried the following and it complains of a FileNotFoundException
>>> and a security exception:
>>>
>>> ./hadoop dfs wordCount.jar file:///opt/ApacheHadoop/temp/worker.log
>>> file:///opt/ApacheHadoop/out/
>>>
>>> Below is the status of the job from the Hadoop application console. The
>>> progress bar does not move at all.
>>>
>>> ID:               application_1389771586883_0002
>>>                   <http://10.12.11.210:8088/cluster/app/application_1389771586883_0002>
>>> User:             root
>>> Name:             wordcount
>>> Application Type: MAPREDUCE
>>> Queue:            default
>>> StartTime:        Wed, 15 Jan 2014 07:52:04 GMT
>>> FinishTime:       N/A
>>> State:            ACCEPTED
>>> FinalStatus:      UNDEFINED
>>> Progress:         (not moving)
>>> Tracking UI:      UNASSIGNED <http://10.12.11.210:8088/cluster/apps>
>>>
>>>
>>>
>>> Please advise on what I should do.
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Tue, Jan 14, 2014 at 5:48 PM, sudhakara st <su...@gmail.com>
>>> wrote:
>>>
>>>   Hello Ashish
>>>
>>> It seems the job is running in the local job runner (LocalJobRunner) and
>>> reading the local file system. Can you try giving the full URI paths of the
>>> input and output?
>>>
>>> like
>>>
>>> $hadoop jar program.jar   ProgramName -Dmapreduce.framework.name=yarn
>>> file:///home/input/  file:///home/output/
>>>
>>>
>>>
>>> On Mon, Jan 13, 2014 at 3:02 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>>   German,
>>>
>>> This does not seem to be helping. I tried to use the FairScheduler as my
>>> scheduler but the behavior remains the same. I could see the
>>> FairScheduler log getting continuous heartbeats from both of the other nodes,
>>> but it is still not distributing the work to them. What I did next
>>> was start 3 jobs simultaneously so that maybe some part of one of the
>>> jobs would be distributed to the other nodes. However, still only one node is
>>> being used :(((. What is going wrong? Can someone help?
>>>
>>> Sample of fairsheduler log:
>>> 2014-01-13 15:13:54,293 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:54,953 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:54,988 HEARTBEAT       l1-DEV05
>>> 2014-01-13 15:13:55,295 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:55,956 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:55,993 HEARTBEAT       l1-DEV05
>>> 2014-01-13 15:13:56,297 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:56,960 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:56,997 HEARTBEAT       l1-DEV05
>>> 2014-01-13 15:13:57,299 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:57,964 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:58,001 HEARTBEAT       l1-DEV05
>>>
>>> My data is distributed as blocks to the other nodes. The host with IP
>>> 10.12.11.210 has all the data and this is the one which is serving all the
>>> requests.
>>>
>>> Total number of blocks: 8
>>> 1073741866:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741867:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741868:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741869:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741870:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741871:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741872:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741873:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>>
>>>
>>>
>>> Someone please advice on how to go about this.
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Fri, Jan 10, 2014 at 12:58 PM, Ashish Jain <as...@gmail.com>
>>> wrote:
>>>
>>>  Thanks for all these suggestions. Somehow I do not have access to the
>>> servers today and will try the suggestions made on monday and will let you
>>> know how it goes.
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <
>>> german.fl@samsung.com> wrote:
>>>
>>>  Ashish
>>>
>>> Could this be related to the scheduler you are using and its settings?
>>>
>>>
>>>
>>> On lab environments when running a single type of job I often use
>>> FairScheduler (the YARN default in 2.2.0 is CapacityScheduler) and it does
>>> a good job distributing the load.
>>>
>>>
>>>
>>> You could give that a try (
>>> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
>>> )
>>>
>>>
>>>
>>> I think just changing yarn-site.xml as follows could demonstrate this
>>> theory (note that how the jobs are scheduled depends on resources such as
>>> memory on the nodes, and you would need to set up yarn-site.xml accordingly).
>>>
>>>
>>>
>>> <property>
>>>
>>>   <name>yarn.resourcemanager.scheduler.class</name>
>>>
>>>
>>> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
>>>
>>> </property>
>>>
>>>
>>>
>>> Regards
>>>
>>> ./g
>>>
>>>
>>>
>>>
>>>
>>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>>> *Sent:* Thursday, January 09, 2014 6:46 AM
>>> *To:* user@hadoop.apache.org
>>> *Subject:* Re: Distributing the code to multiple nodes
>>>
>>>
>>>
>>> Another point to add here 10.12.11.210 is the host which has everything
>>> running including a slave datanode. Data was also distributed this host as
>>> well as the jar file. Following are running on 10.12.11.210
>>>
>>> 7966 DataNode
>>> 8480 NodeManager
>>> 8353 ResourceManager
>>> 8141 SecondaryNameNode
>>> 7834 NameNode
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>> Logs were updated only when I copied the data. After copying the data
>>> there has been no updates on the log files.
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <ch...@gmail.com>
>>> wrote:
>>>
>>> Do the logs on the three nodes contain anything interesting?
>>> Chris
>>>
>>> On Jan 9, 2014 3:47 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>>
>>> Here is the block info for the record I distributed. As can be seen only
>>> 10.12.11.210 has all the data and this is the node which is serving all the
>>> request. Replicas are available with 209 as well as 210
>>>
>>> 1073741857:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741858:         10.12.11.210:50010    View Block Info
>>> 10.12.11.211:50010    View Block Info
>>> 1073741859:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741860:         10.12.11.210:50010    View Block Info
>>> 10.12.11.211:50010    View Block Info
>>> 1073741861:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741862:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741863:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741864:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>> Hello Chris,
>>>
>>> I have now a cluster with 3 nodes and replication factor being 2. When I
>>> distribute a file I could see that there are replica of data available in
>>> other nodes. However when I run a map reduce job again only one node is
>>> serving all the request :(. Can you or anyone please provide some more
>>> inputs.
>>>
>>> Thanks
>>> Ashish
>>>
>>>
>>>
>>> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <ch...@gmail.com>
>>> wrote:
>>>
>>> 2 nodes and replication factor of 2 results in a replica of each block
>>> present on each node. This would allow the possibility that a single node
>>> would do the work and yet be data local.  It will probably happen if that
>>> single node has the needed capacity.  More nodes than the replication
>>> factor are needed to force distribution of the processing.
>>> Chris
>>>
>>> On Jan 8, 2014 7:35 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>>
>>> Guys,
>>>
>>> I am sure that only one node is being used. I just now ran the job
>>> again and could see the CPU usage going high only for one server while the
>>> other server's CPU usage remains constant, which means the other node is not
>>> being used. Can someone help me to debug this issue?
>>>
>>> ++Ashish
>>>
>>>
>>>
>>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>> Hello All,
>>>
>>> I have a 2 node hadoop cluster running with a replication factor of 2. I
>>> have a file of size around 1 GB which when copied to HDFS is replicated to
>>> both the nodes. Seeing the block info I can see the file has been
>>> subdivided into 8 parts which means it has been subdivided into 8 blocks
>>> each of size 128 MB.  I use this file as input to run the word count
>>> program. Some how I feel only one node is doing all the work and the code
>>> is not distributed to other node. How can I make sure code is distributed
>>> to both the nodes? Also is there a log or GUI which can be used for this?
>>>
>>> Please note I am using the latest stable release that is 2.2.0.
>>>
>>> ++Ashish
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>   --
>>>
>>>
>>> Regards,
>>> ...Sudhakara.st
>>>
>>>
>>>
>>>
>>>
>>>
>>>   --
>>>
>>>
>>> Regards,
>>> ...Sudhakara.st
>>>
>>>
>>>
>>>
>>>
>>>
>>> ------------------------------
>>>
>>>
>>>
>>>
>>>
>>>
>>> NOTE: This message may contain information that is confidential,
>>> proprietary, privileged or otherwise protected by law. The message is
>>> intended solely for the named addressee. If received in error, please
>>> destroy and notify the sender. Any use of this email is prohibited when
>>> received in error. Impetus does not represent, warrant and/or guarantee,
>>> that the integrity of this communication has been maintained nor that the
>>> communication is free of errors, virus, interception or interference.
>>>
>>
>>
>

Re: Distributing the code to multiple nodes

Posted by Ashish Jain <as...@gmail.com>.
I just now tried it again and I see the following messages popping up in the
log file:

2014-01-15 19:37:38,221 WARN
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Node : l1dev-211:1004 does not have sufficient resource for request :
{Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
Location: *, Relax Locality: true} node total capability : <memory:1024,
vCores:8>
2014-01-15 19:37:38,621 WARN
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Node : l1-dev06:1004 does not have sufficient resource for request :
{Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
Location: *, Relax Locality: true} node total capability : <memory:1024,
vCores:8>

Do I need to increase the RAM allocation to the slave nodes?
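
For reference, the warnings above show every container request asking for
<memory:2048, vCores:1> while each node only advertises <memory:1024, vCores:8>,
so the scheduler can never place a container and the application stays in the
ACCEPTED state. A sketch of the two usual knobs follows; the property names are
the stock YARN/MapReduce ones, but the values are illustrative and are not taken
from this cluster's configuration. Either raise what each NodeManager offers in
yarn-site.xml, or shrink what the job requests in mapred-site.xml:

<!-- yarn-site.xml on every slave: memory the NodeManager offers to YARN -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value>
</property>

<!-- mapred-site.xml: memory each task container and the MR AM request -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>512</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>512</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>1024</value>
</property>

The NodeManagers (and the ResourceManager, if scheduler minimums are changed as
well) need a restart before the new node capability shows up in the web UI.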



On Wed, Jan 15, 2014 at 7:07 PM, Ashish Jain <as...@gmail.com> wrote:

> I tried that but somehow my map reduce jobs do not execute at all once I
> set it to yarn
>
>
> On Wed, Jan 15, 2014 at 7:00 PM, Nirmal Kumar <ni...@impetus.co.in>wrote:
>
>>  Surely you don’t have to set **mapreduce.jobtracker.address** in
>> mapred-site.xml
>>
>>
>>
>> In mapred-site.xml you just have to mention:
>>
>> <property>
>>
>> <name>mapreduce.framework.name</name>
>>
>> <value>yarn</value>
>>
>> </property>
>>
>>
>>
>> -Nirmal
>>
>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>> *Sent:* Wednesday, January 15, 2014 6:44 PM
>>
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: Distributing the code to multiple nodes
>>
>>
>>
>> I think this is the problem. I have not set
>> "mapreduce.jobtracker.address" in my mapred-site.xml and by default it is
>> set to local. Now the question is how to set it to remote. The documentation
>> says I need to specify the host:port of the job tracker for this. As we
>> know, hadoop 2.2.0 is completely overhauled and there is no concept of task
>> tracker and job tracker. Instead there is now a resource manager and node
>> manager. So in this case what do I set as "mapreduce.jobtracker.address"?
>> Do I set it to resourceManagerHost:resourceManagerPort?
>>
>> --Ashish
>>
>>
>>
>> On Wed, Jan 15, 2014 at 4:20 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>>  Hi Sudhakar,
>>
>> Indeed there was a typo; the complete command is as follows (except for the
>> main class, since my manifest has the entry for the main class):
>> ./hadoop jar wordCount.jar /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>> Next I killed the datanode on 10.12.11.210 and I see the following
>> messages in the log files. It looks like the resource manager is still trying
>> to assign the complete task to one single node, and since it does not find the
>> complete data set on one node it is complaining.
>>
>>
>> 2014-01-15 16:38:26,894 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:27,348 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1dev-211:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:27,871 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-dev06:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:27,897 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:28,349 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1dev-211:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:28,874 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-dev06:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:28,900 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>>
>>   --Ashish
>>
>>
>>
>> On Wed, Jan 15, 2014 at 3:59 PM, sudhakara st <su...@gmail.com>
>> wrote:
>>
>>   Hello Ashish
>>
>>
>>
>> 2) Run the example again using the command
>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>>   Unless that is a typo, the command should be
>> ./hadoop jar wordCount.jar WordCount /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>> One more thing to try: just stop the datanode process on 10.12.11.210 and run
>> the job.
>>
>>
>>
>> On Wed, Jan 15, 2014 at 2:07 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>>     Hello Sudhakara,
>>
>> Thanks for your suggestion. However, once I change the mapreduce framework
>> to yarn my map reduce jobs do not get executed at all. It seems the job is
>> waiting on some thread indefinitely. Here is what I have done:
>>
>> 1) Set the mapreduce framework to yarn in mapred-site.xml
>> <property>
>>  <name>mapreduce.framework.name</name>
>>  <value>yarn</value>
>> </property>
>>
>> 2) Run the example again using the command
>>
>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>> The jobs are just stuck and do not move further.
>>
>>   I also tried the following and it complains of a FileNotFoundException
>> and a security exception:
>>
>> ./hadoop dfs wordCount.jar file:///opt/ApacheHadoop/temp/worker.log
>> file:///opt/ApacheHadoop/out/
>>
>> Below is the status of the job from the Hadoop application console. The
>> progress bar does not move at all.
>>
>> ID:               application_1389771586883_0002
>>                   <http://10.12.11.210:8088/cluster/app/application_1389771586883_0002>
>> User:             root
>> Name:             wordcount
>> Application Type: MAPREDUCE
>> Queue:            default
>> StartTime:        Wed, 15 Jan 2014 07:52:04 GMT
>> FinishTime:       N/A
>> State:            ACCEPTED
>> FinalStatus:      UNDEFINED
>> Progress:         (not moving)
>> Tracking UI:      UNASSIGNED <http://10.12.11.210:8088/cluster/apps>
>>
>>
>>
>> Please advise on what I should do.
>>
>> --Ashish
>>
>>
>>
>> On Tue, Jan 14, 2014 at 5:48 PM, sudhakara st <su...@gmail.com>
>> wrote:
>>
>>   Hello Ashish
>>
>> It seems the job is running in the local job runner (LocalJobRunner) and
>> reading the local file system. Can you try giving the full URI paths of the
>> input and output?
>>
>> like
>>
>> $hadoop jar program.jar   ProgramName -Dmapreduce.framework.name=yarn
>> file:///home/input/  file:///home/output/
>>
>>
>>
>> On Mon, Jan 13, 2014 at 3:02 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>>   German,
>>
>> This does not seem to be helping. I tried to use the FairScheduler as my
>> scheduler but the behavior remains the same. I could see the
>> FairScheduler log getting continuous heartbeats from both of the other nodes,
>> but it is still not distributing the work to them. What I did next
>> was start 3 jobs simultaneously so that maybe some part of one of the
>> jobs would be distributed to the other nodes. However, still only one node is
>> being used :(((. What is going wrong? Can someone help?
>>
>> Sample of fairsheduler log:
>> 2014-01-13 15:13:54,293 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:54,953 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:54,988 HEARTBEAT       l1-DEV05
>> 2014-01-13 15:13:55,295 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:55,956 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:55,993 HEARTBEAT       l1-DEV05
>> 2014-01-13 15:13:56,297 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:56,960 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:56,997 HEARTBEAT       l1-DEV05
>> 2014-01-13 15:13:57,299 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:57,964 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:58,001 HEARTBEAT       l1-DEV05
>>
>> My data is distributed as blocks to the other nodes. The host with IP
>> 10.12.11.210 has all the data and this is the one which is serving all the
>> requests.
>>
>> Total number of blocks: 8
>> 1073741866:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741867:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741868:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741869:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741870:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741871:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741872:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741873:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>>
>>
>>
>> Someone please advice on how to go about this.
>>
>> --Ashish
>>
>>
>>
>> On Fri, Jan 10, 2014 at 12:58 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>>  Thanks for all these suggestions. Somehow I do not have access to the
>> servers today and will try the suggestions made on monday and will let you
>> know how it goes.
>>
>> --Ashish
>>
>>
>>
>> On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <
>> german.fl@samsung.com> wrote:
>>
>>  Ashish
>>
>> Could this be related to the scheduler you are using and its settings?
>>
>>
>>
>> On lab environments when running a single type of job I often use
>> FairScheduler (the YARN default in 2.2.0 is CapacityScheduler) and it does
>> a good job distributing the load.
>>
>>
>>
>> You could give that a try (
>> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
>> )
>>
>>
>>
>> I think just changing yarn-site.xml as follows could demonstrate this
>> theory (note that how the jobs are scheduled depends on resources such as
>> memory on the nodes, and you would need to set up yarn-site.xml accordingly).
>>
>>
>>
>> <property>
>>
>>   <name>yarn.resourcemanager.scheduler.class</name>
>>
>>
>> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
>>
>> </property>
>>
>>
>>
>> Regards
>>
>> ./g
>>
>>
>>
>>
>>
>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>> *Sent:* Thursday, January 09, 2014 6:46 AM
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: Distributing the code to multiple nodes
>>
>>
>>
>> Another point to add here 10.12.11.210 is the host which has everything
>> running including a slave datanode. Data was also distributed this host as
>> well as the jar file. Following are running on 10.12.11.210
>>
>> 7966 DataNode
>> 8480 NodeManager
>> 8353 ResourceManager
>> 8141 SecondaryNameNode
>> 7834 NameNode
>>
>>
>>
>> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>> Logs were updated only when I copied the data. After copying the data
>> there has been no updates on the log files.
>>
>>
>>
>> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <ch...@gmail.com>
>> wrote:
>>
>> Do the logs on the three nodes contain anything interesting?
>> Chris
>>
>> On Jan 9, 2014 3:47 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>
>> Here is the block info for the record I distributed. As can be seen only
>> 10.12.11.210 has all the data and this is the node which is serving all the
>> request. Replicas are available with 209 as well as 210
>>
>> 1073741857:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741858:         10.12.11.210:50010    View Block Info
>> 10.12.11.211:50010    View Block Info
>> 1073741859:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741860:         10.12.11.210:50010    View Block Info
>> 10.12.11.211:50010    View Block Info
>> 1073741861:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741862:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741863:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741864:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>>
>> --Ashish
>>
>>
>>
>> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>> Hello Chris,
>>
>> I have now a cluster with 3 nodes and replication factor being 2. When I
>> distribute a file I could see that there are replica of data available in
>> other nodes. However when I run a map reduce job again only one node is
>> serving all the request :(. Can you or anyone please provide some more
>> inputs.
>>
>> Thanks
>> Ashish
>>
>>
>>
>> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <ch...@gmail.com>
>> wrote:
>>
>> 2 nodes and replication factor of 2 results in a replica of each block
>> present on each node. This would allow the possibility that a single node
>> would do the work and yet be data local.  It will probably happen if that
>> single node has the needed capacity.  More nodes than the replication
>> factor are needed to force distribution of the processing.
>> Chris
>>
>> On Jan 8, 2014 7:35 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>
>> Guys,
>>
>> I am sure that only one node is being used. I just now ran the job again
>> and could see the CPU usage going high only for one server while the other
>> server's CPU usage remains constant, which means the other node is not being
>> used. Can someone help me to debug this issue?
>>
>> ++Ashish
>>
>>
>>
>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>> Hello All,
>>
>> I have a 2 node hadoop cluster running with a replication factor of 2. I
>> have a file of size around 1 GB which when copied to HDFS is replicated to
>> both the nodes. Seeing the block info I can see the file has been
>> subdivided into 8 parts which means it has been subdivided into 8 blocks
>> each of size 128 MB.  I use this file as input to run the word count
>> program. Some how I feel only one node is doing all the work and the code
>> is not distributed to other node. How can I make sure code is distributed
>> to both the nodes? Also is there a log or GUI which can be used for this?
>>
>> Please note I am using the latest stable release that is 2.2.0.
>>
>> ++Ashish
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>   --
>>
>>
>> Regards,
>> ...Sudhakara.st
>>
>>
>>
>>
>>
>>
>>   --
>>
>>
>> Regards,
>> ...Sudhakara.st
>>
>>
>>
>>
>>
>>
>> ------------------------------
>>
>>
>>
>>
>>
>>
>> NOTE: This message may contain information that is confidential,
>> proprietary, privileged or otherwise protected by law. The message is
>> intended solely for the named addressee. If received in error, please
>> destroy and notify the sender. Any use of this email is prohibited when
>> received in error. Impetus does not represent, warrant and/or guarantee,
>> that the integrity of this communication has been maintained nor that the
>> communication is free of errors, virus, interception or interference.
>>
>
>

Re: Distributing the code to multiple nodes

Posted by Ashish Jain <as...@gmail.com>.
I just now tried it again and I see following messages popping up in the
log file:

2014-01-15 19:37:38,221 WARN
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Node : l1dev-211:1004 does not have sufficient resource for request :
{Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
Location: *, Relax Locality: true} node total capability : <memory:1024,
vCores:8>
2014-01-15 19:37:38,621 WARN
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Node : l1-dev06:1004 does not have sufficient resource for request :
{Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
Location: *, Relax Locality: true} node total capability : <memory:1024,
vCores:8>

Do I need to increase the RAM allocation to slave nodes??



On Wed, Jan 15, 2014 at 7:07 PM, Ashish Jain <as...@gmail.com> wrote:

> I tried that but somehow my map reduce jobs do not execute at all once I
> set it to yarn
>
>
> On Wed, Jan 15, 2014 at 7:00 PM, Nirmal Kumar <ni...@impetus.co.in>wrote:
>
>>  Surely you don’t have to set **mapreduce.jobtracker.address** in
>> mapred-site.xml
>>
>>
>>
>> In mapred-site.xml you just have to mention:
>>
>> <property>
>>
>> <name>mapreduce.framework.name</name>
>>
>> <value>yarn</value>
>>
>> </property>
>>
>>
>>
>> -Nirmal
>>
>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>> *Sent:* Wednesday, January 15, 2014 6:44 PM
>>
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: Distributing the code to multiple nodes
>>
>>
>>
>> I think this is the problem. I have not set
>> "mapreduce.jobtracker.address" in my mapred-site.xml and by default it is
>> set to local. Now the question is how to set it up to remote. Documentation
>> says I need to specify the host:port of the job tracker for this. As we
>> know hadoop 2.2.0 is completely overhauled and there is no concept of task
>> tracker and job tracker. Instead there is now resource manager and node
>> manager. So in this case what do I set as "mapreduce.jobtracker.address".
>> Do I set is resourceMangerHost:resourceMangerPort?
>>
>> --Ashish
>>
>>
>>
>> On Wed, Jan 15, 2014 at 4:20 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>>  Hi Sudhakar,
>>
>> Indeed there was a type the complete command is as follows except the
>> main class since my manifest has the entry for main class.
>> /hadoop jar wordCount.jar  /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>> Next I killed the datanode in 10.12.11.210 and l see the following
>> messages in the log files. Looks like the namenode is still trying to
>> assign the complete task to one single node and since it does not find the
>> complete data set in one node it is complaining.
>>
>>
>> 2014-01-15 16:38:26,894 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:27,348 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1dev-211:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:27,871 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-dev06:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:27,897 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:28,349 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1dev-211:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:28,874 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-dev06:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:28,900 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>>
>>   --Ashish
>>
>>
>>
>> On Wed, Jan 15, 2014 at 3:59 PM, sudhakara st <su...@gmail.com>
>> wrote:
>>
>>   Hello Ashish
>>
>>
>>
>> 2) Run the example again using the command
>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>>   Unless if it typo mistake the command should be
>> ./hadoop jar wordCount.jar WordCount /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>> One more thing try , just stop datanode process in  10.12.11.210 and run
>> the job
>>
>>
>>
>> On Wed, Jan 15, 2014 at 2:07 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>>     Hello Sudhakara,
>>
>> Thanks for your suggestion. However once I change the mapreduce framework
>> to yarn my map reduce jobs does not get executed at all. It seems it is
>> waiting on some thread indefinitely. Here is what I have done
>>
>> 1) Set the mapreduce framework to yarn in mapred-site.xml
>> <property>
>>  <name>mapreduce.framework.name</name>
>>  <value>yarn</value>
>> </property>
>>
>> 2) Run the example again using the command
>>
>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>> The jobs are just stuck and do not move further.
>>
>>   I also tried the following and it complains of filenotfound exception
>> and some security exception
>>
>> ./hadoop dfs wordCount.jar file:///opt/ApacheHadoop/temp/worker.log
>> file:///opt/ApacheHadoop/out/
>>
>> Below is the status of the job from hadoop application console. The
>> progress bar does not move at all.
>>
>>
>>
>> *ID *
>>
>> *User *
>>
>> *Name *
>>
>> *Application Type *
>>
>> *Queue *
>>
>> *StartTime *
>>
>> *FinishTime *
>>
>> *State *
>>
>> *FinalStatus *
>>
>> *Progress *
>>
>> *Tracking UI *
>>
>> application_1389771586883_0002<http://10.12.11.210:8088/cluster/app/application_1389771586883_0002>
>>
>> root
>>
>> wordcount
>>
>> MAPREDUCE
>>
>> default
>>
>> Wed, 15 Jan 2014 07:52:04 GMT
>>
>> N/A
>>
>> ACCEPTED
>>
>> UNDEFINED
>>
>> UNASSIGNE <http://10.12.11.210:8088/cluster/apps>
>>
>>
>>
>> Please advice what should I do
>>
>> --Ashish
>>
>>
>>
>> On Tue, Jan 14, 2014 at 5:48 PM, sudhakara st <su...@gmail.com>
>> wrote:
>>
>>   Hello Ashish
>>
>> It seems job is running in Local job runner(LocalJobRunner) by reading
>> the Local file system. Can you try by give the full URI path of the input
>> and output path.
>>
>> like
>>
>> $hadoop jar program.jar   ProgramName -Dmapreduce.framework.name=yarn
>> file:///home/input/  file:///home/output/
>>
>>
>>
>> On Mon, Jan 13, 2014 at 3:02 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>>   German,
>>
>> This does not seem to be helping. I tried to use the Fairscheduler as my
>> resource manger but the behavior remains same. I could see the
>> fairscheduler log getting continuous heart beat from both the other nodes.
>> But it is still not distributing the work to other nodes. What I did next
>> was started 3 jobs simultaneously so that may be some part of one of the
>> job be distributed to other nodes. However still only one node is being
>> used :(((. What is that is going wrong can some one help?
>>
>> Sample of fairsheduler log:
>> 2014-01-13 15:13:54,293 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:54,953 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:54,988 HEARTBEAT       l1-DEV05
>> 2014-01-13 15:13:55,295 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:55,956 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:55,993 HEARTBEAT       l1-DEV05
>> 2014-01-13 15:13:56,297 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:56,960 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:56,997 HEARTBEAT       l1-DEV05
>> 2014-01-13 15:13:57,299 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:57,964 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:58,001 HEARTBEAT       l1-DEV05
>>
>> My Data distributed as blocks to other nodes. The host with IP
>> 10.12.11.210 has all the data and this is the one which is serving all the
>> request.
>>
>> Total number of blocks: 8
>> 1073741866:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741867:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741868:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741869:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741870:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741871:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741872:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741873:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>>
>>
>>
>> Someone please advice on how to go about this.
>>
>> --Ashish
>>
>>
>>
>> On Fri, Jan 10, 2014 at 12:58 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>>  Thanks for all these suggestions. Somehow I do not have access to the
>> servers today and will try the suggestions made on monday and will let you
>> know how it goes.
>>
>> --Ashish
>>
>>
>>
>> On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <
>> german.fl@samsung.com> wrote:
>>
>>  Ashish
>>
>> Could this be related to the scheduler you are using and its settings?.
>>
>>
>>
>> On lab environments when running a single type of job I often use
>> FairScheduler (the YARN default in 2.2.0 is CapacityScheduler) and it does
>> a good job distributing the load.
>>
>>
>>
>> You could give that a try (
>> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
>> )
>>
>>
>>
>> I think just changing yarn-site.xml  as follows could demonstrate this
>> theory (note that  how the jobs are scheduled depend on resources such as
>> memory on the nodes and you would need to setup yarn-site.xml accordingly).
>>
>>
>>
>> <property>
>>
>>   <name>yarn.resourcemanager.scheduler.class</name>
>>
>>
>> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
>>
>> </property>
>>
>>
>>
>> Regards
>>
>> ./g
>>
>>
>>
>>
>>
>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>> *Sent:* Thursday, January 09, 2014 6:46 AM
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: Distributing the code to multiple nodes
>>
>>
>>
>> Another point to add here 10.12.11.210 is the host which has everything
>> running including a slave datanode. Data was also distributed this host as
>> well as the jar file. Following are running on 10.12.11.210
>>
>> 7966 DataNode
>> 8480 NodeManager
>> 8353 ResourceManager
>> 8141 SecondaryNameNode
>> 7834 NameNode
>>
>>
>>
>> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>> Logs were updated only when I copied the data. After copying the data
>> there has been no updates on the log files.
>>
>>
>>
>> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <ch...@gmail.com>
>> wrote:
>>
>> Do the logs on the three nodes contain anything interesting?
>> Chris
>>
>> On Jan 9, 2014 3:47 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>
>> Here is the block info for the record I distributed. As can be seen only
>> 10.12.11.210 has all the data and this is the node which is serving all the
>> request. Replicas are available with 209 as well as 210
>>
>> 1073741857:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741858:         10.12.11.210:50010    View Block Info
>> 10.12.11.211:50010    View Block Info
>> 1073741859:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741860:         10.12.11.210:50010    View Block Info
>> 10.12.11.211:50010    View Block Info
>> 1073741861:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741862:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741863:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741864:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>>
>> --Ashish
>>
>>
>>
>> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>> Hello Chris,
>>
>> I have now a cluster with 3 nodes and replication factor being 2. When I
>> distribute a file I could see that there are replica of data available in
>> other nodes. However when I run a map reduce job again only one node is
>> serving all the request :(. Can you or anyone please provide some more
>> inputs.
>>
>> Thanks
>> Ashish
>>
>>
>>
>> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <ch...@gmail.com>
>> wrote:
>>
>> 2 nodes and replication factor of 2 results in a replica of each block
>> present on each node. This would allow the possibility that a single node
>> would do the work and yet be data local.  It will probably happen if that
>> single node has the needed capacity.  More nodes than the replication
>> factor are needed to force distribution of the processing.
>> Chris
>>
>> On Jan 8, 2014 7:35 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>
>> Guys,
>>
>> I am sure that only one node is being used. I just know ran the job again
>> and could see that CPU usage only for one server going high other server
>> CPU usage remains constant and hence it means other node is not being used.
>> Can someone help me to debug this issue?
>>
>> ++Ashish
>>
>>
>>
>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>> Hello All,
>>
>> I have a 2 node hadoop cluster running with a replication factor of 2. I
>> have a file of size around 1 GB which when copied to HDFS is replicated to
>> both the nodes. Seeing the block info I can see the file has been
>> subdivided into 8 parts which means it has been subdivided into 8 blocks
>> each of size 128 MB.  I use this file as input to run the word count
>> program. Some how I feel only one node is doing all the work and the code
>> is not distributed to other node. How can I make sure code is distributed
>> to both the nodes? Also is there a log or GUI which can be used for this?
>>
>> Please note I am using the latest stable release that is 2.2.0.
>>
>> ++Ashish
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>   --
>>
>>
>> Regards,
>> ...Sudhakara.st
>>
>>
>>
>>
>>
>>
>>   --
>>
>>
>> Regards,
>> ...Sudhakara.st
>>
>>
>>
>>
>>
>>
>>
>
>

Re: Distributing the code to multiple nodes

Posted by sudhakara st <su...@gmail.com>.
Hello Ashish

WARN
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Node : l1-DEV05:1004 does not have sufficient resource for request :
{Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
Location: *, Relax Locality: true} node total capability : <memory:1024,
vCores:8>

The ResourceManager is trying to allocate a 2 GB container, but the node only advertises 1 GB of total capacity.

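A minimal sketch of the settings behind those two numbers, assuming the stock Hadoop 2.2.0 property names (the values shown are only illustrative and have to fit the real RAM on each slave):

<!-- yarn-site.xml on every NodeManager: memory the node advertises to YARN -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value>
</property>

<!-- mapred-site.xml: keep the MapReduce AM and task containers inside that capacity -->
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>1024</value>
</property>

The 2048 MB in the warning is most likely the ApplicationMaster container (yarn.app.mapreduce.am.resource.mb rounded up to the scheduler's minimum allocation), so after raising the node capability and restarting the NodeManagers the container should become schedulable.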

On Wed, Jan 15, 2014 at 7:07 PM, Ashish Jain <as...@gmail.com> wrote:

> I tried that but somehow my map reduce jobs do not execute at all once I
> set it to yarn
>
>
> On Wed, Jan 15, 2014 at 7:00 PM, Nirmal Kumar <ni...@impetus.co.in>wrote:
>
>>  Surely you don’t have to set **mapreduce.jobtracker.address** in
>> mapred-site.xml
>>
>>
>>
>> In mapred-site.xml you just have to mention:
>>
>> <property>
>>
>> <name>mapreduce.framework.name</name>
>>
>> <value>yarn</value>
>>
>> </property>
>>
>>
>>
>> -Nirmal
>>
>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>> *Sent:* Wednesday, January 15, 2014 6:44 PM
>>
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: Distributing the code to multiple nodes
>>
>>
>>
>> I think this is the problem. I have not set
>> "mapreduce.jobtracker.address" in my mapred-site.xml and by default it is
>> set to local. Now the question is how to set it up to remote. Documentation
>> says I need to specify the host:port of the job tracker for this. As we
>> know hadoop 2.2.0 is completely overhauled and there is no concept of task
>> tracker and job tracker. Instead there is now resource manager and node
>> manager. So in this case what do I set as "mapreduce.jobtracker.address".
>> Do I set is resourceMangerHost:resourceMangerPort?
>>
>> --Ashish
>>
>>
>>
>> On Wed, Jan 15, 2014 at 4:20 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>>  Hi Sudhakar,
>>
>> Indeed there was a type the complete command is as follows except the
>> main class since my manifest has the entry for main class.
>> /hadoop jar wordCount.jar  /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>> Next I killed the datanode in 10.12.11.210 and l see the following
>> messages in the log files. Looks like the namenode is still trying to
>> assign the complete task to one single node and since it does not find the
>> complete data set in one node it is complaining.
>>
>>
>> 2014-01-15 16:38:26,894 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:27,348 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1dev-211:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:27,871 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-dev06:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:27,897 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:28,349 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1dev-211:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:28,874 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-dev06:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:28,900 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>>
>>   --Ashish
>>
>>
>>
>> On Wed, Jan 15, 2014 at 3:59 PM, sudhakara st <su...@gmail.com>
>> wrote:
>>
>>   Hello Ashish
>>
>>
>>
>> 2) Run the example again using the command
>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>>   Unless if it typo mistake the command should be
>> ./hadoop jar wordCount.jar WordCount /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>> One more thing try , just stop datanode process in  10.12.11.210 and run
>> the job
>>
>>
>>
>> On Wed, Jan 15, 2014 at 2:07 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>>     Hello Sudhakara,
>>
>> Thanks for your suggestion. However once I change the mapreduce framework
>> to yarn my map reduce jobs does not get executed at all. It seems it is
>> waiting on some thread indefinitely. Here is what I have done
>>
>> 1) Set the mapreduce framework to yarn in mapred-site.xml
>> <property>
>>  <name>mapreduce.framework.name</name>
>>  <value>yarn</value>
>> </property>
>>
>> 2) Run the example again using the command
>>
>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>> The jobs are just stuck and do not move further.
>>
>>   I also tried the following and it complains of filenotfound exception
>> and some security exception
>>
>> ./hadoop dfs wordCount.jar file:///opt/ApacheHadoop/temp/worker.log
>> file:///opt/ApacheHadoop/out/
>>
>> Below is the status of the job from hadoop application console. The
>> progress bar does not move at all.
>>
>>
>>
>> ID:                application_1389771586883_0002
>>                     (http://10.12.11.210:8088/cluster/app/application_1389771586883_0002)
>> User:              root
>> Name:              wordcount
>> Application Type:  MAPREDUCE
>> Queue:             default
>> StartTime:         Wed, 15 Jan 2014 07:52:04 GMT
>> FinishTime:        N/A
>> State:             ACCEPTED
>> FinalStatus:       UNDEFINED
>> Progress:          (progress bar not shown in the paste)
>> Tracking UI:       UNASSIGNED (http://10.12.11.210:8088/cluster/apps)
>>
>>
>>
>> Please advice what should I do
>>
>> --Ashish
>>
>>
>>
>> On Tue, Jan 14, 2014 at 5:48 PM, sudhakara st <su...@gmail.com>
>> wrote:
>>
>>   Hello Ashish
>>
>> It seems job is running in Local job runner(LocalJobRunner) by reading
>> the Local file system. Can you try by give the full URI path of the input
>> and output path.
>>
>> like
>>
>> $hadoop jar program.jar   ProgramName -Dmapreduce.framework.name=yarn
>> file:///home/input/  file:///home/output/
>>
>>
>>
>> On Mon, Jan 13, 2014 at 3:02 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>>   German,
>>
>> This does not seem to be helping. I tried to use the Fairscheduler as my
>> resource manger but the behavior remains same. I could see the
>> fairscheduler log getting continuous heart beat from both the other nodes.
>> But it is still not distributing the work to other nodes. What I did next
>> was started 3 jobs simultaneously so that may be some part of one of the
>> job be distributed to other nodes. However still only one node is being
>> used :(((. What is that is going wrong can some one help?
>>
>> Sample of fairsheduler log:
>> 2014-01-13 15:13:54,293 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:54,953 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:54,988 HEARTBEAT       l1-DEV05
>> 2014-01-13 15:13:55,295 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:55,956 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:55,993 HEARTBEAT       l1-DEV05
>> 2014-01-13 15:13:56,297 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:56,960 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:56,997 HEARTBEAT       l1-DEV05
>> 2014-01-13 15:13:57,299 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:57,964 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:58,001 HEARTBEAT       l1-DEV05
>>
>> My Data distributed as blocks to other nodes. The host with IP
>> 10.12.11.210 has all the data and this is the one which is serving all the
>> request.
>>
>> Total number of blocks: 8
>> 1073741866:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741867:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741868:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741869:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741870:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741871:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741872:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741873:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>>
>>
>>
>> Someone please advice on how to go about this.
>>
>> --Ashish
>>
>>
>>
>> On Fri, Jan 10, 2014 at 12:58 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>>  Thanks for all these suggestions. Somehow I do not have access to the
>> servers today and will try the suggestions made on monday and will let you
>> know how it goes.
>>
>> --Ashish
>>
>>
>>
>> On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <
>> german.fl@samsung.com> wrote:
>>
>>  Ashish
>>
>> Could this be related to the scheduler you are using and its settings?.
>>
>>
>>
>> On lab environments when running a single type of job I often use
>> FairScheduler (the YARN default in 2.2.0 is CapacityScheduler) and it does
>> a good job distributing the load.
>>
>>
>>
>> You could give that a try (
>> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
>> )
>>
>>
>>
>> I think just changing yarn-site.xml  as follows could demonstrate this
>> theory (note that  how the jobs are scheduled depend on resources such as
>> memory on the nodes and you would need to setup yarn-site.xml accordingly).
>>
>>
>>
>> <property>
>>
>>   <name>yarn.resourcemanager.scheduler.class</name>
>>
>>
>> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
>>
>> </property>
>>
>>
>>
>> Regards
>>
>> ./g
>>
>>
>>
>>
>>
>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>> *Sent:* Thursday, January 09, 2014 6:46 AM
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: Distributing the code to multiple nodes
>>
>>
>>
>> Another point to add here 10.12.11.210 is the host which has everything
>> running including a slave datanode. Data was also distributed this host as
>> well as the jar file. Following are running on 10.12.11.210
>>
>> 7966 DataNode
>> 8480 NodeManager
>> 8353 ResourceManager
>> 8141 SecondaryNameNode
>> 7834 NameNode
>>
>>
>>
>> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>> Logs were updated only when I copied the data. After copying the data
>> there has been no updates on the log files.
>>
>>
>>
>> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <ch...@gmail.com>
>> wrote:
>>
>> Do the logs on the three nodes contain anything interesting?
>> Chris
>>
>> On Jan 9, 2014 3:47 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>
>> Here is the block info for the record I distributed. As can be seen only
>> 10.12.11.210 has all the data and this is the node which is serving all the
>> request. Replicas are available with 209 as well as 210
>>
>> 1073741857:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741858:         10.12.11.210:50010    View Block Info
>> 10.12.11.211:50010    View Block Info
>> 1073741859:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741860:         10.12.11.210:50010    View Block Info
>> 10.12.11.211:50010    View Block Info
>> 1073741861:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741862:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741863:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741864:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>>
>> --Ashish
>>
>>
>>
>> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>> Hello Chris,
>>
>> I have now a cluster with 3 nodes and replication factor being 2. When I
>> distribute a file I could see that there are replica of data available in
>> other nodes. However when I run a map reduce job again only one node is
>> serving all the request :(. Can you or anyone please provide some more
>> inputs.
>>
>> Thanks
>> Ashish
>>
>>
>>
>> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <ch...@gmail.com>
>> wrote:
>>
>> 2 nodes and replication factor of 2 results in a replica of each block
>> present on each node. This would allow the possibility that a single node
>> would do the work and yet be data local.  It will probably happen if that
>> single node has the needed capacity.  More nodes than the replication
>> factor are needed to force distribution of the processing.
>> Chris
>>
>> On Jan 8, 2014 7:35 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>
>> Guys,
>>
>> I am sure that only one node is being used. I just know ran the job again
>> and could see that CPU usage only for one server going high other server
>> CPU usage remains constant and hence it means other node is not being used.
>> Can someone help me to debug this issue?
>>
>> ++Ashish
>>
>>
>>
>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>> Hello All,
>>
>> I have a 2 node hadoop cluster running with a replication factor of 2. I
>> have a file of size around 1 GB which when copied to HDFS is replicated to
>> both the nodes. Seeing the block info I can see the file has been
>> subdivided into 8 parts which means it has been subdivided into 8 blocks
>> each of size 128 MB.  I use this file as input to run the word count
>> program. Some how I feel only one node is doing all the work and the code
>> is not distributed to other node. How can I make sure code is distributed
>> to both the nodes? Also is there a log or GUI which can be used for this?
>>
>> Please note I am using the latest stable release that is 2.2.0.
>>
>> ++Ashish
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>   --
>>
>>
>> Regards,
>> ...Sudhakara.st
>>
>>
>>
>>
>>
>>
>>   --
>>
>>
>> Regards,
>> ...Sudhakara.st
>>
>>
>>
>>
>>
>>
>>
>
>


-- 

Regards,
...Sudhakara.st

Re: Distributing the code to multiple nodes

Posted by Ashish Jain <as...@gmail.com>.
I just tried it again and I see the following messages popping up in the
log file:

2014-01-15 19:37:38,221 WARN
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Node : l1dev-211:1004 does not have sufficient resource for request :
{Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
Location: *, Relax Locality: true} node total capability : <memory:1024,
vCores:8>
2014-01-15 19:37:38,621 WARN
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Node : l1-dev06:1004 does not have sufficient resource for request :
{Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
Location: *, Relax Locality: true} node total capability : <memory:1024,
vCores:8>

Do I need to increase the RAM allocation on the slave nodes?

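Most likely yes. As a sketch using the standard YARN CLI, the capability each slave is advertising can be checked directly (the node ID below is copied from the warning above; yarn.nodemanager.resource.memory-mb in yarn-site.xml is what sets it):

# list the NodeManagers registered with the ResourceManager
yarn node -list

# show one node's memory capability and current usage
yarn node -status l1-DEV05:1004

The same numbers appear in the ResourceManager web UI under Nodes, e.g.
http://10.12.11.210:8088/cluster/nodes.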


On Wed, Jan 15, 2014 at 7:07 PM, Ashish Jain <as...@gmail.com> wrote:

> I tried that but somehow my map reduce jobs do not execute at all once I
> set it to yarn
>
>
> On Wed, Jan 15, 2014 at 7:00 PM, Nirmal Kumar <ni...@impetus.co.in>wrote:
>
>>  Surely you don’t have to set **mapreduce.jobtracker.address** in
>> mapred-site.xml
>>
>>
>>
>> In mapred-site.xml you just have to mention:
>>
>> <property>
>>
>> <name>mapreduce.framework.name</name>
>>
>> <value>yarn</value>
>>
>> </property>
>>
>>
>>
>> -Nirmal
>>
>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>> *Sent:* Wednesday, January 15, 2014 6:44 PM
>>
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: Distributing the code to multiple nodes
>>
>>
>>
>> I think this is the problem. I have not set
>> "mapreduce.jobtracker.address" in my mapred-site.xml and by default it is
>> set to local. Now the question is how to set it up to remote. Documentation
>> says I need to specify the host:port of the job tracker for this. As we
>> know hadoop 2.2.0 is completely overhauled and there is no concept of task
>> tracker and job tracker. Instead there is now resource manager and node
>> manager. So in this case what do I set as "mapreduce.jobtracker.address".
>> Do I set is resourceMangerHost:resourceMangerPort?
>>
>> --Ashish
>>
>>
>>
>> On Wed, Jan 15, 2014 at 4:20 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>>  Hi Sudhakar,
>>
>> Indeed there was a type the complete command is as follows except the
>> main class since my manifest has the entry for main class.
>> /hadoop jar wordCount.jar  /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>> Next I killed the datanode in 10.12.11.210 and l see the following
>> messages in the log files. Looks like the namenode is still trying to
>> assign the complete task to one single node and since it does not find the
>> complete data set in one node it is complaining.
>>
>>
>> 2014-01-15 16:38:26,894 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:27,348 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1dev-211:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:27,871 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-dev06:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:27,897 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:28,349 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1dev-211:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:28,874 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-dev06:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:28,900 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>>
>>   --Ashish
>>
>>
>>
>> On Wed, Jan 15, 2014 at 3:59 PM, sudhakara st <su...@gmail.com>
>> wrote:
>>
>>   Hello Ashish
>>
>>
>>
>> 2) Run the example again using the command
>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>>   Unless if it typo mistake the command should be
>> ./hadoop jar wordCount.jar WordCount /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>> One more thing try , just stop datanode process in  10.12.11.210 and run
>> the job
>>
>>
>>
>> On Wed, Jan 15, 2014 at 2:07 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>>     Hello Sudhakara,
>>
>> Thanks for your suggestion. However once I change the mapreduce framework
>> to yarn my map reduce jobs does not get executed at all. It seems it is
>> waiting on some thread indefinitely. Here is what I have done
>>
>> 1) Set the mapreduce framework to yarn in mapred-site.xml
>> <property>
>>  <name>mapreduce.framework.name</name>
>>  <value>yarn</value>
>> </property>
>>
>> 2) Run the example again using the command
>>
>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>> The jobs are just stuck and do not move further.
>>
>>   I also tried the following and it complains of filenotfound exception
>> and some security exception
>>
>> ./hadoop dfs wordCount.jar file:///opt/ApacheHadoop/temp/worker.log
>> file:///opt/ApacheHadoop/out/
>>
>> Below is the status of the job from hadoop application console. The
>> progress bar does not move at all.
>>
>>
>>
>> ID:                application_1389771586883_0002
>>                     (http://10.12.11.210:8088/cluster/app/application_1389771586883_0002)
>> User:              root
>> Name:              wordcount
>> Application Type:  MAPREDUCE
>> Queue:             default
>> StartTime:         Wed, 15 Jan 2014 07:52:04 GMT
>> FinishTime:        N/A
>> State:             ACCEPTED
>> FinalStatus:       UNDEFINED
>> Progress:          (progress bar not shown in the paste)
>> Tracking UI:       UNASSIGNED (http://10.12.11.210:8088/cluster/apps)
>>
>>
>>
>> Please advice what should I do
>>
>> --Ashish
>>
>>
>>
>> On Tue, Jan 14, 2014 at 5:48 PM, sudhakara st <su...@gmail.com>
>> wrote:
>>
>>   Hello Ashish
>>
>> It seems job is running in Local job runner(LocalJobRunner) by reading
>> the Local file system. Can you try by give the full URI path of the input
>> and output path.
>>
>> like
>>
>> $hadoop jar program.jar   ProgramName -Dmapreduce.framework.name=yarn
>> file:///home/input/  file:///home/output/
>>
>>
>>
>> On Mon, Jan 13, 2014 at 3:02 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>>   German,
>>
>> This does not seem to be helping. I tried to use the Fairscheduler as my
>> resource manger but the behavior remains same. I could see the
>> fairscheduler log getting continuous heart beat from both the other nodes.
>> But it is still not distributing the work to other nodes. What I did next
>> was started 3 jobs simultaneously so that may be some part of one of the
>> job be distributed to other nodes. However still only one node is being
>> used :(((. What is that is going wrong can some one help?
>>
>> Sample of fairsheduler log:
>> 2014-01-13 15:13:54,293 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:54,953 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:54,988 HEARTBEAT       l1-DEV05
>> 2014-01-13 15:13:55,295 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:55,956 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:55,993 HEARTBEAT       l1-DEV05
>> 2014-01-13 15:13:56,297 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:56,960 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:56,997 HEARTBEAT       l1-DEV05
>> 2014-01-13 15:13:57,299 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:57,964 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:58,001 HEARTBEAT       l1-DEV05
>>
>> My Data distributed as blocks to other nodes. The host with IP
>> 10.12.11.210 has all the data and this is the one which is serving all the
>> request.
>>
>> Total number of blocks: 8
>> 1073741866:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741867:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741868:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741869:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741870:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741871:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741872:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741873:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>>
>>
>>
>> Someone please advice on how to go about this.
>>
>> --Ashish
>>
>>
>>
>> On Fri, Jan 10, 2014 at 12:58 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>>  Thanks for all these suggestions. Somehow I do not have access to the
>> servers today and will try the suggestions made on monday and will let you
>> know how it goes.
>>
>> --Ashish
>>
>>
>>
>> On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <
>> german.fl@samsung.com> wrote:
>>
>>  Ashish
>>
>> Could this be related to the scheduler you are using and its settings?.
>>
>>
>>
>> On lab environments when running a single type of job I often use
>> FairScheduler (the YARN default in 2.2.0 is CapacityScheduler) and it does
>> a good job distributing the load.
>>
>>
>>
>> You could give that a try (
>> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
>> )
>>
>>
>>
>> I think just changing yarn-site.xml  as follows could demonstrate this
>> theory (note that  how the jobs are scheduled depend on resources such as
>> memory on the nodes and you would need to setup yarn-site.xml accordingly).
>>
>>
>>
>> <property>
>>
>>   <name>yarn.resourcemanager.scheduler.class</name>
>>
>>
>> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
>>
>> </property>
>>
>>
>>
>> Regards
>>
>> ./g
>>
>>
>>
>>
>>
>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>> *Sent:* Thursday, January 09, 2014 6:46 AM
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: Distributing the code to multiple nodes
>>
>>
>>
>> Another point to add here 10.12.11.210 is the host which has everything
>> running including a slave datanode. Data was also distributed this host as
>> well as the jar file. Following are running on 10.12.11.210
>>
>> 7966 DataNode
>> 8480 NodeManager
>> 8353 ResourceManager
>> 8141 SecondaryNameNode
>> 7834 NameNode
>>
>>
>>
>> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>> Logs were updated only when I copied the data. After copying the data
>> there has been no updates on the log files.
>>
>>
>>
>> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <ch...@gmail.com>
>> wrote:
>>
>> Do the logs on the three nodes contain anything interesting?
>> Chris
>>
>> On Jan 9, 2014 3:47 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>
>> Here is the block info for the record I distributed. As can be seen only
>> 10.12.11.210 has all the data and this is the node which is serving all the
>> request. Replicas are available with 209 as well as 210
>>
>> 1073741857:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741858:         10.12.11.210:50010    View Block Info
>> 10.12.11.211:50010    View Block Info
>> 1073741859:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741860:         10.12.11.210:50010    View Block Info
>> 10.12.11.211:50010    View Block Info
>> 1073741861:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741862:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741863:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741864:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>>
>> --Ashish
>>
>>
>>
>> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>> Hello Chris,
>>
>> I have now a cluster with 3 nodes and replication factor being 2. When I
>> distribute a file I could see that there are replica of data available in
>> other nodes. However when I run a map reduce job again only one node is
>> serving all the request :(. Can you or anyone please provide some more
>> inputs.
>>
>> Thanks
>> Ashish
>>
>>
>>
>> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <ch...@gmail.com>
>> wrote:
>>
>> 2 nodes and replication factor of 2 results in a replica of each block
>> present on each node. This would allow the possibility that a single node
>> would do the work and yet be data local.  It will probably happen if that
>> single node has the needed capacity.  More nodes than the replication
>> factor are needed to force distribution of the processing.
>> Chris
>>
>> On Jan 8, 2014 7:35 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>
>> Guys,
>>
>> I am sure that only one node is being used. I just know ran the job again
>> and could see that CPU usage only for one server going high other server
>> CPU usage remains constant and hence it means other node is not being used.
>> Can someone help me to debug this issue?
>>
>> ++Ashish
>>
>>
>>
>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>> Hello All,
>>
>> I have a 2 node hadoop cluster running with a replication factor of 2. I
>> have a file of size around 1 GB which when copied to HDFS is replicated to
>> both the nodes. Seeing the block info I can see the file has been
>> subdivided into 8 parts which means it has been subdivided into 8 blocks
>> each of size 128 MB.  I use this file as input to run the word count
>> program. Some how I feel only one node is doing all the work and the code
>> is not distributed to other node. How can I make sure code is distributed
>> to both the nodes? Also is there a log or GUI which can be used for this?
>>
>> Please note I am using the latest stable release that is 2.2.0.
>>
>> ++Ashish
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>   --
>>
>>
>> Regards,
>> ...Sudhakara.st
>>
>>
>>
>>
>>
>>
>>   --
>>
>>
>> Regards,
>> ...Sudhakara.st
>>
>>
>>
>>
>>
>>
>>
>
>

Re: Distributing the code to multiple nodes

Posted by Ashish Jain <as...@gmail.com>.
I tried that, but somehow my MapReduce jobs do not execute at all once I
set mapreduce.framework.name to yarn.

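A job that never leaves the ACCEPTED/UNASSIGNED state under YARN usually means the ResourceManager cannot place the ApplicationMaster container on any NodeManager, which is consistent with the LeafQueue warnings elsewhere in this thread. As a sketch, the standard YARN CLI can show where a submitted job is stuck (the application ID below is the one from the web UI output quoted earlier):

# list applications known to the ResourceManager and their current state
yarn application -list

# detailed report for one application, including its diagnostics
yarn application -status application_1389771586883_0002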

On Wed, Jan 15, 2014 at 7:00 PM, Nirmal Kumar <ni...@impetus.co.in>wrote:

>  Surely you don’t have to set **mapreduce.jobtracker.address** in
> mapred-site.xml
>
>
>
> In mapred-site.xml you just have to mention:
>
> <property>
>
> <name>mapreduce.framework.name</name>
>
> <value>yarn</value>
>
> </property>
>
>
>
> -Nirmal
>
> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
> *Sent:* Wednesday, January 15, 2014 6:44 PM
>
> *To:* user@hadoop.apache.org
> *Subject:* Re: Distributing the code to multiple nodes
>
>
>
> I think this is the problem. I have not set "mapreduce.jobtracker.address"
> in my mapred-site.xml and by default it is set to local. Now the question
> is how to set it up to remote. Documentation says I need to specify the
> host:port of the job tracker for this. As we know hadoop 2.2.0 is
> completely overhauled and there is no concept of task tracker and job
> tracker. Instead there is now resource manager and node manager. So in this
> case what do I set as "mapreduce.jobtracker.address". Do I set is
> resourceMangerHost:resourceMangerPort?
>
> --Ashish
>
>
>
> On Wed, Jan 15, 2014 at 4:20 PM, Ashish Jain <as...@gmail.com> wrote:
>
>  Hi Sudhakar,
>
> Indeed there was a type the complete command is as follows except the main
> class since my manifest has the entry for main class.
> /hadoop jar wordCount.jar  /opt/ApacheHadoop/temp/worker.log
> /opt/ApacheHadoop/out/
>
> Next I killed the datanode in 10.12.11.210 and l see the following
> messages in the log files. Looks like the namenode is still trying to
> assign the complete task to one single node and since it does not find the
> complete data set in one node it is complaining.
>
>
> 2014-01-15 16:38:26,894 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1-DEV05:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
> 2014-01-15 16:38:27,348 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1dev-211:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
> 2014-01-15 16:38:27,871 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1-dev06:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
> 2014-01-15 16:38:27,897 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1-DEV05:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
> 2014-01-15 16:38:28,349 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1dev-211:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
> 2014-01-15 16:38:28,874 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1-dev06:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
> 2014-01-15 16:38:28,900 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1-DEV05:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
>
>   --Ashish
>
>
>
> On Wed, Jan 15, 2014 at 3:59 PM, sudhakara st <su...@gmail.com>
> wrote:
>
>   Hello Ashish
>
>
>
> 2) Run the example again using the command
> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
> /opt/ApacheHadoop/out/
>
>   Unless if it typo mistake the command should be
> ./hadoop jar wordCount.jar WordCount /opt/ApacheHadoop/temp/worker.log
> /opt/ApacheHadoop/out/
>
> One more thing try , just stop datanode process in  10.12.11.210 and run
> the job
>
>
>
> On Wed, Jan 15, 2014 at 2:07 PM, Ashish Jain <as...@gmail.com> wrote:
>
>     Hello Sudhakara,
>
> Thanks for your suggestion. However once I change the mapreduce framework
> to yarn my map reduce jobs does not get executed at all. It seems it is
> waiting on some thread indefinitely. Here is what I have done
>
> 1) Set the mapreduce framework to yarn in mapred-site.xml
> <property>
>  <name>mapreduce.framework.name</name>
>  <value>yarn</value>
> </property>
>
> 2) Run the example again using the command
>
> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
> /opt/ApacheHadoop/out/
>
> The jobs are just stuck and do not move further.
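(An application that stays stuck like this after switching to YARN frequently means the NodeManagers cannot offer resources or run MapReduce containers yet. As a sketch of what a stock 2.2.0 yarn-site.xml on every node would typically also carry, assuming the ResourceManager is indeed the one running on 10.12.11.210:

<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>10.12.11.210</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>

If the NodeManagers never register with the ResourceManager there are no resources to schedule against, so applications sit in the ACCEPTED state; the mapreduce_shuffle aux-service is what later lets reducers fetch map output.)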
>
>   I also tried the following and it complains of filenotfound exception
> and some security exception
>
> ./hadoop dfs wordCount.jar file:///opt/ApacheHadoop/temp/worker.log
> file:///opt/ApacheHadoop/out/
>
> Below is the status of the job from hadoop application console. The
> progress bar does not move at all.
>
>
>
> ID: application_1389771586883_0002 <http://10.12.11.210:8088/cluster/app/application_1389771586883_0002>
> User: root
> Name: wordcount
> Application Type: MAPREDUCE
> Queue: default
> StartTime: Wed, 15 Jan 2014 07:52:04 GMT
> FinishTime: N/A
> State: ACCEPTED
> FinalStatus: UNDEFINED
> Progress: (bar shows no progress)
> Tracking UI: UNASSIGNED <http://10.12.11.210:8088/cluster/apps>
>
>
>
> Please advise what I should do.
>
> --Ashish
>
>
>
> On Tue, Jan 14, 2014 at 5:48 PM, sudhakara st <su...@gmail.com>
> wrote:
>
>   Hello Ashish
>
> It seems the job is running in the local job runner (LocalJobRunner), reading
> the local file system. Can you try giving the full URI paths of the input and
> output?
>
> like
>
> $hadoop jar program.jar   ProgramName -Dmapreduce.framework.name=yarn
> file:///home/input/  file:///home/output/
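(The file:/// form keeps the whole job on the local file system of a single machine. If the intent is to have the job read the replicated HDFS copy instead, a sketch with purely illustrative paths would be:

./hadoop fs -mkdir -p /wordcount/input
./hadoop fs -put /opt/ApacheHadoop/temp/worker.log /wordcount/input/
./hadoop jar wordCount.jar /wordcount/input/worker.log /wordcount/out

so that the input splits map onto HDFS blocks that are already spread across the DataNodes.)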
>
>
>
> On Mon, Jan 13, 2014 at 3:02 PM, Ashish Jain <as...@gmail.com> wrote:
>
>   German,
>
> This does not seem to be helping. I tried to use the FairScheduler as my
> resource manager's scheduler but the behavior remains the same. I could see the
> fairscheduler log getting continuous heartbeats from both the other nodes,
> but it is still not distributing the work to other nodes. What I did next
> was to start 3 jobs simultaneously so that maybe some part of one of the
> jobs would be distributed to other nodes. However, still only one node is being
> used :(((. What is it that is going wrong? Can someone help?
>
> Sample of fairscheduler log:
> 2014-01-13 15:13:54,293 HEARTBEAT       l1dev-211
> 2014-01-13 15:13:54,953 HEARTBEAT       l1-dev06
> 2014-01-13 15:13:54,988 HEARTBEAT       l1-DEV05
> 2014-01-13 15:13:55,295 HEARTBEAT       l1dev-211
> 2014-01-13 15:13:55,956 HEARTBEAT       l1-dev06
> 2014-01-13 15:13:55,993 HEARTBEAT       l1-DEV05
> 2014-01-13 15:13:56,297 HEARTBEAT       l1dev-211
> 2014-01-13 15:13:56,960 HEARTBEAT       l1-dev06
> 2014-01-13 15:13:56,997 HEARTBEAT       l1-DEV05
> 2014-01-13 15:13:57,299 HEARTBEAT       l1dev-211
> 2014-01-13 15:13:57,964 HEARTBEAT       l1-dev06
> 2014-01-13 15:13:58,001 HEARTBEAT       l1-DEV05
>
> My data is distributed as blocks to the other nodes. The host with IP
> 10.12.11.210 has all the data and this is the one which is serving all the
> requests.
>
> Total number of blocks: 8
> 1073741866:         10.12.11.211:50010    View Block Info
> 10.12.11.210:50010    View Block Info
> 1073741867:         10.12.11.211:50010    View Block Info
> 10.12.11.210:50010    View Block Info
> 1073741868:         10.12.11.210:50010    View Block Info
> 10.12.11.209:50010    View Block Info
> 1073741869:         10.12.11.210:50010    View Block Info
> 10.12.11.209:50010    View Block Info
> 1073741870:         10.12.11.211:50010    View Block Info
> 10.12.11.210:50010    View Block Info
> 1073741871:         10.12.11.210:50010    View Block Info
> 10.12.11.209:50010    View Block Info
> 1073741872:         10.12.11.211:50010    View Block Info
> 10.12.11.210:50010    View Block Info
> 1073741873:         10.12.11.210:50010    View Block Info
> 10.12.11.209:50010    View Block Info
>
>
>
> Someone please advise on how to go about this.
>
> --Ashish
>
>
>
> On Fri, Jan 10, 2014 at 12:58 PM, Ashish Jain <as...@gmail.com> wrote:
>
>  Thanks for all these suggestions. Somehow I do not have access to the
> servers today and will try the suggestions made on monday and will let you
> know how it goes.
>
> --Ashish
>
>
>
> On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <
> german.fl@samsung.com> wrote:
>
>  Ashish
>
> Could this be related to the scheduler you are using and its settings?.
>
>
>
> On lab environments when running a single type of job I often use
> FairScheduler (the YARN default in 2.2.0 is CapacityScheduler) and it does
> a good job distributing the load.
>
>
>
> You could give that a try (
> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
> )
>
>
>
> I think just changing yarn-site.xml  as follows could demonstrate this
> theory (note that  how the jobs are scheduled depend on resources such as
> memory on the nodes and you would need to setup yarn-site.xml accordingly).
>
>
>
> <property>
>
>   <name>yarn.resourcemanager.scheduler.class</name>
>
>
> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
>
> </property>
>
>
>
> Regards
>
> ./g
>
>
>
>
>
> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
> *Sent:* Thursday, January 09, 2014 6:46 AM
> *To:* user@hadoop.apache.org
> *Subject:* Re: Distributing the code to multiple nodes
>
>
>
> Another point to add here 10.12.11.210 is the host which has everything
> running including a slave datanode. Data was also distributed this host as
> well as the jar file. Following are running on 10.12.11.210
>
> 7966 DataNode
> 8480 NodeManager
> 8353 ResourceManager
> 8141 SecondaryNameNode
> 7834 NameNode
>
>
>
> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <as...@gmail.com> wrote:
>
> Logs were updated only when I copied the data. After copying the data
> there has been no updates on the log files.
>
>
>
> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <ch...@gmail.com>
> wrote:
>
> Do the logs on the three nodes contain anything interesting?
> Chris
>
> On Jan 9, 2014 3:47 AM, "Ashish Jain" <as...@gmail.com> wrote:
>
> Here is the block info for the file I distributed. As can be seen, only
> 10.12.11.210 has all the data and this is the node which is serving all the
> requests. Replicas are available on 209 as well as 210
>
> 1073741857:         10.12.11.210:50010    View Block Info
> 10.12.11.209:50010    View Block Info
> 1073741858:         10.12.11.210:50010    View Block Info
> 10.12.11.211:50010    View Block Info
> 1073741859:         10.12.11.210:50010    View Block Info
> 10.12.11.209:50010    View Block Info
> 1073741860:         10.12.11.210:50010    View Block Info
> 10.12.11.211:50010    View Block Info
> 1073741861:         10.12.11.210:50010    View Block Info
> 10.12.11.209:50010    View Block Info
> 1073741862:         10.12.11.210:50010    View Block Info
> 10.12.11.209:50010    View Block Info
> 1073741863:         10.12.11.210:50010    View Block Info
> 10.12.11.209:50010    View Block Info
> 1073741864:         10.12.11.210:50010    View Block Info
> 10.12.11.209:50010    View Block Info
>
> --Ashish
>
>
>
> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <as...@gmail.com> wrote:
>
> Hello Chris,
>
> I now have a cluster with 3 nodes and a replication factor of 2. When I
> distribute a file I can see that there are replicas of the data available on
> other nodes. However, when I run a map reduce job again only one node is
> serving all the requests :(. Can you or anyone please provide some more
> inputs.
>
> Thanks
> Ashish
>
>
>
> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <ch...@gmail.com>
> wrote:
>
> 2 nodes and replication factor of 2 results in a replica of each block
> present on each node. This would allow the possibility that a single node
> would do the work and yet be data local.  It will probably happen if that
> single node has the needed capacity.  More nodes than the replication
> factor are needed to force distribution of the processing.
> Chris
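(One way to confirm which DataNodes hold each block of the input, and therefore which nodes are even candidates for data-local map tasks, is the fsck report; the path here is hypothetical:

./hdfs fsck /wordcount/input/worker.log -files -blocks -locations

Each block is listed together with the addresses of the DataNodes holding its replicas.)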
>
> On Jan 8, 2014 7:35 AM, "Ashish Jain" <as...@gmail.com> wrote:
>
> Guys,
>
> I am sure that only one node is being used. I just now ran the job again
> and could see that CPU usage goes high only for one server while the other
> server's CPU usage remains constant, which means the other node is not being used.
> Can someone help me to debug this issue?
>
> ++Ashish
>
>
>
> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com> wrote:
>
> Hello All,
>
> I have a 2 node hadoop cluster running with a replication factor of 2. I
> have a file of size around 1 GB which when copied to HDFS is replicated to
> both the nodes. Seeing the block info I can see the file has been
> subdivided into 8 parts which means it has been subdivided into 8 blocks
> each of size 128 MB.  I use this file as input to run the word count
> program. Some how I feel only one node is doing all the work and the code
> is not distributed to other node. How can I make sure code is distributed
> to both the nodes? Also is there a log or GUI which can be used for this?
>
> Please note I am using the latest stable release that is 2.2.0.
>
> ++Ashish
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>   --
>
>
> Regards,
> ...Sudhakara.st
>
>
>
>
>
>
>   --
>
>
> Regards,
> ...Sudhakara.st
>
>
>
>
>
>
> ------------------------------
>
>
>
>
>
>
> NOTE: This message may contain information that is confidential,
> proprietary, privileged or otherwise protected by law. The message is
> intended solely for the named addressee. If received in error, please
> destroy and notify the sender. Any use of this email is prohibited when
> received in error. Impetus does not represent, warrant and/or guarantee,
> that the integrity of this communication has been maintained nor that the
> communication is free of errors, virus, interception or interference.
>

Re: Distributing the code to multiple nodes

Posted by Ashish Jain <as...@gmail.com>.
I tried that, but somehow my map reduce jobs do not execute at all once I
set it to yarn.


On Wed, Jan 15, 2014 at 7:00 PM, Nirmal Kumar <ni...@impetus.co.in>wrote:

>  Surely you don’t have to set **mapreduce.jobtracker.address** in
> mapred-site.xml
>
>
>
> In mapred-site.xml you just have to mention:
>
> <property>
>
> <name>mapreduce.framework.name</name>
>
> <value>yarn</value>
>
> </property>
>
>
>
> -Nirmal
>
> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
> *Sent:* Wednesday, January 15, 2014 6:44 PM
>
> *To:* user@hadoop.apache.org
> *Subject:* Re: Distributing the code to multiple nodes
>
>
>
> I think this is the problem. I have not set "mapreduce.jobtracker.address"
> in my mapred-site.xml and by default it is set to local. Now the question
> is how to set it up to remote. Documentation says I need to specify the
> host:port of the job tracker for this. As we know hadoop 2.2.0 is
> completely overhauled and there is no concept of task tracker and job
> tracker. Instead there is now resource manager and node manager. So in this
> case what do I set as "mapreduce.jobtracker.address". Do I set is
> resourceMangerHost:resourceMangerPort?
>
> --Ashish
>
>
>
> On Wed, Jan 15, 2014 at 4:20 PM, Ashish Jain <as...@gmail.com> wrote:
>
>  Hi Sudhakar,
>
> Indeed there was a typo; the complete command is as follows, except for the main
> class, since my manifest has the entry for the main class.
> /hadoop jar wordCount.jar  /opt/ApacheHadoop/temp/worker.log
> /opt/ApacheHadoop/out/
>
> Next I killed the datanode in 10.12.11.210 and I see the following
> messages in the log files. Looks like the namenode is still trying to
> assign the complete task to one single node and since it does not find the
> complete data set in one node it is complaining.
>
>
> 2014-01-15 16:38:26,894 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1-DEV05:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
> 2014-01-15 16:38:27,348 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1dev-211:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
> 2014-01-15 16:38:27,871 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1-dev06:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
> 2014-01-15 16:38:27,897 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1-DEV05:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
> 2014-01-15 16:38:28,349 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1dev-211:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
> 2014-01-15 16:38:28,874 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1-dev06:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
> 2014-01-15 16:38:28,900 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1-DEV05:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
>
>   --Ashish
>
>
>
> On Wed, Jan 15, 2014 at 3:59 PM, sudhakara st <su...@gmail.com>
> wrote:
>
>   Hello Ashish
>
>
>
> 2) Run the example again using the command
> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
> /opt/ApacheHadoop/out/
>
> Unless it is a typo, the command should be
> ./hadoop jar wordCount.jar WordCount /opt/ApacheHadoop/temp/worker.log
> /opt/ApacheHadoop/out/
>
> One more thing try , just stop datanode process in  10.12.11.210 and run
> the job
>
>
>
> On Wed, Jan 15, 2014 at 2:07 PM, Ashish Jain <as...@gmail.com> wrote:
>
>     Hello Sudhakara,
>
> Thanks for your suggestion. However, once I change the mapreduce framework
> to yarn my map reduce jobs do not get executed at all. It seems it is
> waiting on some thread indefinitely. Here is what I have done:
>
> 1) Set the mapreduce framework to yarn in mapred-site.xml
> <property>
>  <name>mapreduce.framework.name</name>
>  <value>yarn</value>
> </property>
>
> 2) Run the example again using the command
>
> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
> /opt/ApacheHadoop/out/
>
> The jobs are just stuck and do not move further.
>
>   I also tried the following and it complains of filenotfound exception
> and some security exception
>
> ./hadoop dfs wordCount.jar file:///opt/ApacheHadoop/temp/worker.log
> file:///opt/ApacheHadoop/out/
>
> Below is the status of the job from hadoop application console. The
> progress bar does not move at all.
>
>
>
> ID: application_1389771586883_0002 <http://10.12.11.210:8088/cluster/app/application_1389771586883_0002>
> User: root
> Name: wordcount
> Application Type: MAPREDUCE
> Queue: default
> StartTime: Wed, 15 Jan 2014 07:52:04 GMT
> FinishTime: N/A
> State: ACCEPTED
> FinalStatus: UNDEFINED
> Progress: (bar shows no progress)
> Tracking UI: UNASSIGNED <http://10.12.11.210:8088/cluster/apps>
>
>
>
> Please advise what I should do.
>
> --Ashish
>
>
>
> On Tue, Jan 14, 2014 at 5:48 PM, sudhakara st <su...@gmail.com>
> wrote:
>
>   Hello Ashish
>
> It seems the job is running in the local job runner (LocalJobRunner), reading
> the local file system. Can you try giving the full URI paths of the input and
> output?
>
> like
>
> $hadoop jar program.jar   ProgramName -Dmapreduce.framework.name=yarn
> file:///home/input/  file:///home/output/
>
>
>
> On Mon, Jan 13, 2014 at 3:02 PM, Ashish Jain <as...@gmail.com> wrote:
>
>   German,
>
> This does not seem to be helping. I tried to use the FairScheduler as my
> resource manager's scheduler but the behavior remains the same. I could see the
> fairscheduler log getting continuous heartbeats from both the other nodes,
> but it is still not distributing the work to other nodes. What I did next
> was to start 3 jobs simultaneously so that maybe some part of one of the
> jobs would be distributed to other nodes. However, still only one node is being
> used :(((. What is it that is going wrong? Can someone help?
>
> Sample of fairscheduler log:
> 2014-01-13 15:13:54,293 HEARTBEAT       l1dev-211
> 2014-01-13 15:13:54,953 HEARTBEAT       l1-dev06
> 2014-01-13 15:13:54,988 HEARTBEAT       l1-DEV05
> 2014-01-13 15:13:55,295 HEARTBEAT       l1dev-211
> 2014-01-13 15:13:55,956 HEARTBEAT       l1-dev06
> 2014-01-13 15:13:55,993 HEARTBEAT       l1-DEV05
> 2014-01-13 15:13:56,297 HEARTBEAT       l1dev-211
> 2014-01-13 15:13:56,960 HEARTBEAT       l1-dev06
> 2014-01-13 15:13:56,997 HEARTBEAT       l1-DEV05
> 2014-01-13 15:13:57,299 HEARTBEAT       l1dev-211
> 2014-01-13 15:13:57,964 HEARTBEAT       l1-dev06
> 2014-01-13 15:13:58,001 HEARTBEAT       l1-DEV05
>
> My data is distributed as blocks to the other nodes. The host with IP
> 10.12.11.210 has all the data and this is the one which is serving all the
> requests.
>
> Total number of blocks: 8
> 1073741866:         10.12.11.211:50010    View Block Info
> 10.12.11.210:50010    View Block Info
> 1073741867:         10.12.11.211:50010    View Block Info
> 10.12.11.210:50010    View Block Info
> 1073741868:         10.12.11.210:50010    View Block Info
> 10.12.11.209:50010    View Block Info
> 1073741869:         10.12.11.210:50010    View Block Info
> 10.12.11.209:50010    View Block Info
> 1073741870:         10.12.11.211:50010    View Block Info
> 10.12.11.210:50010    View Block Info
> 1073741871:         10.12.11.210:50010    View Block Info
> 10.12.11.209:50010    View Block Info
> 1073741872:         10.12.11.211:50010    View Block Info
> 10.12.11.210:50010    View Block Info
> 1073741873:         10.12.11.210:50010    View Block Info
> 10.12.11.209:50010    View Block Info
>
>
>
> Someone please advise on how to go about this.
>
> --Ashish
>
>
>
> On Fri, Jan 10, 2014 at 12:58 PM, Ashish Jain <as...@gmail.com> wrote:
>
>  Thanks for all these suggestions. Somehow I do not have access to the
> servers today and will try the suggestions made on monday and will let you
> know how it goes.
>
> --Ashish
>
>
>
> On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <
> german.fl@samsung.com> wrote:
>
>  Ashish
>
> Could this be related to the scheduler you are using and its settings?.
>
>
>
> On lab environments when running a single type of job I often use
> FairScheduler (the YARN default in 2.2.0 is CapacityScheduler) and it does
> a good job distributing the load.
>
>
>
> You could give that a try (
> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
> )
>
>
>
> I think just changing yarn-site.xml  as follows could demonstrate this
> theory (note that  how the jobs are scheduled depend on resources such as
> memory on the nodes and you would need to setup yarn-site.xml accordingly).
>
>
>
> <property>
>
>   <name>yarn.resourcemanager.scheduler.class</name>
>
>
> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
>
> </property>
>
>
>
> Regards
>
> ./g
>
>
>
>
>
> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
> *Sent:* Thursday, January 09, 2014 6:46 AM
> *To:* user@hadoop.apache.org
> *Subject:* Re: Distributing the code to multiple nodes
>
>
>
> Another point to add here 10.12.11.210 is the host which has everything
> running including a slave datanode. Data was also distributed this host as
> well as the jar file. Following are running on 10.12.11.210
>
> 7966 DataNode
> 8480 NodeManager
> 8353 ResourceManager
> 8141 SecondaryNameNode
> 7834 NameNode
>
>
>
> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <as...@gmail.com> wrote:
>
> Logs were updated only when I copied the data. After copying the data
> there has been no updates on the log files.
>
>
>
> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <ch...@gmail.com>
> wrote:
>
> Do the logs on the three nodes contain anything interesting?
> Chris
>
> On Jan 9, 2014 3:47 AM, "Ashish Jain" <as...@gmail.com> wrote:
>
> Here is the block info for the file I distributed. As can be seen, only
> 10.12.11.210 has all the data and this is the node which is serving all the
> requests. Replicas are available on 209 as well as 210
>
> 1073741857:         10.12.11.210:50010    View Block Info
> 10.12.11.209:50010    View Block Info
> 1073741858:         10.12.11.210:50010    View Block Info
> 10.12.11.211:50010    View Block Info
> 1073741859:         10.12.11.210:50010    View Block Info
> 10.12.11.209:50010    View Block Info
> 1073741860:         10.12.11.210:50010    View Block Info
> 10.12.11.211:50010    View Block Info
> 1073741861:         10.12.11.210:50010    View Block Info
> 10.12.11.209:50010    View Block Info
> 1073741862:         10.12.11.210:50010    View Block Info
> 10.12.11.209:50010    View Block Info
> 1073741863:         10.12.11.210:50010    View Block Info
> 10.12.11.209:50010    View Block Info
> 1073741864:         10.12.11.210:50010    View Block Info
> 10.12.11.209:50010    View Block Info
>
> --Ashish
>
>
>
> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <as...@gmail.com> wrote:
>
> Hello Chris,
>
> I now have a cluster with 3 nodes and a replication factor of 2. When I
> distribute a file I can see that there are replicas of the data available on
> other nodes. However, when I run a map reduce job again only one node is
> serving all the requests :(. Can you or anyone please provide some more
> inputs.
>
> Thanks
> Ashish
>
>
>
> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <ch...@gmail.com>
> wrote:
>
> 2 nodes and replication factor of 2 results in a replica of each block
> present on each node. This would allow the possibility that a single node
> would do the work and yet be data local.  It will probably happen if that
> single node has the needed capacity.  More nodes than the replication
> factor are needed to force distribution of the processing.
> Chris
>
> On Jan 8, 2014 7:35 AM, "Ashish Jain" <as...@gmail.com> wrote:
>
> Guys,
>
> I am sure that only one node is being used. I just now ran the job again
> and could see that CPU usage goes high only for one server while the other
> server's CPU usage remains constant, which means the other node is not being used.
> Can someone help me to debug this issue?
>
> ++Ashish
>
>
>
> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com> wrote:
>
> Hello All,
>
> I have a 2 node hadoop cluster running with a replication factor of 2. I
> have a file of size around 1 GB which when copied to HDFS is replicated to
> both the nodes. Seeing the block info I can see the file has been
> subdivided into 8 parts which means it has been subdivided into 8 blocks
> each of size 128 MB.  I use this file as input to run the word count
> program. Some how I feel only one node is doing all the work and the code
> is not distributed to other node. How can I make sure code is distributed
> to both the nodes? Also is there a log or GUI which can be used for this?
>
> Please note I am using the latest stable release that is 2.2.0.
>
> ++Ashish
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>   --
>
>
> Regards,
> ...Sudhakara.st
>
>
>
>
>
>
>   --
>
>
> Regards,
> ...Sudhakara.st
>
>
>
>
>
>
> ------------------------------
>
>
>
>
>
>
> NOTE: This message may contain information that is confidential,
> proprietary, privileged or otherwise protected by law. The message is
> intended solely for the named addressee. If received in error, please
> destroy and notify the sender. Any use of this email is prohibited when
> received in error. Impetus does not represent, warrant and/or guarantee,
> that the integrity of this communication has been maintained nor that the
> communication is free of errors, virus, interception or interference.
>

RE: Distributing the code to multiple nodes

Posted by Nirmal Kumar <ni...@impetus.co.in>.
Surely you don't have to set *mapreduce.jobtracker.address* in mapred-site.xml

In mapred-site.xml you just have to mention:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
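(Once that is in place, two standard YARN CLI checks, not something quoted in this thread, show quickly whether the cluster can actually run the job:

./yarn node -list          # NodeManagers the ResourceManager can see
./yarn application -list   # state of submitted applications

If node -list shows no registered NodeManagers, submitted applications will typically stay in the ACCEPTED state and never start; the ResourceManager web UI on port 8088 also shows the total memory the cluster is offering.)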

-Nirmal
From: Ashish Jain [mailto:ashjain2@gmail.com]
Sent: Wednesday, January 15, 2014 6:44 PM
To: user@hadoop.apache.org
Subject: Re: Distributing the code to multiple nodes

I think this is the problem. I have not set "mapreduce.jobtracker.address" in my mapred-site.xml and by default it is set to local. Now the question is how to set it up to remote. Documentation says I need to specify the host:port of the job tracker for this. As we know hadoop 2.2.0 is completely overhauled and there is no concept of task tracker and job tracker. Instead there is now resource manager and node manager. So in this case what do I set as "mapreduce.jobtracker.address". Do I set is resourceMangerHost:resourceMangerPort?
--Ashish

On Wed, Jan 15, 2014 at 4:20 PM, Ashish Jain <as...@gmail.com>> wrote:
Hi Sudhakar,

Indeed there was a typo; the complete command is as follows, except for the main class, since my manifest has the entry for the main class.
/hadoop jar wordCount.jar  /opt/ApacheHadoop/temp/worker.log /opt/ApacheHadoop/out/
Next I killed the datanode in 10.12.11.210 and I see the following messages in the log files. Looks like the namenode is still trying to assign the complete task to one single node and since it does not find the complete data set in one node it is complaining.

2014-01-15 16:38:26,894 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1-DEV05:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
2014-01-15 16:38:27,348 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1dev-211:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
2014-01-15 16:38:27,871 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1-dev06:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
2014-01-15 16:38:27,897 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1-DEV05:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
2014-01-15 16:38:28,349 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1dev-211:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
2014-01-15 16:38:28,874 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1-dev06:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
2014-01-15 16:38:28,900 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1-DEV05:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>

--Ashish
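
The warnings above show the mismatch directly: each container request asks for <memory:2048>, while every node only advertises a total capability of <memory:1024>, so no container can ever be placed. A hedged sketch of the settings involved (the property names are the standard YARN/MR2 ones; the values are only illustrative and would need to be sized to the actual machines):

In yarn-site.xml:
<!-- memory each NodeManager offers to YARN containers -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value>
</property>

In mapred-site.xml:
<!-- per-container requests must fit within the node capability above -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>1024</value>
</property>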

On Wed, Jan 15, 2014 at 3:59 PM, sudhakara st <su...@gmail.com>> wrote:
Hello Ashish


2) Run the example again using the command
./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log /opt/ApacheHadoop/out/

Unless it is a typo, the command should be
./hadoop jar wordCount.jar WordCount /opt/ApacheHadoop/temp/worker.log /opt/ApacheHadoop/out/
One more thing to try: just stop the datanode process on 10.12.11.210 and run the job.
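
Assuming the daemons were started with the bundled sbin scripts, stopping only the datanode on that host would look something like:

sbin/hadoop-daemon.sh stop datanode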


On Wed, Jan 15, 2014 at 2:07 PM, Ashish Jain <as...@gmail.com>> wrote:
Hello Sudhakara,
Thanks for your suggestion. However, once I change the mapreduce framework to yarn, my map reduce jobs do not get executed at all. It seems to be waiting on some thread indefinitely. Here is what I have done:
1) Set the mapreduce framework to yarn in mapred-site.xml
<property>
 <name>mapreduce.framework.name</name>
 <value>yarn</value>
</property>
2) Run the example again using the command
./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log /opt/ApacheHadoop/out/
The jobs are just stuck and do not move further.

I also tried the following and it complains of a FileNotFoundException and some security exception

./hadoop dfs wordCount.jar file:///opt/ApacheHadoop/temp/worker.log file:///opt/ApacheHadoop/out/
Below is the status of the job from hadoop application console. The progress bar does not move at all.

ID:                application_1389771586883_0002  (http://10.12.11.210:8088/cluster/app/application_1389771586883_0002)
User:              root
Name:              wordcount
Application Type:  MAPREDUCE
Queue:             default
StartTime:         Wed, 15 Jan 2014 07:52:04 GMT
FinishTime:        N/A
State:             ACCEPTED
FinalStatus:       UNDEFINED
Progress:          (bar shows no progress)
Tracking UI:       UNASSIGNED  (http://10.12.11.210:8088/cluster/apps)


Please advise on what I should do.
--Ashish
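
If the goal is to run against HDFS rather than file:// paths, a sketch of the usual sequence (the HDFS paths below are illustrative, and the output directory must not already exist):

./hadoop fs -mkdir -p /input
./hadoop fs -put /opt/ApacheHadoop/temp/worker.log /input/
./hadoop jar wordCount.jar /input/worker.log /output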

On Tue, Jan 14, 2014 at 5:48 PM, sudhakara st <su...@gmail.com>> wrote:
Hello Ashish
It seems the job is running in the local job runner (LocalJobRunner) and reading the local file system. Can you try giving the full URI paths of the input and output?
like
$hadoop jar program.jar   ProgramName -Dmapreduce.framework.name=yarn file:///home/input/  file:///home/output/

On Mon, Jan 13, 2014 at 3:02 PM, Ashish Jain <as...@gmail.com>> wrote:
German,

This does not seem to be helping. I tried to use the FairScheduler in my resource manager, but the behavior remains the same. I can see the fairscheduler log getting continuous heartbeats from both of the other nodes, but it is still not distributing the work to them. What I did next was to start 3 jobs simultaneously, so that maybe some part of one of the jobs would be distributed to the other nodes. However, still only one node is being used :(((. What is going wrong? Can someone help?
Sample of fairscheduler log:
2014-01-13 15:13:54,293 HEARTBEAT       l1dev-211
2014-01-13 15:13:54,953 HEARTBEAT       l1-dev06
2014-01-13 15:13:54,988 HEARTBEAT       l1-DEV05
2014-01-13 15:13:55,295 HEARTBEAT       l1dev-211
2014-01-13 15:13:55,956 HEARTBEAT       l1-dev06
2014-01-13 15:13:55,993 HEARTBEAT       l1-DEV05
2014-01-13 15:13:56,297 HEARTBEAT       l1dev-211
2014-01-13 15:13:56,960 HEARTBEAT       l1-dev06
2014-01-13 15:13:56,997 HEARTBEAT       l1-DEV05
2014-01-13 15:13:57,299 HEARTBEAT       l1dev-211
2014-01-13 15:13:57,964 HEARTBEAT       l1-dev06
2014-01-13 15:13:58,001 HEARTBEAT       l1-DEV05
My data is distributed as blocks to other nodes. The host with IP 10.12.11.210 has all the data, and this is the one which is serving all the requests.

Total number of blocks: 8
1073741866:         10.12.11.211:50010    View Block Info         10.12.11.210:50010    View Block Info
1073741867:         10.12.11.211:50010    View Block Info         10.12.11.210:50010    View Block Info
1073741868:         10.12.11.210:50010    View Block Info         10.12.11.209:50010    View Block Info
1073741869:         10.12.11.210:50010    View Block Info         10.12.11.209:50010    View Block Info
1073741870:         10.12.11.211:50010    View Block Info         10.12.11.210:50010    View Block Info
1073741871:         10.12.11.210:50010    View Block Info         10.12.11.209:50010    View Block Info
1073741872:         10.12.11.211:50010    View Block Info         10.12.11.210:50010    View Block Info
1073741873:         10.12.11.210:50010    View Block Info         10.12.11.209:50010    View Block Info

Someone please advise on how to go about this.
--Ashish

On Fri, Jan 10, 2014 at 12:58 PM, Ashish Jain <as...@gmail.com>> wrote:
Thanks for all these suggestions. Somehow I do not have access to the servers today; I will try the suggestions made on Monday and will let you know how it goes.
--Ashish

On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <ge...@samsung.com>> wrote:
Ashish
Could this be related to the scheduler you are using and its settings?

In lab environments, when running a single type of job, I often use the FairScheduler (the YARN default in 2.2.0 is the CapacityScheduler) and it does a good job of distributing the load.

You could give that a try (https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html)

I think just changing yarn-site.xml as follows could demonstrate this theory (note that how the jobs are scheduled depends on resources such as memory on the nodes, and you would need to set up yarn-site.xml accordingly).

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>

Regards
./g


From: Ashish Jain [mailto:ashjain2@gmail.com]
Sent: Thursday, January 09, 2014 6:46 AM
To: user@hadoop.apache.org
Subject: Re: Distributing the code to multiple nodes

Another point to add here 10.12.11.210 is the host which has everything running including a slave datanode. Data was also distributed this host as well as the jar file. Following are running on 10.12.11.210

7966 DataNode
8480 NodeManager
8353 ResourceManager
8141 SecondaryNameNode
7834 NameNode

On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <as...@gmail.com>> wrote:
Logs were updated only when I copied the data. After copying the data there has been no updates on the log files.

On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <ch...@gmail.com>> wrote:

Do the logs on the three nodes contain anything interesting?
Chris
On Jan 9, 2014 3:47 AM, "Ashish Jain" <as...@gmail.com>> wrote:
Here is the block info for the record I distributed. As can be seen only 10.12.11.210 has all the data and this is the node which is serving all the request. Replicas are available with 209 as well as 210

1073741857:         10.12.11.210:50010    View Block Info         10.12.11.209:50010    View Block Info
1073741858:         10.12.11.210:50010    View Block Info         10.12.11.211:50010    View Block Info
1073741859:         10.12.11.210:50010    View Block Info         10.12.11.209:50010    View Block Info
1073741860:         10.12.11.210:50010    View Block Info         10.12.11.211:50010    View Block Info
1073741861:         10.12.11.210:50010    View Block Info         10.12.11.209:50010    View Block Info
1073741862:         10.12.11.210:50010    View Block Info         10.12.11.209:50010    View Block Info
1073741863:         10.12.11.210:50010    View Block Info         10.12.11.209:50010    View Block Info
1073741864:         10.12.11.210:50010    View Block Info         10.12.11.209:50010    View Block Info








--Ashish

On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <as...@gmail.com>> wrote:
Hello Chris,
I have now a cluster with 3 nodes and replication factor being 2. When I distribute a file I could see that there are replica of data available in other nodes. However when I run a map reduce job again only one node is serving all the request :(. Can you or anyone please provide some more inputs.
Thanks
Ashish

On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <ch...@gmail.com>> wrote:

2 nodes and replication factor of 2 results in a replica of each block present on each node. This would allow the possibility that a single node would do the work and yet be data local.  It will probably happen if that single node has the needed capacity.  More nodes than the replication factor are needed to force distribution of the processing.
Chris
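
One quick way to confirm where the blocks of the input file actually live, without the web UI, is fsck (the path below is a placeholder for the real HDFS path of the input file):

./hdfs fsck /path/to/worker.log -files -blocks -locations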
On Jan 8, 2014 7:35 AM, "Ashish Jain" <as...@gmail.com>> wrote:
Guys,
I am sure that only one node is being used. I just now ran the job again and could see the CPU usage going high for only one server while the other server's CPU usage remains constant, which means the other node is not being used. Can someone help me debug this issue?
++Ashish

On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com>> wrote:
Hello All,
I have a 2 node hadoop cluster running with a replication factor of 2. I have a file of size around 1 GB which when copied to HDFS is replicated to both the nodes. Seeing the block info I can see the file has been subdivided into 8 parts which means it has been subdivided into 8 blocks each of size 128 MB.  I use this file as input to run the word count program. Some how I feel only one node is doing all the work and the code is not distributed to other node. How can I make sure code is distributed to both the nodes? Also is there a log or GUI which can be used for this?
Please note I am using the latest stable release that is 2.2.0.
++Ashish









--

Regards,
...Sudhakara.st




--

Regards,
...Sudhakara.st




________________________________






NOTE: This message may contain information that is confidential, proprietary, privileged or otherwise protected by law. The message is intended solely for the named addressee. If received in error, please destroy and notify the sender. Any use of this email is prohibited when received in error. Impetus does not represent, warrant and/or guarantee, that the integrity of this communication has been maintained nor that the communication is free of errors, virus, interception or interference.

Re: Distributing the code to multiple nodes

Posted by Ashish Jain <as...@gmail.com>.
I think this is the problem. I have not set "mapreduce.jobtracker.address"
in my mapred-site.xml and by default it is set to local. Now the question
is how to set it up to remote. Documentation says I need to specify the
host:port of the job tracker for this. As we know hadoop 2.2.0 is
completely overhauled and there is no concept of task tracker and job
tracker. Instead there is now resource manager and node manager. So in this
case what do I set as "mapreduce.jobtracker.address"? Do I set it to
resourceManagerHost:resourceManagerPort?

--Ashish


On Wed, Jan 15, 2014 at 4:20 PM, Ashish Jain <as...@gmail.com> wrote:

> Hi Sudhakar,
>
> Indeed there was a typo; the complete command is as follows, except the main
> class, since my manifest has the entry for the main class.
> /hadoop jar wordCount.jar  /opt/ApacheHadoop/temp/worker.log
> /opt/ApacheHadoop/out/
>
> Next I killed the datanode on 10.12.11.210 and I see the following
> messages in the log files. It looks like the namenode is still trying to
> assign the complete task to one single node, and since it does not find the
> complete data set on one node it is complaining.
>
> 2014-01-15 16:38:26,894 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1-DEV05:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
> 2014-01-15 16:38:27,348 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1dev-211:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
> 2014-01-15 16:38:27,871 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1-dev06:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
> 2014-01-15 16:38:27,897 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1-DEV05:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
> 2014-01-15 16:38:28,349 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1dev-211:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
> 2014-01-15 16:38:28,874 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1-dev06:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
> 2014-01-15 16:38:28,900 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1-DEV05:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
>
>
> --Ashish
>
>
> On Wed, Jan 15, 2014 at 3:59 PM, sudhakara st <su...@gmail.com> wrote:
>
>> Hello Ashish
>>
>>
>> 2) Run the example again using the command
>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>>
>> Unless it is a typo, the command should be
>> ./hadoop jar wordCount.jar WordCount /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>> One more thing to try: just stop the datanode process on 10.12.11.210 and run
>> the job.
>>
>>
>>
>>
>> On Wed, Jan 15, 2014 at 2:07 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>>> Hello Sudhakara,
>>>
>>> Thanks for your suggestion. However, once I change the mapreduce
>>> framework to yarn, my map reduce jobs do not get executed at all. It seems
>>> the job is waiting on some thread indefinitely. Here is what I have done:
>>>
>>> 1) Set the mapreduce framework to yarn in mapred-site.xml
>>> <property>
>>>  <name>mapreduce.framework.name</name>
>>>  <value>yarn</value>
>>> </property>
>>> 2) Run the example again using the command
>>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>> The jobs are just stuck and do not move further.
>>>
>>>
>>> I also tried the following and it complains of filenotfound exception
>>> and some security exception
>>>
>>> ./hadoop dfs wordCount.jar file:///opt/ApacheHadoop/temp/worker.log
>>> file:///opt/ApacheHadoop/out/
>>>
>>> Below is the status of the job from hadoop application console. The
>>> progress bar does not move at all.
>>>
>>> ID: application_1389771586883_0002
>>>     (http://10.12.11.210:8088/cluster/app/application_1389771586883_0002)
>>> User: root
>>> Name: wordcount
>>> Application Type: MAPREDUCE
>>> Queue: default
>>> StartTime: Wed, 15 Jan 2014 07:52:04 GMT
>>> FinishTime: N/A
>>> State: ACCEPTED
>>> FinalStatus: UNDEFINED
>>> Progress: (bar does not advance)
>>> Tracking UI: UNASSIGNED
>>> Please advise what I should do.
>>>
>>> --Ashish
>>>
>>>
>>> On Tue, Jan 14, 2014 at 5:48 PM, sudhakara st <su...@gmail.com> wrote:
>>>
>>>> Hello Ashish
>>>> It seems the job is running in the local job runner (LocalJobRunner) and reading
>>>> the local file system. Can you try giving the full URI paths of the input
>>>> and output,
>>>>  like
>>>> $hadoop jar program.jar   ProgramName -Dmapreduce.framework.name=yarn
>>>> file:///home/input/  file:///home/output/
>>>>
>>>>
>>>> On Mon, Jan 13, 2014 at 3:02 PM, Ashish Jain <as...@gmail.com> wrote:
>>>>
>>>>> German,
>>>>>
>>>>> This does not seem to be helping. I tried to use the FairScheduler as
>>>>> my ResourceManager scheduler, but the behavior remains the same. I could see the
>>>>> fairscheduler log getting continuous heartbeats from both of the other nodes,
>>>>> but it is still not distributing the work to them. What I did next
>>>>> was to start 3 jobs simultaneously, so that maybe some part of one of the
>>>>> jobs would be distributed to the other nodes. However, still only one node is being
>>>>> used :(((. What is going wrong? Can someone help?
>>>>>
>>>>> Sample of fairsheduler log:
>>>>> 2014-01-13 15:13:54,293 HEARTBEAT       l1dev-211
>>>>> 2014-01-13 15:13:54,953 HEARTBEAT       l1-dev06
>>>>> 2014-01-13 15:13:54,988 HEARTBEAT       l1-DEV05
>>>>> 2014-01-13 15:13:55,295 HEARTBEAT       l1dev-211
>>>>> 2014-01-13 15:13:55,956 HEARTBEAT       l1-dev06
>>>>> 2014-01-13 15:13:55,993 HEARTBEAT       l1-DEV05
>>>>> 2014-01-13 15:13:56,297 HEARTBEAT       l1dev-211
>>>>> 2014-01-13 15:13:56,960 HEARTBEAT       l1-dev06
>>>>> 2014-01-13 15:13:56,997 HEARTBEAT       l1-DEV05
>>>>> 2014-01-13 15:13:57,299 HEARTBEAT       l1dev-211
>>>>> 2014-01-13 15:13:57,964 HEARTBEAT       l1-dev06
>>>>> 2014-01-13 15:13:58,001 HEARTBEAT       l1-DEV05
>>>>>
>>>>> My Data distributed as blocks to other nodes. The host with IP
>>>>> 10.12.11.210 has all the data and this is the one which is serving all the
>>>>> request.
>>>>>
>>>>> Total number of blocks: 8
>>>>> 1073741866:         10.12.11.211:50010    View Block Info
>>>>> 10.12.11.210:50010    View Block Info
>>>>> 1073741867:         10.12.11.211:50010    View Block Info
>>>>> 10.12.11.210:50010    View Block Info
>>>>> 1073741868:         10.12.11.210:50010    View Block Info
>>>>> 10.12.11.209:50010    View Block Info
>>>>> 1073741869:         10.12.11.210:50010    View Block Info
>>>>> 10.12.11.209:50010    View Block Info
>>>>> 1073741870:         10.12.11.211:50010    View Block Info
>>>>> 10.12.11.210:50010    View Block Info
>>>>> 1073741871:         10.12.11.210:50010    View Block Info
>>>>> 10.12.11.209:50010    View Block Info
>>>>> 1073741872:         10.12.11.211:50010    View Block Info
>>>>> 10.12.11.210:50010    View Block Info
>>>>> 1073741873:         10.12.11.210:50010    View Block Info
>>>>> 10.12.11.209:50010    View Block Info
>>>>>
>>>>> Someone please advise on how to go about this.
>>>>>
>>>>> --Ashish
>>>>>
>>>>>
>>>>> On Fri, Jan 10, 2014 at 12:58 PM, Ashish Jain <as...@gmail.com> wrote:
>>>>>
>>>>>> Thanks for all these suggestions. Somehow I do not have access to the
>>>>>> servers today and will try the suggestions made on monday and will let you
>>>>>> know how it goes.
>>>>>>
>>>>>> --Ashish
>>>>>>
>>>>>>
>>>>>> On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <
>>>>>> german.fl@samsung.com> wrote:
>>>>>>
>>>>>>> Ashish
>>>>>>>
>>>>>>> Could this be related to the scheduler you are using and its
>>>>>>> settings?.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On lab environments when running a single type of job I often use
>>>>>>> FairScheduler (the YARN default in 2.2.0 is CapacityScheduler) and it does
>>>>>>> a good job distributing the load.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> You could give that a try (
>>>>>>> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
>>>>>>> )
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I think just changing yarn-site.xml  as follows could demonstrate
>>>>>>> this theory (note that  how the jobs are scheduled depend on resources such
>>>>>>> as memory on the nodes and you would need to setup yarn-site.xml
>>>>>>> accordingly).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> <property>
>>>>>>>
>>>>>>>   <name>yarn.resourcemanager.scheduler.class</name>
>>>>>>>
>>>>>>>
>>>>>>> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
>>>>>>>
>>>>>>> </property>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>> ./g
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>>>>>>> *Sent:* Thursday, January 09, 2014 6:46 AM
>>>>>>> *To:* user@hadoop.apache.org
>>>>>>> *Subject:* Re: Distributing the code to multiple nodes
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Another point to add here 10.12.11.210 is the host which has
>>>>>>> everything running including a slave datanode. Data was also distributed
>>>>>>> this host as well as the jar file. Following are running on 10.12.11.210
>>>>>>>
>>>>>>> 7966 DataNode
>>>>>>> 8480 NodeManager
>>>>>>> 8353 ResourceManager
>>>>>>> 8141 SecondaryNameNode
>>>>>>> 7834 NameNode
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <as...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Logs were updated only when I copied the data. After copying the
>>>>>>> data there has been no updates on the log files.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <ch...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Do the logs on the three nodes contain anything interesting?
>>>>>>> Chris
>>>>>>>
>>>>>>> On Jan 9, 2014 3:47 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>>>>>>
>>>>>>> Here is the block info for the record I distributed. As can be seen
>>>>>>> only 10.12.11.210 has all the data and this is the node which is serving
>>>>>>> all the request. Replicas are available with 209 as well as 210
>>>>>>>
>>>>>>> 1073741857:         10.12.11.210:50010    View Block Info
>>>>>>> 10.12.11.209:50010    View Block Info
>>>>>>> 1073741858:         10.12.11.210:50010    View Block Info
>>>>>>> 10.12.11.211:50010    View Block Info
>>>>>>> 1073741859:         10.12.11.210:50010    View Block Info
>>>>>>> 10.12.11.209:50010    View Block Info
>>>>>>> 1073741860:         10.12.11.210:50010    View Block Info
>>>>>>> 10.12.11.211:50010    View Block Info
>>>>>>> 1073741861:         10.12.11.210:50010    View Block Info
>>>>>>> 10.12.11.209:50010    View Block Info
>>>>>>> 1073741862:         10.12.11.210:50010    View Block Info
>>>>>>> 10.12.11.209:50010    View Block Info
>>>>>>> 1073741863:         10.12.11.210:50010    View Block Info
>>>>>>> 10.12.11.209:50010    View Block Info
>>>>>>> 1073741864:         10.12.11.210:50010    View Block Info
>>>>>>> 10.12.11.209:50010    View Block Info
>>>>>>>
>>>>>>> --Ashish
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <as...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Hello Chris,
>>>>>>>
>>>>>>> I have now a cluster with 3 nodes and replication factor being 2.
>>>>>>> When I distribute a file I could see that there are replica of data
>>>>>>> available in other nodes. However when I run a map reduce job again only
>>>>>>> one node is serving all the request :(. Can you or anyone please provide
>>>>>>> some more inputs.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Ashish
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <ch...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> 2 nodes and replication factor of 2 results in a replica of each
>>>>>>> block present on each node. This would allow the possibility that a single
>>>>>>> node would do the work and yet be data local.  It will probably happen if
>>>>>>> that single node has the needed capacity.  More nodes than the replication
>>>>>>> factor are needed to force distribution of the processing.
>>>>>>> Chris
>>>>>>>
>>>>>>> On Jan 8, 2014 7:35 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>>>>>>
>>>>>>> Guys,
>>>>>>>
>>>>>>> I am sure that only one node is being used. I just know ran the job
>>>>>>> again and could see that CPU usage only for one server going high other
>>>>>>> server CPU usage remains constant and hence it means other node is not
>>>>>>> being used. Can someone help me to debug this issue?
>>>>>>>
>>>>>>> ++Ashish
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Hello All,
>>>>>>>
>>>>>>> I have a 2 node hadoop cluster running with a replication factor of
>>>>>>> 2. I have a file of size around 1 GB which when copied to HDFS is
>>>>>>> replicated to both the nodes. Seeing the block info I can see the file has
>>>>>>> been subdivided into 8 parts which means it has been subdivided into 8
>>>>>>> blocks each of size 128 MB.  I use this file as input to run the word count
>>>>>>> program. Some how I feel only one node is doing all the work and the code
>>>>>>> is not distributed to other node. How can I make sure code is distributed
>>>>>>> to both the nodes? Also is there a log or GUI which can be used for this?
>>>>>>>
>>>>>>> Please note I am using the latest stable release that is 2.2.0.
>>>>>>>
>>>>>>> ++Ashish
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Regards,
>>>> ...Sudhakara.st
>>>>
>>>>
>>>
>>>
>>
>>
>> --
>>
>> Regards,
>> ...Sudhakara.st
>>
>>
>
>

Re: Distributing the code to multiple nodes

Posted by Ashish Jain <as...@gmail.com>.
Hi Sudhakar,

Indeed there was a typo; the complete command is as follows (without the main
class, since my manifest already has an entry for the main class):
/hadoop jar wordCount.jar  /opt/ApacheHadoop/temp/worker.log
/opt/ApacheHadoop/out/
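
(For context: leaving out the class name works because the jar's MANIFEST.MF
carries a Main-Class entry, something along the lines of

    Main-Class: WordCount

where WordCount is just the driver class name used earlier in this thread, so
"hadoop jar" can locate the driver on its own.)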

Next I killed the datanode on 10.12.11.210, and I see the following messages
in the log files. It looks like the namenode is still trying to assign the
complete task to one single node, and since it does not find the complete
data set on one node it is complaining.

2014-01-15 16:38:26,894 WARN
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Node : l1-DEV05:1004 does not have sufficient resource for request :
{Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
Location: *, Relax Locality: true} node total capability : <memory:1024,
vCores:8>
2014-01-15 16:38:27,348 WARN
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Node : l1dev-211:1004 does not have sufficient resource for request :
{Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
Location: *, Relax Locality: true} node total capability : <memory:1024,
vCores:8>
2014-01-15 16:38:27,871 WARN
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Node : l1-dev06:1004 does not have sufficient resource for request :
{Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
Location: *, Relax Locality: true} node total capability : <memory:1024,
vCores:8>
2014-01-15 16:38:27,897 WARN
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Node : l1-DEV05:1004 does not have sufficient resource for request :
{Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
Location: *, Relax Locality: true} node total capability : <memory:1024,
vCores:8>
2014-01-15 16:38:28,349 WARN
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Node : l1dev-211:1004 does not have sufficient resource for request :
{Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
Location: *, Relax Locality: true} node total capability : <memory:1024,
vCores:8>
2014-01-15 16:38:28,874 WARN
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Node : l1-dev06:1004 does not have sufficient resource for request :
{Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
Location: *, Relax Locality: true} node total capability : <memory:1024,
vCores:8>
2014-01-15 16:38:28,900 WARN
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Node : l1-DEV05:1004 does not have sufficient resource for request :
{Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
Location: *, Relax Locality: true} node total capability : <memory:1024,
vCores:8>
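
If I read these warnings right, every container request asks for
<memory:2048, vCores:1> while each node only advertises <memory:1024,
vCores:8> in total, so presumably no container can be placed on any node,
whichever one holds the data. A rough sketch of the settings that seem to be
involved (the values below are only illustrative guesses, not tested
recommendations) would be to raise the per-node capacity in yarn-site.xml:

<property>
  <!-- total memory (MB) the NodeManager offers to containers;
       it has to be at least as large as a single container request -->
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value>
</property>

and/or shrink the requests in mapred-site.xml:

<property>
  <!-- memory requested for the MapReduce ApplicationMaster container -->
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>1024</value>
</property>
<property>
  <!-- memory requested for each map task container -->
  <name>mapreduce.map.memory.mb</name>
  <value>512</value>
</property>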


--Ashish


On Wed, Jan 15, 2014 at 3:59 PM, sudhakara st <su...@gmail.com>wrote:

> Hello Ashish
>
>
> 2) Run the example again using the command
> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
> /opt/ApacheHadoop/out/
>
>
> Unless if it typo mistake the command should be
> ./hadoop jar wordCount.jar WordCount /opt/ApacheHadoop/temp/worker.log
> /opt/ApacheHadoop/out/
>
> One more thing try , just stop datanode process in  10.12.11.210 and run
> the job
>
>
>
>
> On Wed, Jan 15, 2014 at 2:07 PM, Ashish Jain <as...@gmail.com> wrote:
>
>> Hello Sudhakara,
>>
>> Thanks for your suggestion. However once I change the mapreduce framework
>> to yarn my map reduce jobs does not get executed at all. It seems it is
>> waiting on some thread indefinitely. Here is what I have done
>>
>> 1) Set the mapreduce framework to yarn in mapred-site.xml
>> <property>
>>  <name>mapreduce.framework.name</name>
>>  <value>yarn</value>
>> </property>
>> 2) Run the example again using the command
>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>> The jobs are just stuck and do not move further.
>>
>>
>> I also tried the following and it complains of filenotfound exception and
>> some security exception
>>
>> ./hadoop dfs wordCount.jar file:///opt/ApacheHadoop/temp/worker.log
>> file:///opt/ApacheHadoop/out/
>>
>> Below is the status of the job from hadoop application console. The
>> progress bar does not move at all.
>>
>> ID:               application_1389771586883_0002
>>                    <http://10.12.11.210:8088/cluster/app/application_1389771586883_0002>
>> User:             root
>> Name:             wordcount
>> Application Type: MAPREDUCE
>> Queue:            default
>> StartTime:        Wed, 15 Jan 2014 07:52:04 GMT
>> FinishTime:       N/A
>> State:            ACCEPTED
>> FinalStatus:      UNDEFINED
>> Progress:         (the bar does not move)
>> Tracking UI:      UNASSIGNED <http://10.12.11.210:8088/cluster/apps#>
>> Please advice what should I do
>>
>> --Ashish
>>
>>
>> On Tue, Jan 14, 2014 at 5:48 PM, sudhakara st <su...@gmail.com>wrote:
>>
>>> Hello Ashish
>>> It seems job is running in Local job runner(LocalJobRunner) by reading
>>> the Local file system. Can you try by give the full URI path of the input
>>> and output path.
>>>  like
>>> $hadoop jar program.jar   ProgramName -Dmapreduce.framework.name=yarn
>>> file:///home/input/  file:///home/output/
>>>
>>>
>>> On Mon, Jan 13, 2014 at 3:02 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>>> German,
>>>>
>>>> This does not seem to be helping. I tried to use the Fairscheduler as
>>>> my resource manger but the behavior remains same. I could see the
>>>> fairscheduler log getting continuous heart beat from both the other nodes.
>>>> But it is still not distributing the work to other nodes. What I did next
>>>> was started 3 jobs simultaneously so that may be some part of one of the
>>>> job be distributed to other nodes. However still only one node is being
>>>> used :(((. What is that is going wrong can some one help?
>>>>
>>>> Sample of fairsheduler log:
>>>> 2014-01-13 15:13:54,293 HEARTBEAT       l1dev-211
>>>> 2014-01-13 15:13:54,953 HEARTBEAT       l1-dev06
>>>> 2014-01-13 15:13:54,988 HEARTBEAT       l1-DEV05
>>>> 2014-01-13 15:13:55,295 HEARTBEAT       l1dev-211
>>>> 2014-01-13 15:13:55,956 HEARTBEAT       l1-dev06
>>>> 2014-01-13 15:13:55,993 HEARTBEAT       l1-DEV05
>>>> 2014-01-13 15:13:56,297 HEARTBEAT       l1dev-211
>>>> 2014-01-13 15:13:56,960 HEARTBEAT       l1-dev06
>>>> 2014-01-13 15:13:56,997 HEARTBEAT       l1-DEV05
>>>> 2014-01-13 15:13:57,299 HEARTBEAT       l1dev-211
>>>> 2014-01-13 15:13:57,964 HEARTBEAT       l1-dev06
>>>> 2014-01-13 15:13:58,001 HEARTBEAT       l1-DEV05
>>>>
>>>> My Data distributed as blocks to other nodes. The host with IP
>>>> 10.12.11.210 has all the data and this is the one which is serving all the
>>>> request.
>>>>
>>>> Total number of blocks: 8
>>>> 1073741866:         10.12.11.211:50010    View Block Info
>>>> 10.12.11.210:50010    View Block Info
>>>> 1073741867:         10.12.11.211:50010    View Block Info
>>>> 10.12.11.210:50010    View Block Info
>>>> 1073741868:         10.12.11.210:50010    View Block Info
>>>> 10.12.11.209:50010    View Block Info
>>>> 1073741869:         10.12.11.210:50010    View Block Info
>>>> 10.12.11.209:50010    View Block Info
>>>> 1073741870:         10.12.11.211:50010    View Block Info
>>>> 10.12.11.210:50010    View Block Info
>>>> 1073741871:         10.12.11.210:50010    View Block Info
>>>> 10.12.11.209:50010    View Block Info
>>>> 1073741872:         10.12.11.211:50010    View Block Info
>>>> 10.12.11.210:50010    View Block Info
>>>> 1073741873:         10.12.11.210:50010    View Block Info
>>>> 10.12.11.209:50010    View Block Info
>>>>
>>>> Someone please advice on how to go about this.
>>>>
>>>> --Ashish
>>>>
>>>>
>>>> On Fri, Jan 10, 2014 at 12:58 PM, Ashish Jain <as...@gmail.com>wrote:
>>>>
>>>>> Thanks for all these suggestions. Somehow I do not have access to the
>>>>> servers today and will try the suggestions made on monday and will let you
>>>>> know how it goes.
>>>>>
>>>>> --Ashish
>>>>>
>>>>>
>>>>> On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <
>>>>> german.fl@samsung.com> wrote:
>>>>>
>>>>>> Ashish
>>>>>>
>>>>>> Could this be related to the scheduler you are using and its
>>>>>> settings?.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On lab environments when running a single type of job I often use
>>>>>> FairScheduler (the YARN default in 2.2.0 is CapacityScheduler) and it does
>>>>>> a good job distributing the load.
>>>>>>
>>>>>>
>>>>>>
>>>>>> You could give that a try (
>>>>>> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
>>>>>> )
>>>>>>
>>>>>>
>>>>>>
>>>>>> I think just changing yarn-site.xml  as follows could demonstrate
>>>>>> this theory (note that  how the jobs are scheduled depend on resources such
>>>>>> as memory on the nodes and you would need to setup yarn-site.xml
>>>>>> accordingly).
>>>>>>
>>>>>>
>>>>>>
>>>>>> <property>
>>>>>>
>>>>>>   <name>yarn.resourcemanager.scheduler.class</name>
>>>>>>
>>>>>>
>>>>>> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
>>>>>>
>>>>>> </property>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> ./g
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>>>>>> *Sent:* Thursday, January 09, 2014 6:46 AM
>>>>>> *To:* user@hadoop.apache.org
>>>>>> *Subject:* Re: Distributing the code to multiple nodes
>>>>>>
>>>>>>
>>>>>>
>>>>>> Another point to add here 10.12.11.210 is the host which has
>>>>>> everything running including a slave datanode. Data was also distributed
>>>>>> this host as well as the jar file. Following are running on 10.12.11.210
>>>>>>
>>>>>> 7966 DataNode
>>>>>> 8480 NodeManager
>>>>>> 8353 ResourceManager
>>>>>> 8141 SecondaryNameNode
>>>>>> 7834 NameNode
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <as...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> Logs were updated only when I copied the data. After copying the data
>>>>>> there has been no updates on the log files.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <ch...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> Do the logs on the three nodes contain anything interesting?
>>>>>> Chris
>>>>>>
>>>>>> On Jan 9, 2014 3:47 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>>>>>
>>>>>> Here is the block info for the record I distributed. As can be seen
>>>>>> only 10.12.11.210 has all the data and this is the node which is serving
>>>>>> all the request. Replicas are available with 209 as well as 210
>>>>>>
>>>>>> 1073741857:         10.12.11.210:50010    View Block Info
>>>>>> 10.12.11.209:50010    View Block Info
>>>>>> 1073741858:         10.12.11.210:50010    View Block Info
>>>>>> 10.12.11.211:50010    View Block Info
>>>>>> 1073741859:         10.12.11.210:50010    View Block Info
>>>>>> 10.12.11.209:50010    View Block Info
>>>>>> 1073741860:         10.12.11.210:50010    View Block Info
>>>>>> 10.12.11.211:50010    View Block Info
>>>>>> 1073741861:         10.12.11.210:50010    View Block Info
>>>>>> 10.12.11.209:50010    View Block Info
>>>>>> 1073741862:         10.12.11.210:50010    View Block Info
>>>>>> 10.12.11.209:50010    View Block Info
>>>>>> 1073741863:         10.12.11.210:50010    View Block Info
>>>>>> 10.12.11.209:50010    View Block Info
>>>>>> 1073741864:         10.12.11.210:50010    View Block Info
>>>>>> 10.12.11.209:50010    View Block Info
>>>>>>
>>>>>> --Ashish
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <as...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> Hello Chris,
>>>>>>
>>>>>> I have now a cluster with 3 nodes and replication factor being 2.
>>>>>> When I distribute a file I could see that there are replica of data
>>>>>> available in other nodes. However when I run a map reduce job again only
>>>>>> one node is serving all the request :(. Can you or anyone please provide
>>>>>> some more inputs.
>>>>>>
>>>>>> Thanks
>>>>>> Ashish
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <ch...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> 2 nodes and replication factor of 2 results in a replica of each
>>>>>> block present on each node. This would allow the possibility that a single
>>>>>> node would do the work and yet be data local.  It will probably happen if
>>>>>> that single node has the needed capacity.  More nodes than the replication
>>>>>> factor are needed to force distribution of the processing.
>>>>>> Chris
>>>>>>
>>>>>> On Jan 8, 2014 7:35 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>>>>>
>>>>>> Guys,
>>>>>>
>>>>>> I am sure that only one node is being used. I just know ran the job
>>>>>> again and could see that CPU usage only for one server going high other
>>>>>> server CPU usage remains constant and hence it means other node is not
>>>>>> being used. Can someone help me to debug this issue?
>>>>>>
>>>>>> ++Ashish
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> Hello All,
>>>>>>
>>>>>> I have a 2 node hadoop cluster running with a replication factor of
>>>>>> 2. I have a file of size around 1 GB which when copied to HDFS is
>>>>>> replicated to both the nodes. Seeing the block info I can see the file has
>>>>>> been subdivided into 8 parts which means it has been subdivided into 8
>>>>>> blocks each of size 128 MB.  I use this file as input to run the word count
>>>>>> program. Some how I feel only one node is doing all the work and the code
>>>>>> is not distributed to other node. How can I make sure code is distributed
>>>>>> to both the nodes? Also is there a log or GUI which can be used for this?
>>>>>>
>>>>>> Please note I am using the latest stable release that is 2.2.0.
>>>>>>
>>>>>> ++Ashish
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>>
>>> Regards,
>>> ...Sudhakara.st
>>>
>>>
>>
>>
>
>
> --
>
> Regards,
> ...Sudhakara.st
>
>

Re: Distributing the code to multiple nodes

Posted by sudhakara st <su...@gmail.com>.
Hello Ashish

2) Run the example again using the command
./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
/opt/ApacheHadoop/out/


Unless it is a typo, the command should be
./hadoop jar wordCount.jar WordCount /opt/ApacheHadoop/temp/worker.log
/opt/ApacheHadoop/out/

One more thing to try: stop the datanode process on 10.12.11.210 and run
the job again.
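
For completeness, here is a minimal end-to-end sketch that stages the input in
HDFS and then submits the jar. The HDFS paths are only illustrative, the driver
class is assumed to be called WordCount as above, and the output directory must
not already exist:

# stage the input file in HDFS (illustrative paths)
./hadoop fs -mkdir -p /user/ashish/input
./hadoop fs -put /opt/ApacheHadoop/temp/worker.log /user/ashish/input/
# submit the jar with the driver class name and HDFS input/output paths
./hadoop jar wordCount.jar WordCount /user/ashish/input/worker.log /user/ashish/out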




On Wed, Jan 15, 2014 at 2:07 PM, Ashish Jain <as...@gmail.com> wrote:

> Hello Sudhakara,
>
> Thanks for your suggestion. However once I change the mapreduce framework
> to yarn my map reduce jobs does not get executed at all. It seems it is
> waiting on some thread indefinitely. Here is what I have done
>
> 1) Set the mapreduce framework to yarn in mapred-site.xml
> <property>
>  <name>mapreduce.framework.name</name>
>  <value>yarn</value>
> </property>
> 2) Run the example again using the command
> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
> /opt/ApacheHadoop/out/
>
> The jobs are just stuck and do not move further.
>
>
> I also tried the following and it complains of filenotfound exception and
> some security exception
>
> ./hadoop dfs wordCount.jar file:///opt/ApacheHadoop/temp/worker.log
> file:///opt/ApacheHadoop/out/
>
> Below is the status of the job from hadoop application console. The
> progress bar does not move at all.
>
> ID:               application_1389771586883_0002 <http://10.12.11.210:8088/cluster/app/application_1389771586883_0002>
> User:             root
> Name:             wordcount
> Application Type: MAPREDUCE
> Queue:            default
> StartTime:        Wed, 15 Jan 2014 07:52:04 GMT
> FinishTime:       N/A
> State:            ACCEPTED
> FinalStatus:      UNDEFINED
> Progress:         (no progress shown)
> Tracking UI:      UNASSIGNED <http://10.12.11.210:8088/cluster/apps#>
> Please advice what should I do
>
> --Ashish
>
>
> On Tue, Jan 14, 2014 at 5:48 PM, sudhakara st <su...@gmail.com>wrote:
>
>> Hello Ashish
>> It seems job is running in Local job runner(LocalJobRunner) by reading
>> the Local file system. Can you try by give the full URI path of the input
>> and output path.
>>  like
>> $hadoop jar program.jar   ProgramName -Dmapreduce.framework.name=yarn
>> file:///home/input/  file:///home/output/
>>
>>
>> On Mon, Jan 13, 2014 at 3:02 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>>> German,
>>>
>>> This does not seem to be helping. I tried to use the Fairscheduler as my
>>> resource manger but the behavior remains same. I could see the
>>> fairscheduler log getting continuous heart beat from both the other nodes.
>>> But it is still not distributing the work to other nodes. What I did next
>>> was started 3 jobs simultaneously so that may be some part of one of the
>>> job be distributed to other nodes. However still only one node is being
>>> used :(((. What is that is going wrong can some one help?
>>>
>>> Sample of fairsheduler log:
>>> 2014-01-13 15:13:54,293 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:54,953 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:54,988 HEARTBEAT       l1-DEV05
>>> 2014-01-13 15:13:55,295 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:55,956 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:55,993 HEARTBEAT       l1-DEV05
>>> 2014-01-13 15:13:56,297 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:56,960 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:56,997 HEARTBEAT       l1-DEV05
>>> 2014-01-13 15:13:57,299 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:57,964 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:58,001 HEARTBEAT       l1-DEV05
>>>
>>> My Data distributed as blocks to other nodes. The host with IP
>>> 10.12.11.210 has all the data and this is the one which is serving all the
>>> request.
>>>
>>> Total number of blocks: 8
>>> 1073741866:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741867:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741868:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741869:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741870:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741871:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741872:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741873:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>>
>>> Someone please advice on how to go about this.
>>>
>>> --Ashish
>>>
>>>
>>> On Fri, Jan 10, 2014 at 12:58 PM, Ashish Jain <as...@gmail.com>wrote:
>>>
>>>> Thanks for all these suggestions. Somehow I do not have access to the
>>>> servers today and will try the suggestions made on monday and will let you
>>>> know how it goes.
>>>>
>>>> --Ashish
>>>>
>>>>
>>>> On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <
>>>> german.fl@samsung.com> wrote:
>>>>
>>>>> Ashish
>>>>>
>>>>> Could this be related to the scheduler you are using and its settings?.
>>>>>
>>>>>
>>>>>
>>>>> On lab environments when running a single type of job I often use
>>>>> FairScheduler (the YARN default in 2.2.0 is CapacityScheduler) and it does
>>>>> a good job distributing the load.
>>>>>
>>>>>
>>>>>
>>>>> You could give that a try (
>>>>> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
>>>>> )
>>>>>
>>>>>
>>>>>
>>>>> I think just changing yarn-site.xml  as follows could demonstrate this
>>>>> theory (note that  how the jobs are scheduled depend on resources such as
>>>>> memory on the nodes and you would need to setup yarn-site.xml accordingly).
>>>>>
>>>>>
>>>>>
>>>>> <property>
>>>>>
>>>>>   <name>yarn.resourcemanager.scheduler.class</name>
>>>>>
>>>>>
>>>>> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
>>>>>
>>>>> </property>
>>>>>
>>>>>
>>>>>
>>>>> Regards
>>>>>
>>>>> ./g
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>>>>> *Sent:* Thursday, January 09, 2014 6:46 AM
>>>>> *To:* user@hadoop.apache.org
>>>>> *Subject:* Re: Distributing the code to multiple nodes
>>>>>
>>>>>
>>>>>
>>>>> Another point to add here 10.12.11.210 is the host which has
>>>>> everything running including a slave datanode. Data was also distributed
>>>>> this host as well as the jar file. Following are running on 10.12.11.210
>>>>>
>>>>> 7966 DataNode
>>>>> 8480 NodeManager
>>>>> 8353 ResourceManager
>>>>> 8141 SecondaryNameNode
>>>>> 7834 NameNode
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <as...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Logs were updated only when I copied the data. After copying the data
>>>>> there has been no updates on the log files.
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <ch...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Do the logs on the three nodes contain anything interesting?
>>>>> Chris
>>>>>
>>>>> On Jan 9, 2014 3:47 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>>>>
>>>>> Here is the block info for the record I distributed. As can be seen
>>>>> only 10.12.11.210 has all the data and this is the node which is serving
>>>>> all the request. Replicas are available with 209 as well as 210
>>>>>
>>>>> 1073741857:         10.12.11.210:50010    View Block Info
>>>>> 10.12.11.209:50010    View Block Info
>>>>> 1073741858:         10.12.11.210:50010    View Block Info
>>>>> 10.12.11.211:50010    View Block Info
>>>>> 1073741859:         10.12.11.210:50010    View Block Info
>>>>> 10.12.11.209:50010    View Block Info
>>>>> 1073741860:         10.12.11.210:50010    View Block Info
>>>>> 10.12.11.211:50010    View Block Info
>>>>> 1073741861:         10.12.11.210:50010    View Block Info
>>>>> 10.12.11.209:50010    View Block Info
>>>>> 1073741862:         10.12.11.210:50010    View Block Info
>>>>> 10.12.11.209:50010    View Block Info
>>>>> 1073741863:         10.12.11.210:50010    View Block Info
>>>>> 10.12.11.209:50010    View Block Info
>>>>> 1073741864:         10.12.11.210:50010    View Block Info
>>>>> 10.12.11.209:50010    View Block Info
>>>>>
>>>>> --Ashish
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <as...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Hello Chris,
>>>>>
>>>>> I have now a cluster with 3 nodes and replication factor being 2. When
>>>>> I distribute a file I could see that there are replica of data available in
>>>>> other nodes. However when I run a map reduce job again only one node is
>>>>> serving all the request :(. Can you or anyone please provide some more
>>>>> inputs.
>>>>>
>>>>> Thanks
>>>>> Ashish
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <ch...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> 2 nodes and replication factor of 2 results in a replica of each block
>>>>> present on each node. This would allow the possibility that a single node
>>>>> would do the work and yet be data local.  It will probably happen if that
>>>>> single node has the needed capacity.  More nodes than the replication
>>>>> factor are needed to force distribution of the processing.
>>>>> Chris
>>>>>
>>>>> On Jan 8, 2014 7:35 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>>>>
>>>>> Guys,
>>>>>
>>>>> I am sure that only one node is being used. I just know ran the job
>>>>> again and could see that CPU usage only for one server going high other
>>>>> server CPU usage remains constant and hence it means other node is not
>>>>> being used. Can someone help me to debug this issue?
>>>>>
>>>>> ++Ashish
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Hello All,
>>>>>
>>>>> I have a 2 node hadoop cluster running with a replication factor of 2.
>>>>> I have a file of size around 1 GB which when copied to HDFS is replicated
>>>>> to both the nodes. Seeing the block info I can see the file has been
>>>>> subdivided into 8 parts which means it has been subdivided into 8 blocks
>>>>> each of size 128 MB.  I use this file as input to run the word count
>>>>> program. Some how I feel only one node is doing all the work and the code
>>>>> is not distributed to other node. How can I make sure code is distributed
>>>>> to both the nodes? Also is there a log or GUI which can be used for this?
>>>>>
>>>>> Please note I am using the latest stable release that is 2.2.0.
>>>>>
>>>>> ++Ashish
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>>
>> Regards,
>> ...Sudhakara.st
>>
>>
>
>


-- 

Regards,
...Sudhakara.st

Re: Distributing the code to multiple nodes

Posted by Ashish Jain <as...@gmail.com>.
Hello Sudhakara,

Thanks for your suggestion. However, once I change the mapreduce framework
to yarn, my map reduce jobs do not get executed at all. They seem to be
waiting on some thread indefinitely. Here is what I have done:

1) Set the mapreduce framework to yarn in mapred-site.xml
<property>
 <name>mapreduce.framework.name</name>
 <value>yarn</value>
</property>
2) Run the example again using the command
./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
/opt/ApacheHadoop/out/

The jobs are just stuck and do not move further.
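
Could the reason be that the ResourceManager cannot find a NodeManager with
enough free memory to start the ApplicationMaster? As a minimal sketch (these
are standard YARN 2.2 property names, but the values below are only
illustrative for small lab machines), I understand yarn-site.xml on every node
would need something like:

<property>
 <!-- illustrative: total memory each NodeManager offers to YARN containers -->
 <name>yarn.nodemanager.resource.memory-mb</name>
 <value>4096</value>
</property>
<property>
 <!-- illustrative: smallest container the scheduler will allocate -->
 <name>yarn.scheduler.minimum-allocation-mb</name>
 <value>512</value>
</property>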


I also tried the following, and it complains of a FileNotFoundException and
a security exception:

./hadoop dfs wordCount.jar file:///opt/ApacheHadoop/temp/worker.log
file:///opt/ApacheHadoop/out/

Below is the status of the job from the Hadoop application console. The
progress bar does not move at all.

ID:               application_1389771586883_0002 <http://10.12.11.210:8088/cluster/app/application_1389771586883_0002>
User:             root
Name:             wordcount
Application Type: MAPREDUCE
Queue:            default
StartTime:        Wed, 15 Jan 2014 07:52:04 GMT
FinishTime:       N/A
State:            ACCEPTED
FinalStatus:      UNDEFINED
Progress:         (no progress shown)
Tracking UI:      UNASSIGNED <http://10.12.11.210:8088/cluster/apps#>
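
As a quick cross-check (standard Hadoop 2.2 command-line tools from the same
bin directory; the application id is the one shown above), the registered
NodeManagers and the state of the stuck application can also be listed from
the shell:

./yarn node -list          # all three NodeManagers should report as RUNNING
./yarn application -list   # shows queue and state of submitted applications
./yarn application -status application_1389771586883_0002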
Please advise on what I should do.

--Ashish


On Tue, Jan 14, 2014 at 5:48 PM, sudhakara st <su...@gmail.com>wrote:

> Hello Ashish
> It seems job is running in Local job runner(LocalJobRunner) by reading the
> Local file system. Can you try by give the full URI path of the input and
> output path.
> like
> $hadoop jar program.jar   ProgramName -Dmapreduce.framework.name=yarn
> file:///home/input/  file:///home/output/
>
>
> On Mon, Jan 13, 2014 at 3:02 PM, Ashish Jain <as...@gmail.com> wrote:
>
>> German,
>>
>> This does not seem to be helping. I tried to use the Fairscheduler as my
>> resource manger but the behavior remains same. I could see the
>> fairscheduler log getting continuous heart beat from both the other nodes.
>> But it is still not distributing the work to other nodes. What I did next
>> was started 3 jobs simultaneously so that may be some part of one of the
>> job be distributed to other nodes. However still only one node is being
>> used :(((. What is that is going wrong can some one help?
>>
>> Sample of fairsheduler log:
>> 2014-01-13 15:13:54,293 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:54,953 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:54,988 HEARTBEAT       l1-DEV05
>> 2014-01-13 15:13:55,295 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:55,956 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:55,993 HEARTBEAT       l1-DEV05
>> 2014-01-13 15:13:56,297 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:56,960 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:56,997 HEARTBEAT       l1-DEV05
>> 2014-01-13 15:13:57,299 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:57,964 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:58,001 HEARTBEAT       l1-DEV05
>>
>> My Data distributed as blocks to other nodes. The host with IP
>> 10.12.11.210 has all the data and this is the one which is serving all the
>> request.
>>
>> Total number of blocks: 8
>> 1073741866:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741867:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741868:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741869:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741870:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741871:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741872:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741873:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>>
>> Someone please advice on how to go about this.
>>
>> --Ashish
>>
>>
>> On Fri, Jan 10, 2014 at 12:58 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>>> Thanks for all these suggestions. Somehow I do not have access to the
>>> servers today and will try the suggestions made on monday and will let you
>>> know how it goes.
>>>
>>> --Ashish
>>>
>>>
>>> On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <
>>> german.fl@samsung.com> wrote:
>>>
>>>> Ashish
>>>>
>>>> Could this be related to the scheduler you are using and its settings?.
>>>>
>>>>
>>>>
>>>> On lab environments when running a single type of job I often use
>>>> FairScheduler (the YARN default in 2.2.0 is CapacityScheduler) and it does
>>>> a good job distributing the load.
>>>>
>>>>
>>>>
>>>> You could give that a try (
>>>> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
>>>> )
>>>>
>>>>
>>>>
>>>> I think just changing yarn-site.xml  as follows could demonstrate this
>>>> theory (note that  how the jobs are scheduled depend on resources such as
>>>> memory on the nodes and you would need to setup yarn-site.xml accordingly).
>>>>
>>>>
>>>>
>>>> <property>
>>>>
>>>>   <name>yarn.resourcemanager.scheduler.class</name>
>>>>
>>>>
>>>> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
>>>>
>>>> </property>
>>>>
>>>>
>>>>
>>>> Regards
>>>>
>>>> ./g
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>>>> *Sent:* Thursday, January 09, 2014 6:46 AM
>>>> *To:* user@hadoop.apache.org
>>>> *Subject:* Re: Distributing the code to multiple nodes
>>>>
>>>>
>>>>
>>>> Another point to add here 10.12.11.210 is the host which has everything
>>>> running including a slave datanode. Data was also distributed this host as
>>>> well as the jar file. Following are running on 10.12.11.210
>>>>
>>>> 7966 DataNode
>>>> 8480 NodeManager
>>>> 8353 ResourceManager
>>>> 8141 SecondaryNameNode
>>>> 7834 NameNode
>>>>
>>>>
>>>>
>>>> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <as...@gmail.com> wrote:
>>>>
>>>> Logs were updated only when I copied the data. After copying the data
>>>> there has been no updates on the log files.
>>>>
>>>>
>>>>
>>>> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <ch...@gmail.com>
>>>> wrote:
>>>>
>>>> Do the logs on the three nodes contain anything interesting?
>>>> Chris
>>>>
>>>> On Jan 9, 2014 3:47 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>>>
>>>> Here is the block info for the record I distributed. As can be seen
>>>> only 10.12.11.210 has all the data and this is the node which is serving
>>>> all the request. Replicas are available with 209 as well as 210
>>>>
>>>> 1073741857:         10.12.11.210:50010    View Block Info
>>>> 10.12.11.209:50010    View Block Info
>>>> 1073741858:         10.12.11.210:50010    View Block Info
>>>> 10.12.11.211:50010    View Block Info
>>>> 1073741859:         10.12.11.210:50010    View Block Info
>>>> 10.12.11.209:50010    View Block Info
>>>> 1073741860:         10.12.11.210:50010    View Block Info
>>>> 10.12.11.211:50010    View Block Info
>>>> 1073741861:         10.12.11.210:50010    View Block Info
>>>> 10.12.11.209:50010    View Block Info
>>>> 1073741862:         10.12.11.210:50010    View Block Info
>>>> 10.12.11.209:50010    View Block Info
>>>> 1073741863:         10.12.11.210:50010    View Block Info
>>>> 10.12.11.209:50010    View Block Info
>>>> 1073741864:         10.12.11.210:50010    View Block Info
>>>> 10.12.11.209:50010    View Block Info
>>>>
>>>> --Ashish
>>>>
>>>>
>>>>
>>>> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <as...@gmail.com> wrote:
>>>>
>>>> Hello Chris,
>>>>
>>>> I have now a cluster with 3 nodes and replication factor being 2. When
>>>> I distribute a file I could see that there are replica of data available in
>>>> other nodes. However when I run a map reduce job again only one node is
>>>> serving all the request :(. Can you or anyone please provide some more
>>>> inputs.
>>>>
>>>> Thanks
>>>> Ashish
>>>>
>>>>
>>>>
>>>> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <ch...@gmail.com>
>>>> wrote:
>>>>
>>>> 2 nodes and replication factor of 2 results in a replica of each block
>>>> present on each node. This would allow the possibility that a single node
>>>> would do the work and yet be data local.  It will probably happen if that
>>>> single node has the needed capacity.  More nodes than the replication
>>>> factor are needed to force distribution of the processing.
>>>> Chris
>>>>
>>>> On Jan 8, 2014 7:35 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>>>
>>>> Guys,
>>>>
>>>> I am sure that only one node is being used. I just know ran the job
>>>> again and could see that CPU usage only for one server going high other
>>>> server CPU usage remains constant and hence it means other node is not
>>>> being used. Can someone help me to debug this issue?
>>>>
>>>> ++Ashish
>>>>
>>>>
>>>>
>>>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com> wrote:
>>>>
>>>> Hello All,
>>>>
>>>> I have a 2 node hadoop cluster running with a replication factor of 2.
>>>> I have a file of size around 1 GB which when copied to HDFS is replicated
>>>> to both the nodes. Seeing the block info I can see the file has been
>>>> subdivided into 8 parts which means it has been subdivided into 8 blocks
>>>> each of size 128 MB.  I use this file as input to run the word count
>>>> program. Some how I feel only one node is doing all the work and the code
>>>> is not distributed to other node. How can I make sure code is distributed
>>>> to both the nodes? Also is there a log or GUI which can be used for this?
>>>>
>>>> Please note I am using the latest stable release that is 2.2.0.
>>>>
>>>> ++Ashish
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>
>
> --
>
> Regards,
> ...Sudhakara.st
>
>

>>>> I distribute a file I could see that there are replica of data available in
>>>> other nodes. However when I run a map reduce job again only one node is
>>>> serving all the request :(. Can you or anyone please provide some more
>>>> inputs.
>>>>
>>>> Thanks
>>>> Ashish
>>>>
>>>>
>>>>
>>>> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <ch...@gmail.com>
>>>> wrote:
>>>>
>>>> 2 nodes and replication factor of 2 results in a replica of each block
>>>> present on each node. This would allow the possibility that a single node
>>>> would do the work and yet be data local.  It will probably happen if that
>>>> single node has the needed capacity.  More nodes than the replication
>>>> factor are needed to force distribution of the processing.
>>>> Chris
>>>>
>>>> On Jan 8, 2014 7:35 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>>>
>>>> Guys,
>>>>
>>>> I am sure that only one node is being used. I just know ran the job
>>>> again and could see that CPU usage only for one server going high other
>>>> server CPU usage remains constant and hence it means other node is not
>>>> being used. Can someone help me to debug this issue?
>>>>
>>>> ++Ashish
>>>>
>>>>
>>>>
>>>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com> wrote:
>>>>
>>>> Hello All,
>>>>
>>>> I have a 2 node hadoop cluster running with a replication factor of 2.
>>>> I have a file of size around 1 GB which when copied to HDFS is replicated
>>>> to both the nodes. Seeing the block info I can see the file has been
>>>> subdivided into 8 parts which means it has been subdivided into 8 blocks
>>>> each of size 128 MB.  I use this file as input to run the word count
>>>> program. Some how I feel only one node is doing all the work and the code
>>>> is not distributed to other node. How can I make sure code is distributed
>>>> to both the nodes? Also is there a log or GUI which can be used for this?
>>>>
>>>> Please note I am using the latest stable release that is 2.2.0.
>>>>
>>>> ++Ashish
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>
>
> --
>
> Regards,
> ...Sudhakara.st
>
>

Re: Distributing the code to multiple nodes

Posted by sudhakara st <su...@gmail.com>.
Hello Ashish
It seems the job is running in the local job runner (LocalJobRunner) and
reading from the local file system. Can you try giving the full URI paths for
the input and output, like:
$hadoop jar program.jar   ProgramName -Dmapreduce.framework.name=yarn
file:///home/input/  file:///home/output/
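
An equivalent sketch against HDFS, assuming the input is copied there first
(the paths and the driver class name are illustrative assumptions; plain paths
resolve against fs.defaultFS):

# Illustrative only: paths and driver class name are assumptions.
hdfs dfs -mkdir -p /user/root/input
hdfs dfs -put /opt/ApacheHadoop/temp/worker.log /user/root/input/
hadoop jar program.jar ProgramName -Dmapreduce.framework.name=yarn \
    /user/root/input /user/root/output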


On Mon, Jan 13, 2014 at 3:02 PM, Ashish Jain <as...@gmail.com> wrote:

> German,
>
> This does not seem to be helping. I tried to use the Fairscheduler as my
> resource manger but the behavior remains same. I could see the
> fairscheduler log getting continuous heart beat from both the other nodes.
> But it is still not distributing the work to other nodes. What I did next
> was started 3 jobs simultaneously so that may be some part of one of the
> job be distributed to other nodes. However still only one node is being
> used :(((. What is that is going wrong can some one help?
>
> Sample of fairsheduler log:
> 2014-01-13 15:13:54,293 HEARTBEAT       l1dev-211
> 2014-01-13 15:13:54,953 HEARTBEAT       l1-dev06
> 2014-01-13 15:13:54,988 HEARTBEAT       l1-DEV05
> 2014-01-13 15:13:55,295 HEARTBEAT       l1dev-211
> 2014-01-13 15:13:55,956 HEARTBEAT       l1-dev06
> 2014-01-13 15:13:55,993 HEARTBEAT       l1-DEV05
> 2014-01-13 15:13:56,297 HEARTBEAT       l1dev-211
> 2014-01-13 15:13:56,960 HEARTBEAT       l1-dev06
> 2014-01-13 15:13:56,997 HEARTBEAT       l1-DEV05
> 2014-01-13 15:13:57,299 HEARTBEAT       l1dev-211
> 2014-01-13 15:13:57,964 HEARTBEAT       l1-dev06
> 2014-01-13 15:13:58,001 HEARTBEAT       l1-DEV05
>
> My Data distributed as blocks to other nodes. The host with IP
> 10.12.11.210 has all the data and this is the one which is serving all the
> request.
>
> Total number of blocks: 8
> 1073741866:         10.12.11.211:50010    View Block Info
> 10.12.11.210:50010    View Block Info
> 1073741867:         10.12.11.211:50010    View Block Info
> 10.12.11.210:50010    View Block Info
> 1073741868:         10.12.11.210:50010    View Block Info
> 10.12.11.209:50010    View Block Info
> 1073741869:         10.12.11.210:50010    View Block Info
> 10.12.11.209:50010    View Block Info
> 1073741870:         10.12.11.211:50010    View Block Info
> 10.12.11.210:50010    View Block Info
> 1073741871:         10.12.11.210:50010    View Block Info
> 10.12.11.209:50010    View Block Info
> 1073741872:         10.12.11.211:50010    View Block Info
> 10.12.11.210:50010    View Block Info
> 1073741873:         10.12.11.210:50010    View Block Info
> 10.12.11.209:50010    View Block Info
>
> Someone please advice on how to go about this.
>
> --Ashish
>
>
> On Fri, Jan 10, 2014 at 12:58 PM, Ashish Jain <as...@gmail.com> wrote:
>
>> Thanks for all these suggestions. Somehow I do not have access to the
>> servers today and will try the suggestions made on monday and will let you
>> know how it goes.
>>
>> --Ashish
>>
>>
>> On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <
>> german.fl@samsung.com> wrote:
>>
>>> Ashish
>>>
>>> Could this be related to the scheduler you are using and its settings?.
>>>
>>>
>>>
>>> On lab environments when running a single type of job I often use
>>> FairScheduler (the YARN default in 2.2.0 is CapacityScheduler) and it does
>>> a good job distributing the load.
>>>
>>>
>>>
>>> You could give that a try (
>>> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
>>> )
>>>
>>>
>>>
>>> I think just changing yarn-site.xml  as follows could demonstrate this
>>> theory (note that  how the jobs are scheduled depend on resources such as
>>> memory on the nodes and you would need to setup yarn-site.xml accordingly).
>>>
>>>
>>>
>>> <property>
>>>
>>>   <name>yarn.resourcemanager.scheduler.class</name>
>>>
>>>
>>> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
>>>
>>> </property>
>>>
>>>
>>>
>>> Regards
>>>
>>> ./g
>>>
>>>
>>>
>>>
>>>
>>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>>> *Sent:* Thursday, January 09, 2014 6:46 AM
>>> *To:* user@hadoop.apache.org
>>> *Subject:* Re: Distributing the code to multiple nodes
>>>
>>>
>>>
>>> Another point to add here 10.12.11.210 is the host which has everything
>>> running including a slave datanode. Data was also distributed this host as
>>> well as the jar file. Following are running on 10.12.11.210
>>>
>>> 7966 DataNode
>>> 8480 NodeManager
>>> 8353 ResourceManager
>>> 8141 SecondaryNameNode
>>> 7834 NameNode
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>> Logs were updated only when I copied the data. After copying the data
>>> there has been no updates on the log files.
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <ch...@gmail.com>
>>> wrote:
>>>
>>> Do the logs on the three nodes contain anything interesting?
>>> Chris
>>>
>>> On Jan 9, 2014 3:47 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>>
>>> Here is the block info for the record I distributed. As can be seen only
>>> 10.12.11.210 has all the data and this is the node which is serving all the
>>> request. Replicas are available with 209 as well as 210
>>>
>>> 1073741857:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741858:         10.12.11.210:50010    View Block Info
>>> 10.12.11.211:50010    View Block Info
>>> 1073741859:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741860:         10.12.11.210:50010    View Block Info
>>> 10.12.11.211:50010    View Block Info
>>> 1073741861:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741862:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741863:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741864:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>> Hello Chris,
>>>
>>> I have now a cluster with 3 nodes and replication factor being 2. When I
>>> distribute a file I could see that there are replica of data available in
>>> other nodes. However when I run a map reduce job again only one node is
>>> serving all the request :(. Can you or anyone please provide some more
>>> inputs.
>>>
>>> Thanks
>>> Ashish
>>>
>>>
>>>
>>> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <ch...@gmail.com>
>>> wrote:
>>>
>>> 2 nodes and replication factor of 2 results in a replica of each block
>>> present on each node. This would allow the possibility that a single node
>>> would do the work and yet be data local.  It will probably happen if that
>>> single node has the needed capacity.  More nodes than the replication
>>> factor are needed to force distribution of the processing.
>>> Chris
>>>
>>> On Jan 8, 2014 7:35 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>>
>>> Guys,
>>>
>>> I am sure that only one node is being used. I just know ran the job
>>> again and could see that CPU usage only for one server going high other
>>> server CPU usage remains constant and hence it means other node is not
>>> being used. Can someone help me to debug this issue?
>>>
>>> ++Ashish
>>>
>>>
>>>
>>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>> Hello All,
>>>
>>> I have a 2 node hadoop cluster running with a replication factor of 2. I
>>> have a file of size around 1 GB which when copied to HDFS is replicated to
>>> both the nodes. Seeing the block info I can see the file has been
>>> subdivided into 8 parts which means it has been subdivided into 8 blocks
>>> each of size 128 MB.  I use this file as input to run the word count
>>> program. Some how I feel only one node is doing all the work and the code
>>> is not distributed to other node. How can I make sure code is distributed
>>> to both the nodes? Also is there a log or GUI which can be used for this?
>>>
>>> Please note I am using the latest stable release that is 2.2.0.
>>>
>>> ++Ashish
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>


-- 

Regards,
...Sudhakara.st

Re: Distributing the code to multiple nodes

Posted by Ashish Jain <as...@gmail.com>.
German,

This does not seem to be helping. I tried to use the FairScheduler as the
ResourceManager's scheduler, but the behavior remains the same. I can see the
fairscheduler log receiving continuous heartbeats from both of the other
nodes, yet the work is still not being distributed to them. What I did next
was to start 3 jobs simultaneously, hoping that at least part of one of the
jobs would be distributed to the other nodes. However, still only one node is
being used :(((. What is going wrong? Can someone help?

Sample of the fairscheduler log:
2014-01-13 15:13:54,293 HEARTBEAT       l1dev-211
2014-01-13 15:13:54,953 HEARTBEAT       l1-dev06
2014-01-13 15:13:54,988 HEARTBEAT       l1-DEV05
2014-01-13 15:13:55,295 HEARTBEAT       l1dev-211
2014-01-13 15:13:55,956 HEARTBEAT       l1-dev06
2014-01-13 15:13:55,993 HEARTBEAT       l1-DEV05
2014-01-13 15:13:56,297 HEARTBEAT       l1dev-211
2014-01-13 15:13:56,960 HEARTBEAT       l1-dev06
2014-01-13 15:13:56,997 HEARTBEAT       l1-DEV05
2014-01-13 15:13:57,299 HEARTBEAT       l1dev-211
2014-01-13 15:13:57,964 HEARTBEAT       l1-dev06
2014-01-13 15:13:58,001 HEARTBEAT       l1-DEV05
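
The heartbeats above only confirm that the NodeManagers are alive; they do not
show whether the scheduler sees any free capacity on them. A quick way to check
what the ResourceManager actually knows about the nodes (a minimal sketch,
assuming it is run on the ResourceManager host with the Hadoop binaries on the
PATH):

# list the NodeManagers registered with the ResourceManager,
# their state and the number of containers currently running on each
yarn node -list

The Nodes page of the ResourceManager web UI (port 8088 by default) shows the
same list plus the memory each node advertises and uses.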

My data is distributed as blocks across the other nodes. The host with IP
10.12.11.210 holds a replica of every block, and it is the one serving all the
requests.

Total number of blocks: 8
1073741866: 10.12.11.211:50010, 10.12.11.210:50010
1073741867: 10.12.11.211:50010, 10.12.11.210:50010
1073741868: 10.12.11.210:50010, 10.12.11.209:50010
1073741869: 10.12.11.210:50010, 10.12.11.209:50010
1073741870: 10.12.11.211:50010, 10.12.11.210:50010
1073741871: 10.12.11.210:50010, 10.12.11.209:50010
1073741872: 10.12.11.211:50010, 10.12.11.210:50010
1073741873: 10.12.11.210:50010, 10.12.11.209:50010

Can someone please advise on how to go about this?
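
One thing worth ruling out is container sizing. The job has only 8 map tasks
(one per block), and the block report above shows that 10.12.11.210 holds a
replica of every block, so if its NodeManager advertises enough memory for all
8 containers the scheduler can legally run the whole job on that one
data-local node. Capping what a single node can offer forces the overflow onto
the other NodeManagers. A minimal sketch for yarn-site.xml on each node (the
values are illustrative assumptions and should be sized to the real RAM):

<property>
  <!-- total memory this NodeManager offers to YARN containers -->
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value>
</property>
<property>
  <!-- smallest container the scheduler will allocate -->
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>

With these assumed numbers a node can hold only about four containers (one of
which is the MapReduce ApplicationMaster), so an 8-map job cannot fit on a
single node and has to spread out.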

--Ashish


On Fri, Jan 10, 2014 at 12:58 PM, Ashish Jain <as...@gmail.com> wrote:

> Thanks for all these suggestions. Somehow I do not have access to the
> servers today and will try the suggestions made on monday and will let you
> know how it goes.
>
> --Ashish
>
>
> On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <
> german.fl@samsung.com> wrote:
>
>> Ashish
>>
>> Could this be related to the scheduler you are using and its settings?.
>>
>>
>>
>> On lab environments when running a single type of job I often use
>> FairScheduler (the YARN default in 2.2.0 is CapacityScheduler) and it does
>> a good job distributing the load.
>>
>>
>>
>> You could give that a try (
>> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
>> )
>>
>>
>>
>> I think just changing yarn-site.xml  as follows could demonstrate this
>> theory (note that  how the jobs are scheduled depend on resources such as
>> memory on the nodes and you would need to setup yarn-site.xml accordingly).
>>
>>
>>
>> <property>
>>
>>   <name>yarn.resourcemanager.scheduler.class</name>
>>
>>
>> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
>>
>> </property>
>>
>>
>>
>> Regards
>>
>> ./g
>>
>>
>>
>>
>>
>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>> *Sent:* Thursday, January 09, 2014 6:46 AM
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: Distributing the code to multiple nodes
>>
>>
>>
>> Another point to add here 10.12.11.210 is the host which has everything
>> running including a slave datanode. Data was also distributed this host as
>> well as the jar file. Following are running on 10.12.11.210
>>
>> 7966 DataNode
>> 8480 NodeManager
>> 8353 ResourceManager
>> 8141 SecondaryNameNode
>> 7834 NameNode
>>
>>
>>
>> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>> Logs were updated only when I copied the data. After copying the data
>> there has been no updates on the log files.
>>
>>
>>
>> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <ch...@gmail.com>
>> wrote:
>>
>> Do the logs on the three nodes contain anything interesting?
>> Chris
>>
>> On Jan 9, 2014 3:47 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>
>> Here is the block info for the record I distributed. As can be seen only
>> 10.12.11.210 has all the data and this is the node which is serving all the
>> request. Replicas are available with 209 as well as 210
>>
>> 1073741857:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741858:         10.12.11.210:50010    View Block Info
>> 10.12.11.211:50010    View Block Info
>> 1073741859:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741860:         10.12.11.210:50010    View Block Info
>> 10.12.11.211:50010    View Block Info
>> 1073741861:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741862:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741863:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741864:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>>
>> --Ashish
>>
>>
>>
>> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>> Hello Chris,
>>
>> I have now a cluster with 3 nodes and replication factor being 2. When I
>> distribute a file I could see that there are replica of data available in
>> other nodes. However when I run a map reduce job again only one node is
>> serving all the request :(. Can you or anyone please provide some more
>> inputs.
>>
>> Thanks
>> Ashish
>>
>>
>>
>> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <ch...@gmail.com>
>> wrote:
>>
>> 2 nodes and replication factor of 2 results in a replica of each block
>> present on each node. This would allow the possibility that a single node
>> would do the work and yet be data local.  It will probably happen if that
>> single node has the needed capacity.  More nodes than the replication
>> factor are needed to force distribution of the processing.
>> Chris
>>
>> On Jan 8, 2014 7:35 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>
>> Guys,
>>
>> I am sure that only one node is being used. I just know ran the job again
>> and could see that CPU usage only for one server going high other server
>> CPU usage remains constant and hence it means other node is not being used.
>> Can someone help me to debug this issue?
>>
>> ++Ashish
>>
>>
>>
>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>> Hello All,
>>
>> I have a 2 node hadoop cluster running with a replication factor of 2. I
>> have a file of size around 1 GB which when copied to HDFS is replicated to
>> both the nodes. Seeing the block info I can see the file has been
>> subdivided into 8 parts which means it has been subdivided into 8 blocks
>> each of size 128 MB.  I use this file as input to run the word count
>> program. Some how I feel only one node is doing all the work and the code
>> is not distributed to other node. How can I make sure code is distributed
>> to both the nodes? Also is there a log or GUI which can be used for this?
>>
>> Please note I am using the latest stable release that is 2.2.0.
>>
>> ++Ashish
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>
>

Re: Distributing the code to multiple nodes

Posted by Ashish Jain <as...@gmail.com>.
Thanks for all these suggestions. Somehow I do not have access to the
servers today, so I will try the suggestions on Monday and let you know how it
goes.

--Ashish


On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <
german.fl@samsung.com> wrote:

> Ashish
>
> Could this be related to the scheduler you are using and its settings?.
>
>
>
> On lab environments when running a single type of job I often use
> FairScheduler (the YARN default in 2.2.0 is CapacityScheduler) and it does
> a good job distributing the load.
>
>
>
> You could give that a try (
> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
> )
>
>
>
> I think just changing yarn-site.xml  as follows could demonstrate this
> theory (note that  how the jobs are scheduled depend on resources such as
> memory on the nodes and you would need to setup yarn-site.xml accordingly).
>
>
>
> <property>
>
>   <name>yarn.resourcemanager.scheduler.class</name>
>
>
> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
>
> </property>
>
>
>
> Regards
>
> ./g
>
>
>
>
>
> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
> *Sent:* Thursday, January 09, 2014 6:46 AM
> *To:* user@hadoop.apache.org
> *Subject:* Re: Distributing the code to multiple nodes
>
>
>
> Another point to add here 10.12.11.210 is the host which has everything
> running including a slave datanode. Data was also distributed this host as
> well as the jar file. Following are running on 10.12.11.210
>
> 7966 DataNode
> 8480 NodeManager
> 8353 ResourceManager
> 8141 SecondaryNameNode
> 7834 NameNode
>
>
>
> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <as...@gmail.com> wrote:
>
> Logs were updated only when I copied the data. After copying the data
> there has been no updates on the log files.
>
>
>
> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <ch...@gmail.com>
> wrote:
>
> Do the logs on the three nodes contain anything interesting?
> Chris
>
> On Jan 9, 2014 3:47 AM, "Ashish Jain" <as...@gmail.com> wrote:
>
> Here is the block info for the record I distributed. As can be seen only
> 10.12.11.210 has all the data and this is the node which is serving all the
> request. Replicas are available with 209 as well as 210
>
> 1073741857:         10.12.11.210:50010    View Block Info
> 10.12.11.209:50010    View Block Info
> 1073741858:         10.12.11.210:50010    View Block Info
> 10.12.11.211:50010    View Block Info
> 1073741859:         10.12.11.210:50010    View Block Info
> 10.12.11.209:50010    View Block Info
> 1073741860:         10.12.11.210:50010    View Block Info
> 10.12.11.211:50010    View Block Info
> 1073741861:         10.12.11.210:50010    View Block Info
> 10.12.11.209:50010    View Block Info
> 1073741862:         10.12.11.210:50010    View Block Info
> 10.12.11.209:50010    View Block Info
> 1073741863:         10.12.11.210:50010    View Block Info
> 10.12.11.209:50010    View Block Info
> 1073741864:         10.12.11.210:50010    View Block Info
> 10.12.11.209:50010    View Block Info
>
> --Ashish
>
>
>
> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <as...@gmail.com> wrote:
>
> Hello Chris,
>
> I have now a cluster with 3 nodes and replication factor being 2. When I
> distribute a file I could see that there are replica of data available in
> other nodes. However when I run a map reduce job again only one node is
> serving all the request :(. Can you or anyone please provide some more
> inputs.
>
> Thanks
> Ashish
>
>
>
> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <ch...@gmail.com>
> wrote:
>
> 2 nodes and replication factor of 2 results in a replica of each block
> present on each node. This would allow the possibility that a single node
> would do the work and yet be data local.  It will probably happen if that
> single node has the needed capacity.  More nodes than the replication
> factor are needed to force distribution of the processing.
> Chris
>
> On Jan 8, 2014 7:35 AM, "Ashish Jain" <as...@gmail.com> wrote:
>
> Guys,
>
> I am sure that only one node is being used. I just know ran the job again
> and could see that CPU usage only for one server going high other server
> CPU usage remains constant and hence it means other node is not being used.
> Can someone help me to debug this issue?
>
> ++Ashish
>
>
>
> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com> wrote:
>
> Hello All,
>
> I have a 2 node hadoop cluster running with a replication factor of 2. I
> have a file of size around 1 GB which when copied to HDFS is replicated to
> both the nodes. Seeing the block info I can see the file has been
> subdivided into 8 parts which means it has been subdivided into 8 blocks
> each of size 128 MB.  I use this file as input to run the word count
> program. Some how I feel only one node is doing all the work and the code
> is not distributed to other node. How can I make sure code is distributed
> to both the nodes? Also is there a log or GUI which can be used for this?
>
> Please note I am using the latest stable release that is 2.2.0.
>
> ++Ashish
>
>
>
>
>
>
>
>
>
>
>

RE: Distributing the code to multiple nodes

Posted by German Florez-Larrahondo <ge...@samsung.com>.
Ashish

Could this be related to the scheduler you are using and its settings?

 

In lab environments, when running a single type of job, I often use the
FairScheduler (the YARN default in 2.2.0 is the CapacityScheduler) and it does a
good job of distributing the load.

 

You could give that a try
(https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html)

 

I think just changing yarn-site.xml as follows could demonstrate this
theory (note that how the jobs are scheduled depends on resources such as
memory on the nodes, so you would need to set up yarn-site.xml accordingly).

 

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
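
The new scheduler class is only picked up after the ResourceManager is
restarted; with the stock 2.2.0 scripts that is roughly (a sketch, assuming
HADOOP_HOME points at the install directory):

$HADOOP_HOME/sbin/stop-yarn.sh
$HADOOP_HOME/sbin/start-yarn.sh

The Scheduler page of the ResourceManager web UI (port 8088 by default) then
lists the queues of whichever scheduler is active, which makes it easy to
confirm the change really took effect.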

 

Regards

./g

 

 

From: Ashish Jain [mailto:ashjain2@gmail.com] 
Sent: Thursday, January 09, 2014 6:46 AM
To: user@hadoop.apache.org
Subject: Re: Distributing the code to multiple nodes

 

Another point to add here 10.12.11.210 is the host which has everything
running including a slave datanode. Data was also distributed this host as
well as the jar file. Following are running on 10.12.11.210

7966 DataNode
8480 NodeManager
8353 ResourceManager
8141 SecondaryNameNode
7834 NameNode

 

On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <as...@gmail.com> wrote:

Logs were updated only when I copied the data. After copying the data there
has been no updates on the log files.

 

On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <ch...@gmail.com> wrote:

Do the logs on the three nodes contain anything interesting?
Chris

On Jan 9, 2014 3:47 AM, "Ashish Jain" <as...@gmail.com> wrote:

Here is the block info for the record I distributed. As can be seen only
10.12.11.210 has all the data and this is the node which is serving all the
request. Replicas are available with 209 as well as 210

1073741857:         10.12.11.210:50010    View Block Info
10.12.11.209:50010    View Block Info
1073741858:         10.12.11.210:50010    View Block Info
10.12.11.211:50010    View Block Info
1073741859:         10.12.11.210:50010    View Block Info
10.12.11.209:50010    View Block Info
1073741860:         10.12.11.210:50010    View Block Info
10.12.11.211:50010    View Block Info
1073741861:         10.12.11.210:50010    View Block Info
10.12.11.209:50010    View Block Info
1073741862:         10.12.11.210:50010    View Block Info
10.12.11.209:50010    View Block Info
1073741863:         10.12.11.210:50010    View Block Info
10.12.11.209:50010    View Block Info
1073741864:         10.12.11.210:50010    View Block Info
10.12.11.209:50010    View Block Info

--Ashish

 

On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <as...@gmail.com> wrote:

Hello Chris,

I have now a cluster with 3 nodes and replication factor being 2. When I
distribute a file I could see that there are replica of data available in
other nodes. However when I run a map reduce job again only one node is
serving all the request :(. Can you or anyone please provide some more
inputs.

Thanks
Ashish

 

On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <ch...@gmail.com> wrote:

2 nodes and replication factor of 2 results in a replica of each block
present on each node. This would allow the possibility that a single node
would do the work and yet be data local.  It will probably happen if that
single node has the needed capacity.  More nodes than the replication factor
are needed to force distribution of the processing. 
Chris

On Jan 8, 2014 7:35 AM, "Ashish Jain" <as...@gmail.com> wrote:

Guys,

I am sure that only one node is being used. I just know ran the job again
and could see that CPU usage only for one server going high other server CPU
usage remains constant and hence it means other node is not being used. Can
someone help me to debug this issue?

++Ashish

 

On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com> wrote:

Hello All,

I have a 2 node hadoop cluster running with a replication factor of 2. I
have a file of size around 1 GB which when copied to HDFS is replicated to
both the nodes. Seeing the block info I can see the file has been subdivided
into 8 parts which means it has been subdivided into 8 blocks each of size
128 MB.  I use this file as input to run the word count program. Some how I
feel only one node is doing all the work and the code is not distributed to
other node. How can I make sure code is distributed to both the nodes? Also
is there a log or GUI which can be used for this?

Please note I am using the latest stable release that is 2.2.0.

++Ashish

 

 

 

 

 


RE: Distributing the code to multiple nodes

Posted by German Florez-Larrahondo <ge...@samsung.com>.
Ashish

Could this be related to the scheduler you are using and its settings?

 

In lab environments, when running a single type of job, I often use the
FairScheduler (the YARN default in 2.2.0 is the CapacityScheduler) and it does a
good job of distributing the load.

 

You could give that a try
(https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html).

 

I think just changing yarn-site.xml as follows could demonstrate this theory
(note that how the jobs are scheduled depends on resources such as memory on the
nodes, and you would need to set up yarn-site.xml accordingly).

 

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
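
On small boxes it is also worth checking how many containers a single
NodeManager can hold, because with the stock settings (roughly 8 GB of
schedulable memory per NodeManager and 1 GB per map task) one node can fit all
eight map tasks of a 1 GB input by itself. The values below are only
illustrative and would have to be sized to the actual RAM on the nodes:

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>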

 

Regards

./g

 

 

From: Ashish Jain [mailto:ashjain2@gmail.com] 
Sent: Thursday, January 09, 2014 6:46 AM
To: user@hadoop.apache.org
Subject: Re: Distributing the code to multiple nodes

 

Another point to add here 10.12.11.210 is the host which has everything
running including a slave datanode. Data was also distributed this host as
well as the jar file. Following are running on 10.12.11.210

7966 DataNode
8480 NodeManager
8353 ResourceManager
8141 SecondaryNameNode
7834 NameNode

 

On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <as...@gmail.com> wrote:

Logs were updated only when I copied the data. After copying the data there
has been no updates on the log files.

 

On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <ch...@gmail.com> wrote:

Do the logs on the three nodes contain anything interesting?
Chris

On Jan 9, 2014 3:47 AM, "Ashish Jain" <as...@gmail.com> wrote:

Here is the block info for the record I distributed. As can be seen only
10.12.11.210 has all the data and this is the node which is serving all the
request. Replicas are available with 209 as well as 210

1073741857:         10.12.11.210:50010    View Block Info
10.12.11.209:50010    View Block Info
1073741858:         10.12.11.210:50010    View Block Info
10.12.11.211:50010    View Block Info
1073741859:         10.12.11.210:50010    View Block Info
10.12.11.209:50010    View Block Info
1073741860:         10.12.11.210:50010    View Block Info
10.12.11.211:50010    View Block Info
1073741861:         10.12.11.210:50010    View Block Info
10.12.11.209:50010    View Block Info
1073741862:         10.12.11.210:50010    View Block Info
10.12.11.209:50010    View Block Info
1073741863:         10.12.11.210:50010    View Block Info
10.12.11.209:50010    View Block Info
1073741864:         10.12.11.210:50010    View Block Info
10.12.11.209:50010    View Block Info


--Ashish

 

On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <as...@gmail.com> wrote:

Hello Chris,

I have now a cluster with 3 nodes and replication factor being 2. When I
distribute a file I could see that there are replica of data available in
other nodes. However when I run a map reduce job again only one node is
serving all the request :(. Can you or anyone please provide some more
inputs.

Thanks
Ashish

 

On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <ch...@gmail.com> wrote:

2 nodes and replication factor of 2 results in a replica of each block
present on each node. This would allow the possibility that a single node
would do the work and yet be data local.  It will probably happen if that
single node has the needed capacity.  More nodes than the replication factor
are needed to force distribution of the processing. 
Chris

On Jan 8, 2014 7:35 AM, "Ashish Jain" <as...@gmail.com> wrote:

Guys,

I am sure that only one node is being used. I just know ran the job again
and could see that CPU usage only for one server going high other server CPU
usage remains constant and hence it means other node is not being used. Can
someone help me to debug this issue?

++Ashish

 

On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com> wrote:

Hello All,

I have a 2 node hadoop cluster running with a replication factor of 2. I
have a file of size around 1 GB which when copied to HDFS is replicated to
both the nodes. Seeing the block info I can see the file has been subdivided
into 8 parts which means it has been subdivided into 8 blocks each of size
128 MB.  I use this file as input to run the word count program. Some how I
feel only one node is doing all the work and the code is not distributed to
other node. How can I make sure code is distributed to both the nodes? Also
is there a log or GUI which can be used for this?

Please note I am using the latest stable release that is 2.2.0.

++Ashish

 

 

 

 

 


Re: Distributing the code to multiple nodes

Posted by Chris Mawata <ch...@gmail.com>.
...And do all three nodes appear in the NameNode and YARN web user
interfaces?
Chris
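
Assuming the 2.2.0 default ports are still in use, the datanodes should show up
as live nodes in the NameNode UI at http://10.12.11.210:50070 and the
NodeManagers in the ResourceManager UI at http://10.12.11.210:8088/cluster/nodes.
The same list is available from the command line with:

    yarn node -list

If 209 or 211 is missing from that list, the scheduler has nowhere else to place
the containers.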
On Jan 9, 2014 7:46 AM, "Ashish Jain" <as...@gmail.com> wrote:

> Another point to add here 10.12.11.210 is the host which has everything
> running including a slave datanode. Data was also distributed this host as
> well as the jar file. Following are running on 10.12.11.210
>
> 7966 DataNode
> 8480 NodeManager
> 8353 ResourceManager
> 8141 SecondaryNameNode
> 7834 NameNode
>
>
>
> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <as...@gmail.com> wrote:
>
>> Logs were updated only when I copied the data. After copying the data
>> there has been no updates on the log files.
>>
>>
>> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <ch...@gmail.com>wrote:
>>
>>> Do the logs on the three nodes contain anything interesting?
>>> Chris
>>>  On Jan 9, 2014 3:47 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>>
>>>> Here is the block info for the record I distributed. As can be seen
>>>> only 10.12.11.210 has all the data and this is the node which is serving
>>>> all the request. Replicas are available with 209 as well as 210
>>>>
>>>> 1073741857:         10.12.11.210:50010    View Block Info
>>>> 10.12.11.209:50010    View Block Info
>>>> 1073741858:         10.12.11.210:50010    View Block Info
>>>> 10.12.11.211:50010    View Block Info
>>>> 1073741859:         10.12.11.210:50010    View Block Info
>>>> 10.12.11.209:50010    View Block Info
>>>> 1073741860:         10.12.11.210:50010    View Block Info
>>>> 10.12.11.211:50010    View Block Info
>>>> 1073741861:         10.12.11.210:50010    View Block Info
>>>> 10.12.11.209:50010    View Block Info
>>>> 1073741862:         10.12.11.210:50010    View Block Info
>>>> 10.12.11.209:50010    View Block Info
>>>> 1073741863:         10.12.11.210:50010    View Block Info
>>>> 10.12.11.209:50010    View Block Info
>>>> 1073741864:         10.12.11.210:50010    View Block Info
>>>> 10.12.11.209:50010    View Block Info
>>>>
>>>> --Ashish
>>>>
>>>>
>>>> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <as...@gmail.com> wrote:
>>>>
>>>>> Hello Chris,
>>>>>
>>>>> I have now a cluster with 3 nodes and replication factor being 2. When
>>>>> I distribute a file I could see that there are replica of data available in
>>>>> other nodes. However when I run a map reduce job again only one node is
>>>>> serving all the request :(. Can you or anyone please provide some more
>>>>> inputs.
>>>>>
>>>>> Thanks
>>>>> Ashish
>>>>>
>>>>>
>>>>> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <ch...@gmail.com>wrote:
>>>>>
>>>>>> 2 nodes and replication factor of 2 results in a replica of each
>>>>>> block present on each node. This would allow the possibility that a single
>>>>>> node would do the work and yet be data local.  It will probably happen if
>>>>>> that single node has the needed capacity.  More nodes than the replication
>>>>>> factor are needed to force distribution of the processing.
>>>>>>  Chris
>>>>>> On Jan 8, 2014 7:35 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>>>>>
>>>>>>> Guys,
>>>>>>>
>>>>>>> I am sure that only one node is being used. I just know ran the job
>>>>>>> again and could see that CPU usage only for one server going high other
>>>>>>> server CPU usage remains constant and hence it means other node is not
>>>>>>> being used. Can someone help me to debug this issue?
>>>>>>>
>>>>>>> ++Ashish
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com>wrote:
>>>>>>>
>>>>>>>> Hello All,
>>>>>>>>
>>>>>>>> I have a 2 node hadoop cluster running with a replication factor of
>>>>>>>> 2. I have a file of size around 1 GB which when copied to HDFS is
>>>>>>>> replicated to both the nodes. Seeing the block info I can see the file has
>>>>>>>> been subdivided into 8 parts which means it has been subdivided into 8
>>>>>>>> blocks each of size 128 MB.  I use this file as input to run the word count
>>>>>>>> program. Some how I feel only one node is doing all the work and the code
>>>>>>>> is not distributed to other node. How can I make sure code is distributed
>>>>>>>> to both the nodes? Also is there a log or GUI which can be used for this?
>>>>>>>> Please note I am using the latest stable release that is 2.2.0.
>>>>>>>>
>>>>>>>> ++Ashish
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>
>

Re: Distributing the code to multiple nodes

Posted by Ashish Jain <as...@gmail.com>.
The logs were updated only when I copied the data. After copying the data there
have been no further updates to the log files.
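
If the NodeManager logs on 209 and 211 stay silent while the job runs, that by
itself suggests no containers are being handed to them. A rough way to watch for
this (the log directory and file naming assume a default tarball install, so they
may differ) is to run something like the following on each slave while the job
is executing:

    tail -f $HADOOP_HOME/logs/yarn-*-nodemanager-*.log | grep -i container

Container launches should then show up on whichever nodes are actually doing the
work.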


On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <ch...@gmail.com> wrote:

> Do the logs on the three nodes contain anything interesting?
> Chris
>  On Jan 9, 2014 3:47 AM, "Ashish Jain" <as...@gmail.com> wrote:
>
>> Here is the block info for the record I distributed. As can be seen only
>> 10.12.11.210 has all the data and this is the node which is serving all the
>> request. Replicas are available with 209 as well as 210
>>
>> 1073741857:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741858:         10.12.11.210:50010    View Block Info
>> 10.12.11.211:50010    View Block Info
>> 1073741859:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741860:         10.12.11.210:50010    View Block Info
>> 10.12.11.211:50010    View Block Info
>> 1073741861:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741862:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741863:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741864:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>>
>> --Ashish
>>
>>
>> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>>> Hello Chris,
>>>
>>> I have now a cluster with 3 nodes and replication factor being 2. When I
>>> distribute a file I could see that there are replica of data available in
>>> other nodes. However when I run a map reduce job again only one node is
>>> serving all the request :(. Can you or anyone please provide some more
>>> inputs.
>>>
>>> Thanks
>>> Ashish
>>>
>>>
>>> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <ch...@gmail.com>wrote:
>>>
>>>> 2 nodes and replication factor of 2 results in a replica of each block
>>>> present on each node. This would allow the possibility that a single node
>>>> would do the work and yet be data local.  It will probably happen if that
>>>> single node has the needed capacity.  More nodes than the replication
>>>> factor are needed to force distribution of the processing.
>>>>  Chris
>>>> On Jan 8, 2014 7:35 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>>>
>>>>> Guys,
>>>>>
>>>>> I am sure that only one node is being used. I just know ran the job
>>>>> again and could see that CPU usage only for one server going high other
>>>>> server CPU usage remains constant and hence it means other node is not
>>>>> being used. Can someone help me to debug this issue?
>>>>>
>>>>> ++Ashish
>>>>>
>>>>>
>>>>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com>wrote:
>>>>>
>>>>>> Hello All,
>>>>>>
>>>>>> I have a 2 node hadoop cluster running with a replication factor of
>>>>>> 2. I have a file of size around 1 GB which when copied to HDFS is
>>>>>> replicated to both the nodes. Seeing the block info I can see the file has
>>>>>> been subdivided into 8 parts which means it has been subdivided into 8
>>>>>> blocks each of size 128 MB.  I use this file as input to run the word count
>>>>>> program. Some how I feel only one node is doing all the work and the code
>>>>>> is not distributed to other node. How can I make sure code is distributed
>>>>>> to both the nodes? Also is there a log or GUI which can be used for this?
>>>>>> Please note I am using the latest stable release that is 2.2.0.
>>>>>>
>>>>>> ++Ashish
>>>>>>
>>>>>
>>>>>
>>>
>>

Re: Distributing the code to multiple nodes

Posted by Chris Mawata <ch...@gmail.com>.
Do the logs on the three nodes contain anything interesting?
Chris
 On Jan 9, 2014 3:47 AM, "Ashish Jain" <as...@gmail.com> wrote:

> Here is the block info for the record I distributed. As can be seen only
> 10.12.11.210 has all the data and this is the node which is serving all the
> request. Replicas are available with 209 as well as 210
>
> 1073741857:   10.12.11.210:50010   10.12.11.209:50010
> 1073741858:   10.12.11.210:50010   10.12.11.211:50010
> 1073741859:   10.12.11.210:50010   10.12.11.209:50010
> 1073741860:   10.12.11.210:50010   10.12.11.211:50010
> 1073741861:   10.12.11.210:50010   10.12.11.209:50010
> 1073741862:   10.12.11.210:50010   10.12.11.209:50010
> 1073741863:   10.12.11.210:50010   10.12.11.209:50010
> 1073741864:   10.12.11.210:50010   10.12.11.209:50010
>
> --Ashish
>
>
> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <as...@gmail.com> wrote:
>
>> Hello Chris,
>>
>> I have now a cluster with 3 nodes and replication factor being 2. When I
>> distribute a file I could see that there are replica of data available in
>> other nodes. However when I run a map reduce job again only one node is
>> serving all the request :(. Can you or anyone please provide some more
>> inputs.
>>
>> Thanks
>> Ashish
>>
>>
>> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <ch...@gmail.com>wrote:
>>
>>> 2 nodes and replication factor of 2 results in a replica of each block
>>> present on each node. This would allow the possibility that a single node
>>> would do the work and yet be data local.  It will probably happen if that
>>> single node has the needed capacity.  More nodes than the replication
>>> factor are needed to force distribution of the processing.
>>>  Chris
>>> On Jan 8, 2014 7:35 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>>
>>>> Guys,
>>>>
>>>> I am sure that only one node is being used. I just know ran the job
>>>> again and could see that CPU usage only for one server going high other
>>>> server CPU usage remains constant and hence it means other node is not
>>>> being used. Can someone help me to debug this issue?
>>>>
>>>> ++Ashish
>>>>
>>>>
>>>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com> wrote:
>>>>
>>>>> Hello All,
>>>>>
>>>>> I have a 2 node hadoop cluster running with a replication factor of 2.
>>>>> I have a file of size around 1 GB which when copied to HDFS is replicated
>>>>> to both the nodes. Seeing the block info I can see the file has been
>>>>> subdivided into 8 parts which means it has been subdivided into 8 blocks
>>>>> each of size 128 MB.  I use this file as input to run the word count
>>>>> program. Some how I feel only one node is doing all the work and the code
>>>>> is not distributed to other node. How can I make sure code is distributed
>>>>> to both the nodes? Also is there a log or GUI which can be used for this?
>>>>> Please note I am using the latest stable release that is 2.2.0.
>>>>>
>>>>> ++Ashish
>>>>>
>>>>
>>>>
>>
>

Re: Distributing the code to multiple nodes

Posted by Ashish Jain <as...@gmail.com>.
Here is the block info for the file I distributed. As can be seen, only
10.12.11.210 has all the data, and this is the node that is serving all the
requests. Replicas are available on 209 as well as 211.

1073741857:   10.12.11.210:50010   10.12.11.209:50010
1073741858:   10.12.11.210:50010   10.12.11.211:50010
1073741859:   10.12.11.210:50010   10.12.11.209:50010
1073741860:   10.12.11.210:50010   10.12.11.211:50010
1073741861:   10.12.11.210:50010   10.12.11.209:50010
1073741862:   10.12.11.210:50010   10.12.11.209:50010
1073741863:   10.12.11.210:50010   10.12.11.209:50010
1073741864:   10.12.11.210:50010   10.12.11.209:50010

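A quick way to reproduce the listing above without the NameNode web UI is to ask
HDFS for the block locations directly. The sketch below is only illustrative: the
path /user/ashish/input.txt is a placeholder for whatever file was actually copied
in, and it assumes the cluster's configuration files are on the classpath.

    import java.util.Arrays;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockHosts {
      public static void main(String[] args) throws Exception {
        // Placeholder path -- replace with the file that was copied into HDFS.
        Path file = new Path(args.length > 0 ? args[0] : "/user/ashish/input.txt");
        Configuration conf = new Configuration(); // reads core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        FileStatus status = fs.getFileStatus(file);
        // One BlockLocation per block, covering the whole length of the file.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
          System.out.println("offset " + block.getOffset()
              + ", length " + block.getLength()
              + ", hosts " + Arrays.toString(block.getHosts()));
        }
        fs.close();
      }
    }

If the replicas really are spread the way the table shows, every line should list
10.12.11.210 plus one of 209/211.
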
--Ashish


On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <as...@gmail.com> wrote:

> Hello Chris,
>
> I have now a cluster with 3 nodes and replication factor being 2. When I
> distribute a file I could see that there are replica of data available in
> other nodes. However when I run a map reduce job again only one node is
> serving all the request :(. Can you or anyone please provide some more
> inputs.
>
> Thanks
> Ashish
>
>
> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <ch...@gmail.com>wrote:
>
>> 2 nodes and replication factor of 2 results in a replica of each block
>> present on each node. This would allow the possibility that a single node
>> would do the work and yet be data local.  It will probably happen if that
>> single node has the needed capacity.  More nodes than the replication
>> factor are needed to force distribution of the processing.
>>  Chris
>> On Jan 8, 2014 7:35 AM, "Ashish Jain" <as...@gmail.com> wrote:
>>
>>> Guys,
>>>
>>> I am sure that only one node is being used. I just know ran the job
>>> again and could see that CPU usage only for one server going high other
>>> server CPU usage remains constant and hence it means other node is not
>>> being used. Can someone help me to debug this issue?
>>>
>>> ++Ashish
>>>
>>>
>>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com> wrote:
>>>
>>>> Hello All,
>>>>
>>>> I have a 2 node hadoop cluster running with a replication factor of 2.
>>>> I have a file of size around 1 GB which when copied to HDFS is replicated
>>>> to both the nodes. Seeing the block info I can see the file has been
>>>> subdivided into 8 parts which means it has been subdivided into 8 blocks
>>>> each of size 128 MB.  I use this file as input to run the word count
>>>> program. Some how I feel only one node is doing all the work and the code
>>>> is not distributed to other node. How can I make sure code is distributed
>>>> to both the nodes? Also is there a log or GUI which can be used for this?
>>>> Please note I am using the latest stable release that is 2.2.0.
>>>>
>>>> ++Ashish
>>>>
>>>
>>>
>

Re: Distributing the code to multiple nodes

Posted by Ashish Jain <as...@gmail.com>.
Hello Chris,

I now have a cluster with 3 nodes and a replication factor of 2. When I
distribute a file I can see that replicas of the data are available on the
other nodes. However, when I run a MapReduce job, again only one node is
serving all the requests :(. Can you or anyone please provide some more
input?

Thanks
Ashish
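
To look at the same question from the MapReduce side, it can help to print the
input splits the job will create and the hosts each split prefers; with 8 splits
and 3 nodes there is room to spread out even at replication 2. This is only a
sketch, assuming word count uses the default TextInputFormat; the path is again a
placeholder.

    import java.util.Arrays;
    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class SplitHosts {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "split-hosts"); // no job is submitted here
        // Placeholder input path -- substitute the real file.
        FileInputFormat.addInputPath(job, new Path("/user/ashish/input.txt"));
        // Same split computation the word-count job performs before scheduling maps.
        List<InputSplit> splits = new TextInputFormat().getSplits(job);
        System.out.println("number of splits (= map tasks): " + splits.size());
        for (InputSplit split : splits) {
          // getLocations() lists the hosts YARN will *prefer* for the map task;
          // the scheduler is still free to place the container elsewhere.
          System.out.println(split.getLength() + " bytes, preferred hosts: "
              + Arrays.toString(split.getLocations()));
        }
      }
    }

Note that the locations are only a preference -- the scheduler can still run every
map task on one NodeManager if that node has enough free capacity, which is what
Chris describes below.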


On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <ch...@gmail.com> wrote:

> 2 nodes and replication factor of 2 results in a replica of each block
> present on each node. This would allow the possibility that a single node
> would do the work and yet be data local.  It will probably happen if that
> single node has the needed capacity.  More nodes than the replication
> factor are needed to force distribution of the processing.
> Chris
> On Jan 8, 2014 7:35 AM, "Ashish Jain" <as...@gmail.com> wrote:
>
>> Guys,
>>
>> I am sure that only one node is being used. I just know ran the job again
>> and could see that CPU usage only for one server going high other server
>> CPU usage remains constant and hence it means other node is not being used.
>> Can someone help me to debug this issue?
>>
>> ++Ashish
>>
>>
>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com> wrote:
>>
>>> Hello All,
>>>
>>> I have a 2 node hadoop cluster running with a replication factor of 2. I
>>> have a file of size around 1 GB which when copied to HDFS is replicated to
>>> both the nodes. Seeing the block info I can see the file has been
>>> subdivided into 8 parts which means it has been subdivided into 8 blocks
>>> each of size 128 MB.  I use this file as input to run the word count
>>> program. Some how I feel only one node is doing all the work and the code
>>> is not distributed to other node. How can I make sure code is distributed
>>> to both the nodes? Also is there a log or GUI which can be used for this?
>>> Please note I am using the latest stable release that is 2.2.0.
>>>
>>> ++Ashish
>>>
>>
>>

Re: Distributing the code to multiple nodes

Posted by Chris Mawata <ch...@gmail.com>.
2 nodes and a replication factor of 2 result in a replica of each block
being present on each node. This allows the possibility that a single node
does all the work and is still data-local. That will probably happen if the
single node has the needed capacity. More nodes than the replication
factor are needed to force distribution of the processing.
Chris
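
One way to test that capacity point is to request map containers big enough that a
single NodeManager cannot hold all eight of them, and see whether the tasks then
spread out. The sketch below uses illustrative numbers only; the right values
depend on yarn.nodemanager.resource.memory-mb on each node, and the usual
word-count wiring is omitted.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class BigMapContainers {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Ask YARN for a 3 GB container per map task (illustrative value).
        conf.setInt("mapreduce.map.memory.mb", 3072);
        // Keep the map JVM heap below the container size.
        conf.set("mapreduce.map.java.opts", "-Xmx2560m");
        Job job = Job.getInstance(conf, "word count, large map containers");
        // ... mapper, reducer and input/output paths go here unchanged;
        //     only the memory request differs from the original driver.
        System.out.println("map container MB = "
            + job.getConfiguration().getInt("mapreduce.map.memory.mb", -1));
      }
    }

If the job still lands entirely on one node after that, check the ResourceManager
web UI (port 8088 by default) to see which NodeManagers were actually granted
containers.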
On Jan 8, 2014 7:35 AM, "Ashish Jain" <as...@gmail.com> wrote:

> Guys,
>
> I am sure that only one node is being used. I just know ran the job again
> and could see that CPU usage only for one server going high other server
> CPU usage remains constant and hence it means other node is not being used.
> Can someone help me to debug this issue?
>
> ++Ashish
>
>
> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com> wrote:
>
>> Hello All,
>>
>> I have a 2 node hadoop cluster running with a replication factor of 2. I
>> have a file of size around 1 GB which when copied to HDFS is replicated to
>> both the nodes. Seeing the block info I can see the file has been
>> subdivided into 8 parts which means it has been subdivided into 8 blocks
>> each of size 128 MB.  I use this file as input to run the word count
>> program. Some how I feel only one node is doing all the work and the code
>> is not distributed to other node. How can I make sure code is distributed
>> to both the nodes? Also is there a log or GUI which can be used for this?
>> Please note I am using the latest stable release that is 2.2.0.
>>
>> ++Ashish
>>
>
>

Re: Distributing the code to multiple nodes

Posted by Ashish Jain <as...@gmail.com>.
Guys,

I am sure that only one node is being used. I just now ran the job again
and could see that the CPU usage of only one server goes high while the other
server's CPU usage remains constant, which means the other node is not being used.
Can someone help me debug this issue?

++Ashish


On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <as...@gmail.com> wrote:

> Hello All,
>
> I have a 2 node hadoop cluster running with a replication factor of 2. I
> have a file of size around 1 GB which when copied to HDFS is replicated to
> both the nodes. Seeing the block info I can see the file has been
> subdivided into 8 parts which means it has been subdivided into 8 blocks
> each of size 128 MB.  I use this file as input to run the word count
> program. Some how I feel only one node is doing all the work and the code
> is not distributed to other node. How can I make sure code is distributed
> to both the nodes? Also is there a log or GUI which can be used for this?
> Please note I am using the latest stable release that is 2.2.0.
>
> ++Ashish
>
