Posted to user@hama.apache.org by Behroz Sikander <be...@gmail.com> on 2015/06/22 19:33:34 UTC

Groomserver BSPPeerChild limit

Hi,
Recently, I moved from a single-machine setup to a 2-machine setup. I was
successfully able to run my job, which uses HDFS to get data. I have 3
trivial questions:

1- To access HDFS, I have to manually give the IP address of the server
running HDFS. I thought that Hama would automatically pick it up from the
configuration, but it does not. I am probably doing something wrong. Right
now my code works by using the following.

FileSystem fs = FileSystem.get(new URI("hdfs://server_ip:port/"), conf);
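As an aside, the explicit URI in the line above carries the same scheme/host/port information that a configured default filesystem would supply; a minimal stdlib sketch of how it decomposes (the address below is a placeholder, not a real namenode):

```java
import java.net.URI;

public class HdfsUriDemo {
    public static void main(String[] args) {
        // Placeholder namenode address standing in for "server_ip:port".
        URI uri = URI.create("hdfs://172.17.0.3:54310/tmp/data");
        System.out.println(uri.getScheme()); // hdfs
        System.out.println(uri.getHost());   // 172.17.0.3
        System.out.println(uri.getPort());   // 54310
    }
}
```

When the default filesystem is set in hama-site.xml, `FileSystem.get(conf)` can resolve the same host and port without hard-coding them in the job.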

2- On my master server, when I start Hama, it automatically starts Hama on
the slave machine (all good). Both master and slave are set as
groomservers. This means that I have 2 servers to run my job, which means
that I can open more BSPPeerChild processes. If I submit my jar with 3
bsp tasks, everything works fine. But when I move to 4 tasks, Hama
freezes. Here is the result of the JPS command on the slave:


Result of the JPS command on the master:

[screenshot not attached]

You can see that it is only opening tasks on the slave but not on the master.

Note: I tried changing the bsp.tasks.maximum property in hama-default.xml
to 4, but I get the same result.

3- I want my cluster to open as many BSPPeerChild processes as possible. Is
there any setting I can use to achieve that? Or does Hama pick up the
values from hama-default.xml to decide how many tasks to open?


Regards,

Behroz Sikander

RE: Groomserver BSPPeerChild limit

Posted by "Edward J. Yoon" <ed...@samsung.com>.
Hi,

If you need to kill the single JVM manually, then your program has an infinite 
loop and it's a local BSP job.

--
Best Regards, Edward J. Yoon

-----Original Message-----
From: Behroz Sikander [mailto:behroz89@gmail.com]
Sent: Monday, August 03, 2015 7:39 PM
To: user@hama.apache.org
Subject: Re: Groomserver BSPPeerChild limit

I tried bin/stop-bspd.sh, but the output of the script says there is no
groom/bspmaster process. Then I have to kill them manually. I am working on
Hama 0.7.0.

On Mon, Aug 3, 2015 at 1:07 AM, Edward J. Yoon <ed...@samsung.com>
wrote:

> Hi,
>
> Congratz! You can shutdown the cluster with following command: $
> bin/stop-bspd.sh
>
> --
> Best Regards, Edward J. Yoon
>
> -----Original Message-----
> From: Behroz Sikander [mailto:behroz89@gmail.com]
> Sent: Sunday, August 02, 2015 11:27 PM
> To: user@hama.apache.org
> Subject: Re: Groomserver BSPPeerChild limit
>
> Hi,
> Yesterday, I got the fix for the /etc/hosts file and now I can modify it. I
> tried to run the cluster with 3 machines and everything went super fine.
>
> Thanks :)
>
> btw if I run a process using the following, how can I stop it? Right now I
> am using kill -9 <process_id>
> % ./bin/hama bspmaster
>
> On Mon, Jun 29, 2015 at 5:53 AM, Behroz Sikander <be...@gmail.com>
> wrote:
>
> > Ok perfect. I do not have rights on /etc/hosts so that's why I was using
> > the IP addresses. I will talk to the administrator.
> >
> > Btw I am wondering how the PI example was able to communicate with the other
> > servers. The PI example runs fine even if I have more than 3 tasks (works on
> > both machines).
> >
> > On Mon, Jun 29, 2015 at 5:47 AM, Edward J. Yoon <ed...@apache.org>
> > wrote:
> >
> >> Okay, almost done. I guess you need to add host names to your
> >> /etc/hosts file. :-) Please see also
> >>
> >>
> http://stackoverflow.com/questions/4730148/unknownhostexception-on-tasktracker-in-hadoop-cluster
> >>
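The registration failure in this exchange comes down to name resolution: the groom's hostname must resolve on the master. Whether a given name resolves can be checked with the JDK alone; a small sketch (the container-ID-style name is taken from the logs below and is assumed unresolvable on an ordinary machine):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostCheck {
    // Returns true if the name resolves, false otherwise -- the same lookup
    // that fails inside Hama with "UnknownHostException: unknown host: ...".
    static boolean resolves(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("localhost -> " + resolves("localhost"));
        // A container ID missing from /etc/hosts and DNS will print false here.
        System.out.println("8d4b512cf448 -> " + resolves("8d4b512cf448"));
    }
}
```

Adding the missing name to /etc/hosts on each machine is the usual fix, as the linked Stack Overflow answer describes.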
> >> On Mon, Jun 29, 2015 at 12:41 PM, Behroz Sikander <be...@gmail.com>
> >> wrote:
> >> > Server 2 was showing the exception that I posted in the previous
> email.
> >> > Server1 is showing the following exception
> >> >
> >> > 15/06/29 03:27:42 INFO ipc.Server: IPC Server handler 0 on 40000:
> >> starting
> >> > 15/06/29 03:28:53 INFO bsp.BSPMaster: groomd_b178b33b16cc_50000 is
> >> added.
> >> > 15/06/29 03:29:20 ERROR bsp.BSPMaster: Fail to register GroomServer
> >> > groomd_8d4b512cf448_50000
> >> > java.net.UnknownHostException: unknown host: 8d4b512cf448
> >> > at org.apache.hama.ipc.Client$Connection.<init>(Client.java:225)
> >> > at org.apache.hama.ipc.Client.getConnection(Client.java:1039)
> >> > at org.apache.hama.ipc.Client.call(Client.java:888)
> >> > at org.apache.hama.ipc.RPC$Invoker.invoke(RPC.java:239)
> >> > at com.sun.proxy.$Proxy11.getProtocolVersion(Unknown Source)
> >> >
> >> > I am looking into this issue.
> >> >
> >> > On Mon, Jun 29, 2015 at 5:31 AM, Behroz Sikander <be...@gmail.com>
> >> wrote:
> >> >
> >> >> Ok great. I was able to run the zk, groom and bspmaster on server 1.
> >> But
> >> >> when I ran the groom on server2 I got the following exception
> >> >>
> >> >> 15/06/29 03:29:20 ERROR bsp.GroomServer: There is a problem in
> >> >> establishing communication link with BSPMaster
> >> >> 15/06/29 03:29:20 ERROR bsp.GroomServer: Got fatal exception while
> >> >> reinitializing GroomServer: java.io.IOException: There is a problem
> in
> >> >> establishing communication link with BSPMaster.
> >> >> at org.apache.hama.bsp.GroomServer.initialize(GroomServer.java:426)
> >> >> at org.apache.hama.bsp.GroomServer.run(GroomServer.java:860)
> >> >> at java.lang.Thread.run(Thread.java:745)
> >> >>
> >> >> On Mon, Jun 29, 2015 at 5:21 AM, Edward J. Yoon <
> edwardyoon@apache.org
> >> >
> >> >> wrote:
> >> >>
> >> >>> Here's my configurations:
> >> >>>
> >> >>> hama-site.xml:
> >> >>>
> >> >>>   <property>
> >> >>>     <name>bsp.master.address</name>
> >> >>>     <value>cluster-0:40000</value>
> >> >>>   </property>
> >> >>>
> >> >>>   <property>
> >> >>>     <name>fs.default.name</name>
> >> >>>     <value>hdfs://cluster-0:9000/</value>
> >> >>>   </property>
> >> >>>
> >> >>>   <property>
> >> >>>     <name>hama.zookeeper.quorum</name>
> >> >>>     <value>cluster-0</value>
> >> >>>   </property>
> >> >>>
> >> >>>
> >> >>> % bin/hama zookeeper
> >> >>> 15/06/29 12:17:17 ERROR quorum.QuorumPeerConfig: Invalid
> >> >>> configuration, only one server specified (ignoring)
> >> >>>
> >> >>> Then, open new terminal and run master with following command:
> >> >>>
> >> >>> % bin/hama bspmaster
> >> >>> ...
> >> >>> 15/06/29 12:17:40 INFO sync.ZKSyncBSPMasterClient: Initialized ZK
> >> false
> >> >>> 15/06/29 12:17:40 INFO sync.ZKSyncClient: Initializing ZK Sync
> Client
> >> >>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server Responder: starting
> >> >>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server listener on 40000:
> >> starting
> >> >>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server handler 0 on 40000:
> >> starting
> >> >>> 15/06/29 12:17:40 INFO bsp.BSPMaster: Starting RUNNING
> >> >>>
> >> >>>
> >> >>>
> >> >>> On Mon, Jun 29, 2015 at 12:17 PM, Edward J. Yoon <
> >> edwardyoon@apache.org>
> >> >>> wrote:
> >> >>> > Hi,
> >> >>> >
> >> >>> > If you run zk server too, BSPmaster will be connected to zk and
> >> won't
> >> >>> > throw exceptions.
> >> >>> >
> >> >>> > On Mon, Jun 29, 2015 at 12:13 PM, Behroz Sikander <
> >> behroz89@gmail.com>
> >> >>> wrote:
> >> >>> >> Hi,
> >> >>> >> Thank you for the information. I moved to Hama 0.7.0 and I still have
> >> the
> >> >>> same
> >> >>> >> problem.
> >> >>> >> When I run % bin/hama bspmaster, I am getting the following
> >> exception
> >> >>> >>
> >> >>> >> INFO http.HttpServer: Port returned by
> >> >>> >> webServer.getConnectors()[0].getLocalPort() before open() is -1.
> >> >>> Opening
> >> >>> >> the listener on 40013
> >> >>> >>  INFO http.HttpServer: listener.getLocalPort() returned 40013
> >> >>> >> webServer.getConnectors()[0].getLocalPort() returned 40013
> >> >>> >>  INFO http.HttpServer: Jetty bound to port 40013
> >> >>> >>  INFO mortbay.log: jetty-6.1.14
> >> >>> >>  INFO mortbay.log: Extract
> >> >>> >>
> >> >>>
> >>
> jar:file:/home/behroz/Documents/Packages/hama-0.7.0/hama-core-0.7.0.jar!/webapp/bspmaster/
> >> >>> >> to /tmp/Jetty_b178b33b16cc_40013_bspmaster____.cof30w/webapp
> >> >>> >>  INFO mortbay.log: Started SelectChannelConnector@b178b33b16cc
> >> :40013
> >> >>> >>  INFO bsp.BSPMaster: Cleaning up the system directory
> >> >>> >>  INFO bsp.BSPMaster: hdfs://
> >> >>> 172.17.0.3:54310/tmp/hama-behroz/bsp/system
> >> >>> >>  INFO sync.ZKSyncBSPMasterClient: Initialized ZK false
> >> >>> >>  INFO sync.ZKSyncClient: Initializing ZK Sync Client
> >> >>> >>  ERROR sync.ZKSyncBSPMasterClient:
> >> >>> >> org.apache.zookeeper.KeeperException$ConnectionLossException:
> >> >>> >> KeeperErrorCode = ConnectionLoss for /bsp
> >> >>> >> at
> >> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> >> >>> >> at
> >> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> >> >>> >> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
> >> >>> >> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
> >> >>> >> at
> >> >>> >>
> >> >>>
> >>
> org.apache.hama.bsp.sync.ZKSyncBSPMasterClient.init(ZKSyncBSPMasterClient.java:62)
> >> >>> >> at org.apache.hama.bsp.BSPMaster.initZK(BSPMaster.java:534)
> >> >>> >> at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:517)
> >> >>> >> at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:500)
> >> >>> >> at org.apache.hama.BSPMasterRunner.run(BSPMasterRunner.java:46)
> >> >>> >> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> >> >>> >> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> >> >>> >> at org.apache.hama.BSPMasterRunner.main(BSPMasterRunner.java:56)
> >> >>> >>  ERROR sync.ZKSyncBSPMasterClient:
> >> >>> >> org.apache.zookeeper.KeeperException$ConnectionLossException:
> >> >>> >> KeeperErrorCode = ConnectionLoss for /bsp
> >> >>> >>
> >> >>> >> *My zookeeper settings in hama-site.xml are as follows (right now, I am
> using
> >> >>> just
> >> >>> >> two servers 172.17.0.3 and 172.17.0.7):*
> >> >>> >> <property>
> >> >>> >>                  <name>hama.zookeeper.quorum</name>
> >> >>> >>                  <value>172.17.0.3,172.17.0.7</value>
> >> >>> >>                  <description>Comma separated list of servers in
> >> the
> >> >>> >> ZooKeeper quorum.
> >> >>> >>                  For example, "host1.mydomain.com,
> >> host2.mydomain.com,
> >> >>> >> host3.mydomain.com".
> >> >>> >>                  By default this is set to localhost for local
> and
> >> >>> >> pseudo-distributed modes
> >> >>> >>                  of operation. For a fully-distributed setup,
> this
> >> >>> should
> >> >>> >> be set to a full
> >> >>> >>                  list of ZooKeeper quorum servers. If
> >> HAMA_MANAGES_ZK
> >> >>> is
> >> >>> >> set in hama-env.sh
> >> >>> >>                  this is the list of servers which we will
> >> start/stop
> >> >>> >> ZooKeeper on.
> >> >>> >>                  </description>
> >> >>> >>         </property>
> >> >>> >>        ......
> >> >>> >>        <property>
> >> >>> >>                  <name>hama.zookeeper.property.clientPort</name>
> >> >>> >>                  <value>2181</value>
> >> >>> >>          </property>
> >> >>> >>
> >> >>> >> Is something wrong with my settings ?
> >> >>> >>
> >> >>> >> Regards,
> >> >>> >> Behroz Sikander
> >> >>> >>
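For what it's worth, a ZooKeeper client turns a comma-separated quorum value like the one quoted above, together with the clientPort, into a host:port connect list. A small illustrative parser (this sketches the idea only; it is not Hama's or ZooKeeper's actual code):

```java
import java.util.ArrayList;
import java.util.List;

public class QuorumParser {
    // Expand "h1,h2" plus a client port into "h1:2181,h2:2181" style entries,
    // the form a ZooKeeper connect string ultimately takes.
    static List<String> expand(String quorum, int clientPort) {
        List<String> servers = new ArrayList<>();
        for (String host : quorum.split(",")) {
            servers.add(host.trim() + ":" + clientPort);
        }
        return servers;
    }

    public static void main(String[] args) {
        System.out.println(expand("172.17.0.3,172.17.0.7", 2181));
        // [172.17.0.3:2181, 172.17.0.7:2181]
    }
}
```

If the ConnectionLoss errors persist with a correct quorum list, the usual suspects are a ZooKeeper server that is not actually running on those hosts or a firewall blocking port 2181.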
> >> >>> >> On Mon, Jun 29, 2015 at 1:44 AM, Edward J. Yoon <
> >> >>> edward.yoon@samsung.com>
> >> >>> >> wrote:
> >> >>> >>
> >> >>> >>> > (0.7.0) because I do not understand YARN yet. It adds extra
> >> >>> >>> configurations
> >> >>> >>>
> >> >>> >>> Hama classic mode works on both Hadoop 1.x and Hadoop 2.x HDFS.
> >> Yarn
> >> >>> >>> configuration is only needed when you want to submit a BSP job
> to
> >> Yarn
> >> >>> >>> cluster
> >> >>> >>> without Hama cluster. So you don't need to worry about it. :-)
> >> >>> >>>
> >> >>> >>> > distributed mode ? and is there any way to manage the server
> ? I
> >> >>> mean
> >> >>> >>> right
> >> >>> >>> > now, I have 3 machines with a lot of configuration files and
> log
> >> >>> files.
> >> >>> >>> It
> >> >>> >>>
> >> >>> >>> You can use web UI at
> >> http://masterserver_address:40013/bspmaster.jsp
> >> >>> >>>
> >> >>> >>> To debug your program, please try like below:
> >> >>> >>>
> >> >>> >>> 1) Run a BSPMaster and Zookeeper at server1.
> >> >>> >>> % bin/hama bspmaster
> >> >>> >>> % bin/hama zookeeper
> >> >>> >>>
> >> >>> >>> 2) Run a Groom at server1 and server2.
> >> >>> >>>
> >> >>> >>> % bin/hama groom
> >> >>> >>>
> >> >>> >>> 3) Check whether daemons are running well. Then, run your
> program
> >> >>> using jar
> >> >>> >>> command at server1.
> >> >>> >>>
> >> >>> >>> % bin/hama jar .....
> >> >>> >>>
> >> >>> >>> > In hama_[user]_bspmaster_.....log file I get the following
> >> >>> exception. But
> >> >>> >>> > this occurs in both cases when I run my job with 3 tasks or
> >> with 4
> >> >>> tasks
> >> >>> >>>
> >> >>> >>> In fact, you should not see above initZK error log.
> >> >>> >>>
> >> >>> >>> --
> >> >>> >>> Best Regards, Edward J. Yoon
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> -----Original Message-----
> >> >>> >>> From: Behroz Sikander [mailto:behroz89@gmail.com]
> >> >>> >>> Sent: Monday, June 29, 2015 8:18 AM
> >> >>> >>> To: user@hama.apache.org
> >> >>> >>> Subject: Re: Groomserver BSPPeerChild limit
> >> >>> >>>
> >> >>> >>> I will try the things that you mentioned. I am not using the
> >> latest
> >> >>> version
> >> >>> >>> (0.7.0) because I do not understand YARN yet. It adds extra
> >> >>> configurations
> >> >>> >>> which makes it harder for me to understand when things go
> >> wrong.
> >> >>> Any
> >> >>> >>> suggestions ?
> >> >>> >>>
> >> >>> >>> Further, are there any tools that you use for debugging while in
> >> >>> >>> distributed mode ? and is there any way to manage the server ? I
> >> mean
> >> >>> right
> >> >>> >>> now, I have 3 machines with a lot of configuration files and log
> >> >>> files. It
> >> >>> >>> takes a lot of time. This makes me wonder how people who have
> 100s
> >> of
> >> >>> >>> machines debug and manage the cluster.
> >> >>> >>>
> >> >>> >>> Regards,
> >> >>> >>> Behroz
> >> >>> >>>
> >> >>> >>> On Mon, Jun 29, 2015 at 12:53 AM, Edward J. Yoon <
> >> >>> edward.yoon@samsung.com>
> >> >>> >>> wrote:
> >> >>> >>>
> >> >>> >>> > Hi,
> >> >>> >>> >
> >> >>> >>> > It looks like a zookeeper connection problem. Please check
> >> whether
> >> >>> >>> > zookeeper
> >> >>> >>> > is running and every tasks can connect to zookeeper.
> >> >>> >>> >
> >> >>> >>> > I would recommend you to stop the firewall during debugging,
> and
> >> >>> please
> >> >>> >>> use
> >> >>> >>> > the 0.7.0 latest release.
> >> >>> >>> >
> >> >>> >>> >
> >> >>> >>> > --
> >> >>> >>> > Best Regards, Edward J. Yoon
> >> >>> >>> >
> >> >>> >>> > -----Original Message-----
> >> >>> >>> > From: Behroz Sikander [mailto:behroz89@gmail.com]
> >> >>> >>> > Sent: Monday, June 29, 2015 7:34 AM
> >> >>> >>> > To: user@hama.apache.org
> >> >>> >>> > Subject: Re: Groomserver BSPPeerChild limit
> >> >>> >>> >
> >> >>> >>> > To figure out the issue, I was trying something else and found
> >> out
> >> >>> >>> another
> >> >>> >>> > weird issue. Might be a bug of Hama but I am not sure. Both
> >> >>> following
> >> >>> >>> > lines give an exception.
> >> >>> >>> >
> >> >>> >>> > System.out.println( peer.getPeerName(0)); //Exception
> >> >>> >>> >
> >> >>> >>> > System.out.println( peer.getNumPeers()); //Exception
> >> >>> >>> >
> >> >>> >>> >
> >> >>> >>> > [time] ERROR bsp.BSPTask: *Error running bsp setup and bsp
> >> >>> function.*
> >> >>> >>> >
> >> >>> >>> > [time]java.lang.*RuntimeException: All peer names could not be
> >> >>> >>> retrieved!*
> >> >>> >>> >
> >> >>> >>> > at
> >> >>> >>> >
> >> >>> >>> >
> >> >>> >>>
> >> >>>
> >>
> org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.getAllPeerNames(ZooKeeperSyncClientImpl.java:305)
> >> >>> >>> >
> >> >>> >>> > at
> >> >>> org.apache.hama.bsp.BSPPeerImpl.initPeerNames(BSPPeerImpl.java:544)
> >> >>> >>> >
> >> >>> >>> > at
> >> org.apache.hama.bsp.BSPPeerImpl.getNumPeers(BSPPeerImpl.java:538)
> >> >>> >>> >
> >> >>> >>> > at testHDFS.EVADMMBsp.setup*(EVADMMBsp.java:58)*
> >> >>> >>> >
> >> >>> >>> > at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
> >> >>> >>> >
> >> >>> >>> > at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
> >> >>> >>> >
> >> >>> >>> > at
> >> >>> >>>
> >> >>>
> >> org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
> >> >>> >>> >
> >> >>> >>> > On Sun, Jun 28, 2015 at 6:45 PM, Behroz Sikander <
> >> >>> behroz89@gmail.com>
> >> >>> >>> > wrote:
> >> >>> >>> >
> >> >>> >>> > > I think I have more information on the issue. I did some
> >> >>> debugging and
> >> >>> >>> > > found something quite strange.
> >> >>> >>> > >
> >> >>> >>> > > If I open my job with 6 tasks ( 3 tasks will run on MACHINE1
> >> and
> >> >>> 3 task
> >> >>> >>> > > will be opened on other MACHINE2),
> >> >>> >>> > >
> >> >>> >>> > >  -  3 tasks on Machine1 are frozen and the strange thing is
> >> that
> >> >>> the
> >> >>> >>> > > processes do not even enter the SETUP function of BSP
> class. I
> >> >>> have
> >> >>> >>> print
> >> >>> >>> > > statements in the setup function of BSP class and it doesn't
> >> print
> >> >>> >>> > > anything. I get empty files with zero size.
> >> >>> >>> > >
> >> >>> >>> > > drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:29 .
> >> >>> >>> > > drwxrwxr-x 99 behroz behroz 4096 Jun 28 16:28 ..
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> >> >>> >>> > > attempt_201506281624_0001_000000_0.err
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> >> >>> >>> > > attempt_201506281624_0001_000000_0.log
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> >> >>> >>> > > attempt_201506281624_0001_000001_0.err
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> >> >>> >>> > > attempt_201506281624_0001_000001_0.log
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> >> >>> >>> > > attempt_201506281624_0001_000002_0.err
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> >> >>> >>> > > attempt_201506281624_0001_000002_0.log
> >> >>> >>> > >
> >> >>> >>> > > - On MACHINE2, the code enters the SETUP function of BSP
> >> class and
> >> >>> >>> prints
> >> >>> >>> > > stuff. See the size of files generated on output. How is it
> >> >>> possible
> >> >>> >>> that
> >> >>> >>> > > in 3 tasks the code can enter BSP and in others it cannot ?
> >> >>> >>> > >
> >> >>> >>> > > drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:39 .
> >> >>> >>> > > drwxrwxr-x 82 behroz behroz 4096 Jun 28 16:39 ..
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
> >> >>> >>> > > attempt_201506281639_0001_000003_0.err
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
> >> >>> >>> > > attempt_201506281639_0001_000003_0.log
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
> >> >>> >>> > > attempt_201506281639_0001_000004_0.err
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz 1368 Jun 28 16:39
> >> >>> >>> > > attempt_201506281639_0001_000004_0.log
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
> >> >>> >>> > > attempt_201506281639_0001_000005_0.err
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
> >> >>> >>> > > attempt_201506281639_0001_000005_0.log
> >> >>> >>> > >
> >> >>> >>> > > - Hama Groom log file on MACHINE1 (which is frozen) shows:
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >> >>> >>> > > 'attempt_201506281639_0001_000001_0' has started.
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >> >>> >>> > > 'attempt_201506281639_0001_000002_0' has started.
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >> >>> >>> > > 'attempt_201506281639_0001_000000_0' has started.
> >> >>> >>> > >
> >> >>> >>> > > - Hama Groom log file on MACHINE2 shows
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >> >>> >>> > > 'attempt_201506281639_0001_000003_0' has started.
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >> >>> >>> > > 'attempt_201506281639_0001_000004_0' has started.
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >> >>> >>> > > 'attempt_201506281639_0001_000005_0' has started.
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >> >>> >>> > > attempt_201506281639_0001_000004_0 is *done*.
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >> >>> >>> > > attempt_201506281639_0001_000003_0 is *done*.
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >> >>> >>> > > attempt_201506281639_0001_000005_0 is *done*.
> >> >>> >>> > >
> >> >>> >>> > > Any clue what might be going wrong ?
> >> >>> >>> > >
> >> >>> >>> > > Regards,
> >> >>> >>> > > Behroz
> >> >>> >>> > >
> >> >>> >>> > >
> >> >>> >>> > >
> >> >>> >>> > > On Sat, Jun 27, 2015 at 1:13 PM, Behroz Sikander <
> >> >>> behroz89@gmail.com>
> >> >>> >>> > > wrote:
> >> >>> >>> > >
> >> >>> >>> > >> Here is the log file from that folder
> >> >>> >>> > >>
> >> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: Starting Socket Reader
> #1
> >> for
> >> >>> port
> >> >>> >>> > >> 61001
> >> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server Responder:
> >> starting
> >> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server listener on
> >> 61001:
> >> >>> >>> > starting
> >> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 0 on
> >> 61001:
> >> >>> >>> > starting
> >> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 1 on
> >> 61001:
> >> >>> >>> > starting
> >> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 2 on
> >> 61001:
> >> >>> >>> > starting
> >> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 3 on
> >> 61001:
> >> >>> >>> > starting
> >> >>> >>> > >> 15/06/27 11:10:34 INFO message.HamaMessageManagerImpl:
> >> BSPPeer
> >> >>> >>> > >> address:b178b33b16cc port:61001
> >> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 4 on
> >> 61001:
> >> >>> >>> > starting
> >> >>> >>> > >> 15/06/27 11:10:34 INFO sync.ZKSyncClient: Initializing ZK
> >> Sync
> >> >>> Client
> >> >>> >>> > >> 15/06/27 11:10:34 INFO sync.ZooKeeperSyncClientImpl: Start
> >> >>> connecting
> >> >>> >>> to
> >> >>> >>> > >> Zookeeper! At b178b33b16cc/172.17.0.7:61001
> >> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping server on 61001
> >> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 0 on
> >> 61001:
> >> >>> >>> > exiting
> >> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server
> >> listener
> >> >>> on
> >> >>> >>> 61001
> >> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 1 on
> >> 61001:
> >> >>> >>> > exiting
> >> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 2 on
> >> 61001:
> >> >>> >>> > exiting
> >> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server
> >> Responder
> >> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 3 on
> >> 61001:
> >> >>> >>> > exiting
> >> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 4 on
> >> 61001:
> >> >>> >>> > exiting
> >> >>> >>> > >>
> >> >>> >>> > >>
> >> >>> >>> > >> And my console shows the following output. Hama is frozen
> >> right
> >> >>> now.
> >> >>> >>> > >> 15/06/27 11:10:32 INFO bsp.BSPJobClient: Running job:
> >> >>> >>> > >> job_201506262331_0003
> >> >>> >>> > >> 15/06/27 11:10:35 INFO bsp.BSPJobClient: Current supersteps
> >> >>> number: 0
> >> >>> >>> > >> 15/06/27 11:10:38 INFO bsp.BSPJobClient: Current supersteps
> >> >>> number: 2
> >> >>> >>> > >>
> >> >>> >>> > >> On Sat, Jun 27, 2015 at 1:07 PM, Edward J. Yoon <
> >> >>> >>> edwardyoon@apache.org>
> >> >>> >>> > >> wrote:
> >> >>> >>> > >>
> >> >>> >>> > >>> Please check the task logs in $HAMA_HOME/logs/tasklogs
> >> folder.
> >> >>> >>> > >>>
> >> >>> >>> > >>> On Sat, Jun 27, 2015 at 8:03 PM, Behroz Sikander <
> >> >>> behroz89@gmail.com
> >> >>> >>> >
> >> >>> >>> > >>> wrote:
> >> >>> >>> > >>> > Yea. I also thought that. I ran the program through
> >> eclipse
> >> >>> with 20
> >> >>> >>> > >>> tasks
> >> >>> >>> > >>> > and it works fine.
> >> >>> >>> > >>> >
> >> >>> >>> > >>> > On Sat, Jun 27, 2015 at 1:00 PM, Edward J. Yoon <
> >> >>> >>> > edwardyoon@apache.org
> >> >>> >>> > >>> >
> >> >>> >>> > >>> > wrote:
> >> >>> >>> > >>> >
> >> >>> >>> > >>> >> > When I run the PI example, it uses 9 tasks and runs
> >> fine.
> >> >>> When I
> >> >>> >>> > >>> run my
> >> >>> >>> > >>> >> > program with 3 tasks, everything runs fine. But when
> I
> >> >>> increase
> >> >>> >>> > the
> >> >>> >>> > >>> tasks
> >> >>> >>> > >>> >> > (to 4) by using "setNumBspTask". Hama freezes. I do
> not
> >> >>> >>> understand
> >> >>> >>> > >>> what
> >> >>> >>> > >>> >> can
> >> >>> >>> > >>> >> > go wrong.
> >> >>> >>> > >>> >>
> >> >>> >>> > >>> >> It looks like a program bug. Have you run your program
> in
> >> >>> local
> >> >>> >>> > mode?
> >> >>> >>> > >>> >>
> >> >>> >>> > >>> >> On Sat, Jun 27, 2015 at 8:03 AM, Behroz Sikander <
> >> >>> >>> > behroz89@gmail.com>
> >> >>> >>> > >>> >> wrote:
> >> >>> >>> > >>> >> > Hi,
> >> >>> >>> > >>> >> > In the current thread, I mentioned 3 issues. Issue 1
> >> and 3
> >> >>> are
> >> >>> >>> > >>> resolved
> >> >>> >>> > >>> >> but
> >> >>> >>> > >>> >> > issue number 2 is still giving me headaches.
> >> >>> >>> > >>> >> >
> >> >>> >>> > >>> >> > My problem:
> >> >>> >>> > >>> >> > My cluster now consists of 3 machines. Each one of
> them
> >> >>> properly
> >> >>> >>> > >>> >> configured
> >> >>> >>> > >>> >> > (Apparently). From my master machine when I start
> >> Hadoop
> >> >>> and
> >> >>> >>> Hama,
> >> >>> >>> > >>> I can
> >> >>> >>> > >>> >> > see the processes started on other 2 machines. If I
> >> check
> >> >>> the
> >> >>> >>> > >>> maximum
> >> >>> >>> > >>> >> tasks
> >> >>> >>> > >>> >> > that my cluster can support then I get 9 (3 tasks on
> >> each
> >> >>> >>> > machine).
> >> >>> >>> > >>> >> >
> >> >>> >>> > >>> >> > When I run the PI example, it uses 9 tasks and runs
> >> fine.
> >> >>> When I
> >> >>> >>> > >>> run my
> >> >>> >>> > >>> >> > program with 3 tasks, everything runs fine. But when
> I
> >> >>> increase
> >> >>> >>> > the
> >> >>> >>> > >>> tasks
> >> >>> >>> > >>> >> > (to 4) by using "setNumBspTask". Hama freezes. I do
> not
> >> >>> >>> understand
> >> >>> >>> > >>> what
> >> >>> >>> > >>> >> can
> >> >>> >>> > >>> >> > go wrong.
> >> >>> >>> > >>> >> >
> >> >>> >>> > >>> >> > I checked the logs files and things look fine. I just
> >> >>> sometimes
> >> >>> >>> > get
> >> >>> >>> > >>> an
> >> >>> >>> > >>> >> > exception that Hama was not able to delete the system
> >> >>> directory
> >> >>> >>> > >>> >> > (bsp.system.dir) defined in the hama-site.xml.
> >> >>> >>> > >>> >> >
> >> >>> >>> > >>> >> > Any help or clue would be great.
> >> >>> >>> > >>> >> >
> >> >>> >>> > >>> >> > Regards,
> >> >>> >>> > >>> >> > Behroz Sikander
> >> >>> >>> > >>> >> >
> >> >>> >>> > >>> >> > On Thu, Jun 25, 2015 at 1:13 PM, Behroz Sikander <
> >> >>> >>> > >>> behroz89@gmail.com>
> >> >>> >>> > >>> >> wrote:
> >> >>> >>> > >>> >> >
> >> >>> >>> > >>> >> >> Thank you :)
> >> >>> >>> > >>> >> >>
> >> >>> >>> > >>> >> >> On Thu, Jun 25, 2015 at 12:14 AM, Edward J. Yoon <
> >> >>> >>> > >>> edwardyoon@apache.org
> >> >>> >>> > >>> >> >
> >> >>> >>> > >>> >> >> wrote:
> >> >>> >>> > >>> >> >>
> >> >>> >>> > >>> >> >>> Hi,
> >> >>> >>> > >>> >> >>>
> >> >>> >>> > >>> >> >>> You can get the maximum number of available tasks
> >> like
> >> >>> >>> following
> >> >>> >>> > >>> code:
> >> >>> >>> > >>> >> >>>
> >> >>> >>> > >>> >> >>>     BSPJobClient jobClient = new
> BSPJobClient(conf);
> >> >>> >>> > >>> >> >>>     ClusterStatus cluster =
> >> >>> jobClient.getClusterStatus(true);
> >> >>> >>> > >>> >> >>>
> >> >>> >>> > >>> >> >>>     // Set to maximum
> >> >>> >>> > >>> >> >>>     bsp.setNumBspTask(cluster.getMaxTasks());
> >> >>> >>> > >>> >> >>>
> >> >>> >>> > >>> >> >>>
> >> >>> >>> > >>> >> >>> On Wed, Jun 24, 2015 at 11:20 PM, Behroz Sikander <
> >> >>> >>> > >>> behroz89@gmail.com>
> >> >>> >>> > >>> >> >>> wrote:
> >> >>> >>> > >>> >> >>> > Hi,
> >> >>> >>> > >>> >> >>> > 1) Thank you for this.
> >> >>> >>> > >>> >> >>> > 2) Here are the images. I will look into the log
> >> files
> >> >>> of PI
> >> >>> >>> > >>> example
> >> >>> >>> > >>> >> >>> >
> >> >>> >>> > >>> >> >>> > *Result of JPS command on slave*
> >> >>> >>> > >>> >> >>> >
> >> >>> >>> > >>> >> >>>
> >> >>> >>> > >>> >>
> >> >>> >>> > >>>
> >> >>> >>> >
> >> >>> >>>
> >> >>>
> >>
> http://s17.postimg.org/gpwe2bbfj/Screen_Shot_2015_06_22_at_7_23_31_PM.png
> >> >>> >>> > >>> >> >>> >
> >> >>> >>> > >>> >> >>> > *Result of JPS command on Master*
> >> >>> >>> > >>> >> >>> >
> >> >>> >>> > >>> >> >>>
> >> >>> >>> > >>> >>
> >> >>> >>> > >>>
> >> >>> >>> >
> >> >>> >>>
> >> >>>
> >>
> http://s14.postimg.org/s9922em5p/Screen_Shot_2015_06_22_at_7_23_42_PM.png
> >> >>> >>> > >>> >> >>> >
> >> >>> >>> > >>> >> >>> > 3) In my current case, I do not have any input
> >> >>> submitted to
> >> >>> >>> > the
> >> >>> >>> > >>> job.
> >> >>> >>> > >>> >> >>> During
> >> >>> >>> > >>> >> >>> > run time, I directly fetch data from HDFS. So, I
> am
> >> >>> looking
> >> >>> >>> > for
> >> >>> >>> > >>> >> >>> something
> >> >>> >>> > >>> >> >>> > like BSPJob.set*Max*NumBspTask().
> >> >>> >>> > >>> >> >>> >
> >> >>> >>> > >>> >> >>> > Regards,
> >> >>> >>> > >>> >> >>> > Behroz
> >> >>> >>> > >>> >> >>> >
> >> >>> >>> > >>> >> >>> >
> >> >>> >>> > >>> >> >>> >
> >> >>> >>> > >>> >> >>> > On Tue, Jun 23, 2015 at 12:57 AM, Edward J. Yoon
> <
> >> >>> >>> > >>> >> edwardyoon@apache.org
> >> >>> >>> > >>> >> >>> >
> >> >>> >>> > >>> >> >>> > wrote:
> >> >>> >>> > >>> >> >>> >
> >> >>> >>> > >>> >> >>> >> Hello,
> >> >>> >>> > >>> >> >>> >>
> >> >>> >>> > >>> >> >>> >> 1) You can get the filesystem URI from a
> >> configuration
> >> >>> >>> using
> >> >>> >>> > >>> >> >>> >> "FileSystem fs = FileSystem.get(conf);". Of
> >> course,
> >> >>> the
> >> >>> >>> > >>> fs.defaultFS
> >> >>> >>> > >>> >> >>> >> property should be in hama-site.xml
> >> >>> >>> > >>> >> >>> >>
> >> >>> >>> > >>> >> >>> >>   <property>
> >> >>> >>> > >>> >> >>> >>     <name>fs.defaultFS</name>
> >> >>> >>> > >>> >> >>> >>     <value>hdfs://host1.mydomain.com:9000/
> >> </value>
> >> >>> >>> > >>> >> >>> >>     <description>
> >> >>> >>> > >>> >> >>> >>       The name of the default file system.
> Either
> >> the
> >> >>> >>> literal
> >> >>> >>> > >>> string
> >> >>> >>> > >>> >> >>> >>       "local" or a host:port for HDFS.
> >> >>> >>> > >>> >> >>> >>     </description>
> >> >>> >>> > >>> >> >>> >>   </property>
> >> >>> >>> > >>> >> >>> >>
> >> >>> >>> > >>> >> >>> >> 2) The 'bsp.tasks.maximum' is the number of
> tasks
> >> per
> >> >>> node.
> >> >>> >>> > It
> >> >>> >>> > >>> looks
> >> >>> >>> > >>> >> >>> >> cluster configuration issue. Please run Pi
> example
> >> >>> and look
> >> >>> >>> > at
> >> >>> >>> > >>> the
> >> >>> >>> > >>> >> >>> >> logs for more details. NOTE: you can not attach
> >> the
> >> >>> images
> >> >>> >>> to
> >> >>> >>> > >>> >> mailing
> >> >>> >>> > >>> >> >>> >> list so I can't see it.
> >> >>> >>> > >>> >> >>> >>
> >> >>> >>> > >>> >> >>> >> 3) You can use the BSPJob.setNumBspTask(int)
> >> method.
> >> >>> If
> >> >>> >>> input
> >> >>> >>> > >>> is
> >> >>> >>> > >>> >> >>> >> provided, the number of BSP tasks is basically
> >> driven
> >> >>> by
> >> >>> >>> the
> >> >>> >>> > >>> number
> >> >>> >>> > >>> >> of
> >> >>> >>> > >>> >> >>> >> DFS blocks. I'll fix it to be more flexible on
> >> >>> HAMA-956.
> >> >>> >>> > >>> >> >>> >>
> >> >>> >>> > >>> >> >>> >> Thanks!
> >> >>> >>> > >>> >> >>> >>
> >> >>> >>> > >>> >> >>> >>
> >> >>> >>> > >>> >> >>> >> On Tue, Jun 23, 2015 at 2:33 AM, Behroz
> Sikander <
> >> >>> >>> > >>> >> behroz89@gmail.com>
> >> >>> >>> > >>> >> >>> >> wrote:
> >> >>> >>> > >>> >> >>> >> > Hi,
> >> >>> >>> > >>> >> >>> >> > Recently, I moved from a single machine setup
> >> to a 2
> >> >>> >>> > machine
> >> >>> >>> > >>> >> setup.
> >> >>> >>> > >>> >> >>> I was
> >> >>> >>> > >>> >> >>> >> > successfully able to run my job that uses the
> >> HDFS
> >> >>> to get
> >> >>> >>> > >>> data. I
> >> >>> >>> > >>> >> >>> have 3
> >> >>> >>> > >>> >> >>> >> > trivial questions
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> > 1- To access HDFS, I have to manually give the
> >> IP
> >> >>> address
> >> >>> >>> > of
> >> >>> >>> > >>> >> server
> >> >>> >>> > >>> >> >>> >> running
> >> >>> >>> > >>> >> >>> >> > HDFS. I thought that Hama will automatically
> >> pick
> >> >>> from
> >> >>> >>> the
> >> >>> >>> > >>> >> >>> configurations
> >> >>> >>> > >>> >> >>> >> > but it does not. I am probably doing something
> >> >>> wrong.
> >> >>> >>> Right
> >> >>> >>> > >>> now my
> >> >>> >>> > >>> >> >>> code
> >> >>> >>> > >>> >> >>> >> work
> >> >>> >>> > >>> >> >>> >> > by using the following.
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> > FileSystem fs = FileSystem.get(new
> >> >>> >>> > >>> URI("hdfs://server_ip:port/"),
> >> >>> >>> > >>> >> >>> conf);
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> > 2- On my master server, when I start hama it
> >> >>> >>> automatically
> >> >>> >>> > >>> starts
> >> >>> >>> > >>> >> >>> hama in
> >> >>> >>> > >>> >> >>> >> > the slave machine (all good). Both master and
> >> slave
> >> >>> are
> >> >>> >>> set
> >> >>> >>> > >>> as
> >> >>> >>> > >>> >> >>> >> groomservers.
> >> >>> >>> > >>> >> >>> >> > This means that I have 2 servers to run my job
> >> which
> >> >>> >>> means
> >> >>> >>> > >>> that I
> >> >>> >>> > >>> >> can
> >> >>> >>> > >>> >> >>> >> open
> >> >>> >>> > >>> >> >>> >> > more BSPPeerChild processes. And if I submit
> my
> >> jar
> >> >>> with
> >> >>> >>> 3
> >> >>> >>> > >>> bsp
> >> >>> >>> > >>> >> tasks
> >> >>> >>> > >>> >> >>> then
> >> >>> >>> > >>> >> >>> >> > everything works fine. But when I move to 4
> >> tasks,
> >> >>> Hama
> >> >>> >>> > >>> freezes.
> >> >>> >>> > >>> >> >>> Here is
> >> >>> >>> > >>> >> >>> >> the
> >> >>> >>> > >>> >> >>> >> > result of JPS command on slave.
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> > Result of JPS command on Master
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> > You can see that it is only opening tasks on
> >> slaves
> >> >>> but
> >> >>> >>> not
> >> >>> >>> > >>> on
> >> >>> >>> > >>> >> >>> master.
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> > Note: I tried to change the bsp.tasks.maximum
> >> >>> property in
> >> >>> >>> > >>> >> >>> >> hama-default.xml
> >> >>> >>> > >>> >> >>> >> > to 4 but still same result.
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> > 3- I want my cluster to open as many
> >> BSPPeerChild
> >> >>> >>> processes
> >> >>> >>> > >>> as
> >> >>> >>> > >>> >> >>> possible.
> >> >>> >>> > >>> >> >>> >> Is
> >> >>> >>> > >>> >> >>> >> > there any setting that can I do to achieve
> that
> >> ?
> >> >>> Or hama
> >> >>> >>> > >>> picks up
> >> >>> >>> > >>> >> >>> the
> >> >>> >>> > >>> >> >>> >> > values from hama-default.xml to open tasks ?
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> > Regards,
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> > Behroz Sikander
> >> >>> >>> > >>> >> >>> >>
> >> >>> >>> > >>> >> >>> >>
> >> >>> >>> > >>> >> >>> >>
> >> >>> >>> > >>> >> >>> >> --
> >> >>> >>> > >>> >> >>> >> Best Regards, Edward J. Yoon
> >> >>> >>> > >>> >> >>> >>
> >> >>> >>> > >>> >> >>>
> >> >>> >>> > >>> >> >>>
> >> >>> >>> > >>> >> >>>
> >> >>> >>> > >>> >> >>> --
> >> >>> >>> > >>> >> >>> Best Regards, Edward J. Yoon
> >> >>> >>> > >>> >> >>>
> >> >>> >>> > >>> >> >>
> >> >>> >>> > >>> >> >>
> >> >>> >>> > >>> >>
> >> >>> >>> > >>> >>
> >> >>> >>> > >>> >>
> >> >>> >>> > >>> >> --
> >> >>> >>> > >>> >> Best Regards, Edward J. Yoon
> >> >>> >>> > >>> >>
> >> >>> >>> > >>>
> >> >>> >>> > >>>
> >> >>> >>> > >>>
> >> >>> >>> > >>> --
> >> >>> >>> > >>> Best Regards, Edward J. Yoon
> >> >>> >>> > >>>
> >> >>> >>> > >>
> >> >>> >>> > >>
> >> >>> >>> > >
> >> >>> >>> >
> >> >>> >>> >
> >> >>> >>> >
> >> >>> >>>
> >> >>> >>>
> >> >>> >>>
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> > --
> >> >>> > Best Regards, Edward J. Yoon
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Best Regards, Edward J. Yoon
> >> >>>
> >> >>
> >> >>
> >>
> >>
> >>
> >> --
> >> Best Regards, Edward J. Yoon
> >>
> >
> >
>
>
>




Re: Groomserer BSPPeerChild limit

Posted by Behroz Sikander <be...@gmail.com>.
I tried bin/stop-bspd.sh, but the script reports that no groom/bspmaster
process is running, so I have to kill them manually. I am working on
Hama 0.7.0.
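Until the script behaves, the daemons can be located and stopped by hand. A minimal sketch, assuming jps prints "PID ClassName" lines and that the daemon main classes show up as BSPMasterRunner and GroomServerRunner (the PIDs below are made up for illustration):

```shell
# Pick the PID of a named daemon out of jps-style "PID ClassName" output.
find_daemon_pid() {
  # $1 = jps output, $2 = daemon class name
  printf '%s\n' "$1" | awk -v name="$2" '$2 == name { print $1 }'
}

# Canned jps output for illustration:
jps_out="4242 BSPMasterRunner
4311 GroomServerRunner
4400 Jps"

find_daemon_pid "$jps_out" GroomServerRunner
# In practice: kill "$(find_daemon_pid "$(jps)" GroomServerRunner)"
# Plain kill (SIGTERM) gives the daemon a chance to clean up; kill -9 does not.
```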

On Mon, Aug 3, 2015 at 1:07 AM, Edward J. Yoon <ed...@samsung.com>
wrote:

> Hi,
>
> Congratz! You can shutdown the cluster with following command: $
> bin/stop-bspd.sh
>
> --
> Best Regards, Edward J. Yoon
>
> -----Original Message-----
> From: Behroz Sikander [mailto:behroz89@gmail.com]
> Sent: Sunday, August 02, 2015 11:27 PM
> To: user@hama.apache.org
> Subject: Re: Groomserer BSPPeerChild limit
>
> Hi,
> Yesterday I got the fix for the /etc/hosts file, and now I can modify it. I
> tried running the cluster with 3 machines and everything went fine.
>
> Thanks :)
>
> Btw, if I start a process using the following command, how can I stop it?
> Right now I am using kill -9 <process_id>.
> % ./bin/hama bspmaster
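The /etc/hosts fix mentioned above amounts to giving every node the same hostname-to-IP mapping, for example (hostnames and addresses taken from elsewhere in this thread; the role comments are assumptions):

```
# /etc/hosts (same entries on every node)
172.17.0.3   b178b33b16cc    # assumed master: BSPMaster, ZooKeeper, Groom
172.17.0.7   8d4b512cf448    # assumed slave: Groom
```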
>
> On Mon, Jun 29, 2015 at 5:53 AM, Behroz Sikander <be...@gmail.com>
> wrote:
>
> > Ok perfect. I do not have rights on /etc/hosts so that's why I was using
> > the IP addresses. I will talk to the administrator.
> >
> > Btw, I am wondering how the PI example was able to communicate with the
> > other servers. The PI example runs fine even with more than 3 tasks (works
> > on both machines).
> >
> > On Mon, Jun 29, 2015 at 5:47 AM, Edward J. Yoon <ed...@apache.org>
> > wrote:
> >
> >> OKay almost done. I guess you need to add host names to your
> >> /etc/hosts file. :-) Please see also
> >>
> >>
> http://stackoverflow.com/questions/4730148/unknownhostexception-on-tasktracker-in-hadoop-cluster
> >>
> >> On Mon, Jun 29, 2015 at 12:41 PM, Behroz Sikander <be...@gmail.com>
> >> wrote:
> >> > Server 2 was showing the exception that I posted in the previous email.
> >> > Server 1 is showing the following exception:
> >> >
> >> > 15/06/29 03:27:42 INFO ipc.Server: IPC Server handler 0 on 40000:
> >> starting
> >> > 15/06/29 03:28:53 INFO bsp.BSPMaster: groomd_b178b33b16cc_50000 is
> >> added.
> >> > 15/06/29 03:29:20 ERROR bsp.BSPMaster: Fail to register GroomServer
> >> > groomd_8d4b512cf448_50000
> >> > java.net.UnknownHostException: unknown host: 8d4b512cf448
> >> > at org.apache.hama.ipc.Client$Connection.<init>(Client.java:225)
> >> > at org.apache.hama.ipc.Client.getConnection(Client.java:1039)
> >> > at org.apache.hama.ipc.Client.call(Client.java:888)
> >> > at org.apache.hama.ipc.RPC$Invoker.invoke(RPC.java:239)
> >> > at com.sun.proxy.$Proxy11.getProtocolVersion(Unknown Source)
> >> >
> >> > I am looking into this issue.
> >> >
> >> > On Mon, Jun 29, 2015 at 5:31 AM, Behroz Sikander <be...@gmail.com>
> >> wrote:
> >> >
> >> >> Ok great. I was able to run the zk, groom, and bspmaster on server 1.
> >> >> But when I ran the groom on server 2, I got the following exception:
> >> >>
> >> >> 15/06/29 03:29:20 ERROR bsp.GroomServer: There is a problem in
> >> >> establishing communication link with BSPMaster
> >> >> 15/06/29 03:29:20 ERROR bsp.GroomServer: Got fatal exception while
> >> >> reinitializing GroomServer: java.io.IOException: There is a problem
> in
> >> >> establishing communication link with BSPMaster.
> >> >> at org.apache.hama.bsp.GroomServer.initialize(GroomServer.java:426)
> >> >> at org.apache.hama.bsp.GroomServer.run(GroomServer.java:860)
> >> >> at java.lang.Thread.run(Thread.java:745)
> >> >>
> >> >> On Mon, Jun 29, 2015 at 5:21 AM, Edward J. Yoon <
> edwardyoon@apache.org
> >> >
> >> >> wrote:
> >> >>
> >> >>> Here's my configurations:
> >> >>>
> >> >>> hama-site.xml:
> >> >>>
> >> >>>   <property>
> >> >>>     <name>bsp.master.address</name>
> >> >>>     <value>cluster-0:40000</value>
> >> >>>   </property>
> >> >>>
> >> >>>   <property>
> >> >>>     <name>fs.default.name</name>
> >> >>>     <value>hdfs://cluster-0:9000/</value>
> >> >>>   </property>
> >> >>>
> >> >>>   <property>
> >> >>>     <name>hama.zookeeper.quorum</name>
> >> >>>     <value>cluster-0</value>
> >> >>>   </property>
> >> >>>
> >> >>>
> >> >>> % bin/hama zookeeper
> >> >>> 15/06/29 12:17:17 ERROR quorum.QuorumPeerConfig: Invalid
> >> >>> configuration, only one server specified (ignoring)
> >> >>>
> >> >>> Then, open new terminal and run master with following command:
> >> >>>
> >> >>> % bin/hama bspmaster
> >> >>> ...
> >> >>> 15/06/29 12:17:40 INFO sync.ZKSyncBSPMasterClient: Initialized ZK
> >> false
> >> >>> 15/06/29 12:17:40 INFO sync.ZKSyncClient: Initializing ZK Sync
> Client
> >> >>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server Responder: starting
> >> >>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server listener on 40000:
> >> starting
> >> >>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server handler 0 on 40000:
> >> starting
> >> >>> 15/06/29 12:17:40 INFO bsp.BSPMaster: Starting RUNNING
> >> >>>
> >> >>>
> >> >>>
> >> >>> On Mon, Jun 29, 2015 at 12:17 PM, Edward J. Yoon <
> >> edwardyoon@apache.org>
> >> >>> wrote:
> >> >>> > Hi,
> >> >>> >
> >> >>> > If you run the zk server too, BSPMaster will connect to zk and won't
> >> >>> > throw exceptions.
> >> >>> >
> >> >>> > On Mon, Jun 29, 2015 at 12:13 PM, Behroz Sikander <
> >> behroz89@gmail.com>
> >> >>> wrote:
> >> >>> >> Hi,
> >> >>> >> Thank you for the information. I moved to Hama 0.7.0 and I still
> >> >>> >> have the same problem.
> >> >>> >> When I run % bin/hama bspmaster, I am getting the following
> >> exception
> >> >>> >>
> >> >>> >> INFO http.HttpServer: Port returned by
> >> >>> >> webServer.getConnectors()[0].getLocalPort() before open() is -1.
> >> >>> Opening
> >> >>> >> the listener on 40013
> >> >>> >>  INFO http.HttpServer: listener.getLocalPort() returned 40013
> >> >>> >> webServer.getConnectors()[0].getLocalPort() returned 40013
> >> >>> >>  INFO http.HttpServer: Jetty bound to port 40013
> >> >>> >>  INFO mortbay.log: jetty-6.1.14
> >> >>> >>  INFO mortbay.log: Extract
> >> >>> >>
> >> >>>
> >>
> jar:file:/home/behroz/Documents/Packages/hama-0.7.0/hama-core-0.7.0.jar!/webapp/bspmaster/
> >> >>> >> to /tmp/Jetty_b178b33b16cc_40013_bspmaster____.cof30w/webapp
> >> >>> >>  INFO mortbay.log: Started SelectChannelConnector@b178b33b16cc
> >> :40013
> >> >>> >>  INFO bsp.BSPMaster: Cleaning up the system directory
> >> >>> >>  INFO bsp.BSPMaster: hdfs://
> >> >>> 172.17.0.3:54310/tmp/hama-behroz/bsp/system
> >> >>> >>  INFO sync.ZKSyncBSPMasterClient: Initialized ZK false
> >> >>> >>  INFO sync.ZKSyncClient: Initializing ZK Sync Client
> >> >>> >>  ERROR sync.ZKSyncBSPMasterClient:
> >> >>> >> org.apache.zookeeper.KeeperException$ConnectionLossException:
> >> >>> >> KeeperErrorCode = ConnectionLoss for /bsp
> >> >>> >> at
> >> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> >> >>> >> at
> >> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> >> >>> >> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
> >> >>> >> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
> >> >>> >> at
> >> >>> >>
> >> >>>
> >>
> org.apache.hama.bsp.sync.ZKSyncBSPMasterClient.init(ZKSyncBSPMasterClient.java:62)
> >> >>> >> at org.apache.hama.bsp.BSPMaster.initZK(BSPMaster.java:534)
> >> >>> >> at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:517)
> >> >>> >> at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:500)
> >> >>> >> at org.apache.hama.BSPMasterRunner.run(BSPMasterRunner.java:46)
> >> >>> >> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> >> >>> >> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> >> >>> >> at org.apache.hama.BSPMasterRunner.main(BSPMasterRunner.java:56)
> >> >>> >>  ERROR sync.ZKSyncBSPMasterClient:
> >> >>> >> org.apache.zookeeper.KeeperException$ConnectionLossException:
> >> >>> >> KeeperErrorCode = ConnectionLoss for /bsp
> >> >>> >>
> >> >>> >> *My zookeeper settings in hama-site.xml are as follows (right now I
> >> >>> >> am using just two servers, 172.17.0.3 and 172.17.0.7):*
> >> >>> >> <property>
> >> >>> >>                  <name>hama.zookeeper.quorum</name>
> >> >>> >>                  <value>172.17.0.3,172.17.0.7</value>
> >> >>> >>                  <description>Comma separated list of servers in
> >> the
> >> >>> >> ZooKeeper quorum.
> >> >>> >>                  For example, "host1.mydomain.com,
> >> host2.mydomain.com,
> >> >>> >> host3.mydomain.com".
> >> >>> >>                  By default this is set to localhost for local
> and
> >> >>> >> pseudo-distributed modes
> >> >>> >>                  of operation. For a fully-distributed setup,
> this
> >> >>> should
> >> >>> >> be set to a full
> >> >>> >>                  list of ZooKeeper quorum servers. If
> >> HAMA_MANAGES_ZK
> >> >>> is
> >> >>> >> set in hama-env.sh
> >> >>> >>                  this is the list of servers which we will
> >> start/stop
> >> >>> >> ZooKeeper on.
> >> >>> >>                  </description>
> >> >>> >>         </property>
> >> >>> >>        ......
> >> >>> >>        <property>
> >> >>> >>                  <name>hama.zookeeper.property.clientPort</name>
> >> >>> >>                  <value>2181</value>
> >> >>> >>          </property>
> >> >>> >>
> >> >>> >> Is something wrong with my settings ?
> >> >>> >>
> >> >>> >> Regards,
> >> >>> >> Behroz Sikander
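An aside on the ConnectionLoss error quoted above: it usually means the BSPMaster simply cannot reach any ZooKeeper server in the quorum. One quick check, sketched here, is ZooKeeper's "ruok" four-letter command (this assumes nc is installed and the four-letter commands are enabled on the server); the small helper just classifies the reply:

```shell
# Classify a ZooKeeper "ruok" reply; a healthy server answers "imok".
zk_status() {
  # $1 = reply captured from: echo ruok | nc <host> <clientPort>
  if [ "$1" = "imok" ]; then echo up; else echo down; fi
}

# In practice, against the quorum from hama-site.xml:
#   reply=$(echo ruok | nc -w 2 172.17.0.3 2181)
#   zk_status "$reply"
zk_status "imok"   # up
zk_status ""       # down
```

No "imok" reply points at a network or firewall problem rather than a Hama bug.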
> >> >>> >>
> >> >>> >> On Mon, Jun 29, 2015 at 1:44 AM, Edward J. Yoon <
> >> >>> edward.yoon@samsung.com>
> >> >>> >> wrote:
> >> >>> >>
> >> >>> >>> > (0.7.0) because I do not understand YARN yet. It adds extra
> >> >>> >>> configurations
> >> >>> >>>
> >> >>> >>> Hama classic mode works on both Hadoop 1.x and Hadoop 2.x HDFS. The
> >> >>> >>> Yarn configuration is only needed when you want to submit a BSP job
> >> >>> >>> to a Yarn cluster without a Hama cluster, so you don't need to worry
> >> >>> >>> about it. :-)
> >> >>> >>>
> >> >>> >>> > distributed mode ? and is there any way to manage the server
> ? I
> >> >>> mean
> >> >>> >>> right
> >> >>> >>> > now, I have 3 machines with alot of configurations files and
> log
> >> >>> files.
> >> >>> >>> It
> >> >>> >>>
> >> >>> >>> You can use web UI at
> >> http://masterserver_address:40013/bspmaster.jsp
> >> >>> >>>
> >> >>> >>> To debug your program, please try like below:
> >> >>> >>>
> >> >>> >>> 1) Run a BSPMaster and Zookeeper at server1.
> >> >>> >>> % bin/hama bspmaster
> >> >>> >>> % bin/hama zookeeper
> >> >>> >>>
> >> >>> >>> 2) Run a Groom at server1 and server2.
> >> >>> >>>
> >> >>> >>> % bin/hama groom
> >> >>> >>>
> >> >>> >>> 3) Check whether the daemons are running well. Then, run your
> >> >>> >>> program using the jar command at server1.
> >> >>> >>>
> >> >>> >>> % bin/hama jar .....
> >> >>> >>>
> >> >>> >>> > In hama_[user]_bspmaster_.....log file I get the following
> >> >>> exception. But
> >> >>> >>> > this occurs in both cases when I run my job with 3 tasks or
> >> with 4
> >> >>> tasks
> >> >>> >>>
> >> >>> >>> In fact, you should not see above initZK error log.
> >> >>> >>>
> >> >>> >>> --
> >> >>> >>> Best Regards, Edward J. Yoon
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> -----Original Message-----
> >> >>> >>> From: Behroz Sikander [mailto:behroz89@gmail.com]
> >> >>> >>> Sent: Monday, June 29, 2015 8:18 AM
> >> >>> >>> To: user@hama.apache.org
> >> >>> >>> Subject: Re: Groomserer BSPPeerChild limit
> >> >>> >>>
> >> >>> >>> I will try the things that you mentioned. I am not using the latest
> >> >>> >>> version (0.7.0) because I do not understand YARN yet. It adds extra
> >> >>> >>> configurations, which makes it harder for me to understand when
> >> >>> >>> things go wrong. Any suggestions?
> >> >>> >>>
> >> >>> >>> Further, are there any tools that you use for debugging in
> >> >>> >>> distributed mode? And is there any way to manage the servers? Right
> >> >>> >>> now I have 3 machines with a lot of configuration files and log
> >> >>> >>> files, and it takes a lot of time. This makes me wonder how people
> >> >>> >>> who have 100s of machines debug and manage their clusters.
> >> >>> >>>
> >> >>> >>> Regards,
> >> >>> >>> Behroz
> >> >>> >>>
> >> >>> >>> On Mon, Jun 29, 2015 at 12:53 AM, Edward J. Yoon <
> >> >>> edward.yoon@samsung.com>
> >> >>> >>> wrote:
> >> >>> >>>
> >> >>> >>> > Hi,
> >> >>> >>> >
> >> >>> >>> > It looks like a zookeeper connection problem. Please check
> >> whether
> >> >>> >>> > zookeeper
> >> >>> >>> > is running and every tasks can connect to zookeeper.
> >> >>> >>> >
> >> >>> >>> > I would recommend you to stop the firewall during debugging,
> and
> >> >>> please
> >> >>> >>> use
> >> >>> >>> > the 0.7.0 latest release.
> >> >>> >>> >
> >> >>> >>> >
> >> >>> >>> > --
> >> >>> >>> > Best Regards, Edward J. Yoon
> >> >>> >>> >
> >> >>> >>> > -----Original Message-----
> >> >>> >>> > From: Behroz Sikander [mailto:behroz89@gmail.com]
> >> >>> >>> > Sent: Monday, June 29, 2015 7:34 AM
> >> >>> >>> > To: user@hama.apache.org
> >> >>> >>> > Subject: Re: Groomserer BSPPeerChild limit
> >> >>> >>> >
> >> >>> >>> > To figure out the issue, I was trying something else and found
> >> >>> >>> > another weird issue. It might be a bug in Hama, but I am not sure.
> >> >>> >>> > Both of the following lines give an exception.
> >> >>> >>> >
> >> >>> >>> > System.out.println( peer.getPeerName(0)); //Exception
> >> >>> >>> >
> >> >>> >>> > System.out.println( peer.getNumPeers()); //Exception
> >> >>> >>> >
> >> >>> >>> >
> >> >>> >>> > [time] ERROR bsp.BSPTask: *Error running bsp setup and bsp
> >> >>> function.*
> >> >>> >>> >
> >> >>> >>> > [time]java.lang.*RuntimeException: All peer names could not be
> >> >>> >>> retrieved!*
> >> >>> >>> >
> >> >>> >>> > at
> >> >>> >>> >
> >> >>> >>> >
> >> >>> >>>
> >> >>>
> >>
> org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.getAllPeerNames(ZooKeeperSyncClientImpl.java:305)
> >> >>> >>> >
> >> >>> >>> > at
> >> >>> org.apache.hama.bsp.BSPPeerImpl.initPeerNames(BSPPeerImpl.java:544)
> >> >>> >>> >
> >> >>> >>> > at
> >> org.apache.hama.bsp.BSPPeerImpl.getNumPeers(BSPPeerImpl.java:538)
> >> >>> >>> >
> >> >>> >>> > at testHDFS.EVADMMBsp.setup*(EVADMMBsp.java:58)*
> >> >>> >>> >
> >> >>> >>> > at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
> >> >>> >>> >
> >> >>> >>> > at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
> >> >>> >>> >
> >> >>> >>> > at
> >> >>> >>>
> >> >>>
> >> org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
> >> >>> >>> >
> >> >>> >>> > On Sun, Jun 28, 2015 at 6:45 PM, Behroz Sikander <
> >> >>> behroz89@gmail.com>
> >> >>> >>> > wrote:
> >> >>> >>> >
> >> >>> >>> > > I think I have more information on the issue. I did some
> >> >>> debugging and
> >> >>> >>> > > found something quite strange.
> >> >>> >>> > >
> >> >>> >>> > > If I run my job with 6 tasks (3 tasks will run on MACHINE1 and 3
> >> >>> >>> > > tasks will be opened on MACHINE2):
> >> >>> >>> > >
> >> >>> >>> > >  - The 3 tasks on MACHINE1 are frozen, and the strange thing is
> >> >>> >>> > > that the processes do not even enter the SETUP function of the
> >> >>> >>> > > BSP class. I have print statements in the setup function and they
> >> >>> >>> > > print nothing. I get empty files with zero size.
> >> >>> >>> > >
> >> >>> >>> > > drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:29 .
> >> >>> >>> > > drwxrwxr-x 99 behroz behroz 4096 Jun 28 16:28 ..
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> >> >>> >>> > > attempt_201506281624_0001_000000_0.err
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> >> >>> >>> > > attempt_201506281624_0001_000000_0.log
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> >> >>> >>> > > attempt_201506281624_0001_000001_0.err
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> >> >>> >>> > > attempt_201506281624_0001_000001_0.log
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> >> >>> >>> > > attempt_201506281624_0001_000002_0.err
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> >> >>> >>> > > attempt_201506281624_0001_000002_0.log
> >> >>> >>> > >
> >> >>> >>> > > - On MACHINE2, the code enters the SETUP function of the BSP
> >> >>> >>> > > class and prints output. See the size of the generated files
> >> >>> >>> > > below. How is it possible that 3 tasks can enter BSP while the
> >> >>> >>> > > others cannot?
> >> >>> >>> > >
> >> >>> >>> > > drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:39 .
> >> >>> >>> > > drwxrwxr-x 82 behroz behroz 4096 Jun 28 16:39 ..
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
> >> >>> >>> > > attempt_201506281639_0001_000003_0.err
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
> >> >>> >>> > > attempt_201506281639_0001_000003_0.log
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
> >> >>> >>> > > attempt_201506281639_0001_000004_0.err
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz 1368 Jun 28 16:39
> >> >>> >>> > > attempt_201506281639_0001_000004_0.log
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
> >> >>> >>> > > attempt_201506281639_0001_000005_0.err
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
> >> >>> >>> > > attempt_201506281639_0001_000005_0.log
> >> >>> >>> > >
> >> >>> >>> > > - Hama Groom log file on MACHINE1 (which is frozen) shows:
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >> >>> >>> > > 'attempt_201506281639_0001_000001_0' has started.
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >> >>> >>> > > 'attempt_201506281639_0001_000002_0' has started.
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >> >>> >>> > > 'attempt_201506281639_0001_000000_0' has started.
> >> >>> >>> > >
> >> >>> >>> > > - Hama Groom log file on MACHINE2 shows
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >> >>> >>> > > 'attempt_201506281639_0001_000003_0' has started.
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >> >>> >>> > > 'attempt_201506281639_0001_000004_0' has started.
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >> >>> >>> > > 'attempt_201506281639_0001_000005_0' has started.
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >> >>> >>> > > attempt_201506281639_0001_000004_0 is *done*.
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >> >>> >>> > > attempt_201506281639_0001_000003_0 is *done*.
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >> >>> >>> > > attempt_201506281639_0001_000005_0 is *done*.
> >> >>> >>> > >
> >> >>> >>> > > Any clue what might be going wrong ?
> >> >>> >>> > >
> >> >>> >>> > > Regards,
> >> >>> >>> > > Behroz
> >> >>> >>> > >
> >> >>> >>> > >
> >> >>> >>> > >
> >> >>> >>> > > On Sat, Jun 27, 2015 at 1:13 PM, Behroz Sikander <
> >> >>> behroz89@gmail.com>
> >> >>> >>> > > wrote:
> >> >>> >>> > >
> >> >>> >>> > >> Here is the log file from that folder
> >> >>> >>> > >>
> >> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: Starting Socket Reader
> #1
> >> for
> >> >>> port
> >> >>> >>> > >> 61001
> >> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server Responder:
> >> starting
> >> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server listener on
> >> 61001:
> >> >>> >>> > starting
> >> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 0 on
> >> 61001:
> >> >>> >>> > starting
> >> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 1 on
> >> 61001:
> >> >>> >>> > starting
> >> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 2 on
> >> 61001:
> >> >>> >>> > starting
> >> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 3 on
> >> 61001:
> >> >>> >>> > starting
> >> >>> >>> > >> 15/06/27 11:10:34 INFO message.HamaMessageManagerImpl:
> >> BSPPeer
> >> >>> >>> > >> address:b178b33b16cc port:61001
> >> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 4 on
> >> 61001:
> >> >>> >>> > starting
> >> >>> >>> > >> 15/06/27 11:10:34 INFO sync.ZKSyncClient: Initializing ZK
> >> Sync
> >> >>> Client
> >> >>> >>> > >> 15/06/27 11:10:34 INFO sync.ZooKeeperSyncClientImpl: Start
> >> >>> connecting
> >> >>> >>> to
> >> >>> >>> > >> Zookeeper! At b178b33b16cc/172.17.0.7:61001
> >> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping server on 61001
> >> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 0 on
> >> 61001:
> >> >>> >>> > exiting
> >> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server
> >> listener
> >> >>> on
> >> >>> >>> 61001
> >> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 1 on
> >> 61001:
> >> >>> >>> > exiting
> >> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 2 on
> >> 61001:
> >> >>> >>> > exiting
> >> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server
> >> Responder
> >> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 3 on
> >> 61001:
> >> >>> >>> > exiting
> >> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 4 on
> >> 61001:
> >> >>> >>> > exiting
> >> >>> >>> > >>
> >> >>> >>> > >>
> >> >>> >>> > >> And my console shows the following output. Hama is frozen
> >> >>> >>> > >> right now.
> >> >>> >>> > >> 15/06/27 11:10:32 INFO bsp.BSPJobClient: Running job:
> >> >>> >>> > >> job_201506262331_0003
> >> >>> >>> > >> 15/06/27 11:10:35 INFO bsp.BSPJobClient: Current supersteps
> >> >>> number: 0
> >> >>> >>> > >> 15/06/27 11:10:38 INFO bsp.BSPJobClient: Current supersteps
> >> >>> number: 2
> >> >>> >>> > >>
> >> >>> >>> > >> On Sat, Jun 27, 2015 at 1:07 PM, Edward J. Yoon <
> >> >>> >>> edwardyoon@apache.org>
> >> >>> >>> > >> wrote:
> >> >>> >>> > >>
> >> >>> >>> > >>> Please check the task logs in $HAMA_HOME/logs/tasklogs
> >> folder.
> >> >>> >>> > >>>
> >> >>> >>> > >>> On Sat, Jun 27, 2015 at 8:03 PM, Behroz Sikander <
> >> >>> behroz89@gmail.com
> >> >>> >>> >
> >> >>> >>> > >>> wrote:
> >> >>> >>> > >>> > Yes, I also thought that. I ran the program through Eclipse
> >> >>> >>> > >>> > with 20 tasks and it works fine.
> >> >>> >>> > >>> >
> >> >>> >>> > >>> > On Sat, Jun 27, 2015 at 1:00 PM, Edward J. Yoon <
> >> >>> >>> > edwardyoon@apache.org
> >> >>> >>> > >>> >
> >> >>> >>> > >>> > wrote:
> >> >>> >>> > >>> >
> >> >>> >>> > >>> >> > When I run the PI example, it uses 9 tasks and runs
> >> fine.
> >> >>> When I
> >> >>> >>> > >>> run my
> >> >>> >>> > >>> >> > program with 3 tasks, everything runs fine. But when
> I
> >> >>> increase
> >> >>> >>> > the
> >> >>> >>> > >>> tasks
> >> >>> >>> > >>> >> > (to 4) by using "setNumBspTask", Hama freezes. I do
> not
> >> >>> >>> understand
> >> >>> >>> > >>> what
> >> >>> >>> > >>> >> can
> >> >>> >>> > >>> >> > go wrong.
> >> >>> >>> > >>> >>
> >> >>> >>> > >>> >> It looks like a program bug. Have you run your program in
> >> >>> >>> > >>> >> local mode?
> >> >>> >>> > >>> >>
> >> >>> >>> > >>> >> On Sat, Jun 27, 2015 at 8:03 AM, Behroz Sikander <
> >> >>> >>> > behroz89@gmail.com>
> >> >>> >>> > >>> >> wrote:
> >> >>> >>> > >>> >> > Hi,
> >> >>> >>> > >>> >> > In the current thread, I mentioned 3 issues. Issues 1 and
> >> >>> >>> > >>> >> > 3 are resolved, but issue 2 is still giving me headaches.
> >> >>> >>> > >>> >> >
> >> >>> >>> > >>> >> > My problem:
> >> >>> >>> > >>> >> > My cluster now consists of 3 machines, each of them
> >> >>> >>> > >>> >> > (apparently) properly configured. When I start Hadoop and
> >> >>> >>> > >>> >> > Hama from my master machine, I can see the processes start
> >> >>> >>> > >>> >> > on the other 2 machines. If I check the maximum number of
> >> >>> >>> > >>> >> > tasks that my cluster can support, I get 9 (3 tasks per
> >> >>> >>> > >>> >> > machine).
> >> >>> >>> > >>> >> >
> >> >>> >>> > >>> >> > When I run the PI example, it uses 9 tasks and runs
> >> fine.
> >> >>> When I
> >> >>> >>> > >>> run my
> >> >>> >>> > >>> >> > program with 3 tasks, everything runs fine. But when
> I
> >> >>> increase
> >> >>> >>> > the
> >> >>> >>> > >>> tasks
> >> >>> >>> > >>> >> > (to 4) by using "setNumBspTask". Hama freezes. I do
> not
> >> >>> >>> understand
> >> >>> >>> > >>> what
> >> >>> >>> > >>> >> can
> >> >>> >>> > >>> >> > go wrong.
> >> >>> >>> > >>> >> >
> >> >>> >>> > >>> >> > I checked the log files and things look fine. I just
> >> >>> sometimes
> >> >>> >>> > get
> >> >>> >>> > >>> an
> >> >>> >>> > >>> >> > exception that Hama was not able to delete the system
> >> >>> directory
> >> >>> >>> > >>> >> > (bsp.system.dir) defined in the hama-site.xml.
> >> >>> >>> > >>> >> >
> >> >>> >>> > >>> >> > Any help or clue would be great.
> >> >>> >>> > >>> >> >
> >> >>> >>> > >>> >> > Regards,
> >> >>> >>> > >>> >> > Behroz Sikander
> >> >>> >>> > >>> >> >
> >> >>> >>> > >>> >> > On Thu, Jun 25, 2015 at 1:13 PM, Behroz Sikander <
> >> >>> >>> > >>> behroz89@gmail.com>
> >> >>> >>> > >>> >> wrote:
> >> >>> >>> > >>> >> >
> >> >>> >>> > >>> >> >> Thank you :)
> >> >>> >>> > >>> >> >>
> >> >>> >>> > >>> >> >> On Thu, Jun 25, 2015 at 12:14 AM, Edward J. Yoon <
> >> >>> >>> > >>> edwardyoon@apache.org
> >> >>> >>> > >>> >> >
> >> >>> >>> > >>> >> >> wrote:
> >> >>> >>> > >>> >> >>
> >> >>> >>> > >>> >> >>> Hi,
> >> >>> >>> > >>> >> >>>
> >> >>> >>> > >>> >> >>> You can get the maximum number of available tasks with
> >> >>> >>> > >>> >> >>> the following code:
> >> >>> >>> > >>> >> >>>
> >> >>> >>> > >>> >> >>>     BSPJobClient jobClient = new
> BSPJobClient(conf);
> >> >>> >>> > >>> >> >>>     ClusterStatus cluster =
> >> >>> jobClient.getClusterStatus(true);
> >> >>> >>> > >>> >> >>>
> >> >>> >>> > >>> >> >>>     // Set to maximum
> >> >>> >>> > >>> >> >>>     bsp.setNumBspTask(cluster.getMaxTasks());
> >> >>> >>> > >>> >> >>>
> >> >>> >>> > >>> >> >>>
> >> >>> >>> > >>> >> >>> On Wed, Jun 24, 2015 at 11:20 PM, Behroz Sikander <
> >> >>> >>> > >>> behroz89@gmail.com>
> >> >>> >>> > >>> >> >>> wrote:
> >> >>> >>> > >>> >> >>> > Hi,
> >> >>> >>> > >>> >> >>> > 1) Thank you for this.
> >> >>> >>> > >>> >> >>> > 2) Here are the images. I will look into the log
> >> files
> >> >>> of PI
> >> >>> >>> > >>> example
> >> >>> >>> > >>> >> >>> >
> >> >>> >>> > >>> >> >>> > *Result of JPS command on slave*
> >> >>> >>> > >>> >> >>> >
> >> >>> >>> > >>> >> >>>
> >> >>> >>> > >>> >>
> >> >>> >>> > >>>
> >> >>> >>> >
> >> >>> >>>
> >> >>>
> >>
> http://s17.postimg.org/gpwe2bbfj/Screen_Shot_2015_06_22_at_7_23_31_PM.png
> >> >>> >>> > >>> >> >>> >
> >> >>> >>> > >>> >> >>> > *Result of JPS command on Master*
> >> >>> >>> > >>> >> >>> >
> >> >>> >>> > >>> >> >>>
> >> >>> >>> > >>> >>
> >> >>> >>> > >>>
> >> >>> >>> >
> >> >>> >>>
> >> >>>
> >>
> http://s14.postimg.org/s9922em5p/Screen_Shot_2015_06_22_at_7_23_42_PM.png
> >> >>> >>> > >>> >> >>> >
> >> >>> >>> > >>> >> >>> > 3) In my current case, I do not have any input
> >> >>> submitted to
> >> >>> >>> > the
> >> >>> >>> > >>> job.
> >> >>> >>> > >>> >> >>> During
> >> >>> >>> > >>> >> >>> > run time, I directly fetch data from HDFS. So, I
> am
> >> >>> looking
> >> >>> >>> > for
> >> >>> >>> > >>> >> >>> something
> >> >>> >>> > >>> >> >>> > like BSPJob.set*Max*NumBspTask().
> >> >>> >>> > >>> >> >>> >
> >> >>> >>> > >>> >> >>> > Regards,
> >> >>> >>> > >>> >> >>> > Behroz
> >> >>> >>> > >>> >> >>> >
> >> >>> >>> > >>> >> >>> >
> >> >>> >>> > >>> >> >>> >
> >> >>> >>> > >>> >> >>> > On Tue, Jun 23, 2015 at 12:57 AM, Edward J. Yoon
> <
> >> >>> >>> > >>> >> edwardyoon@apache.org
> >> >>> >>> > >>> >> >>> >
> >> >>> >>> > >>> >> >>> > wrote:
> >> >>> >>> > >>> >> >>> >
> >> >>> >>> > >>> >> >>> >> Hello,
> >> >>> >>> > >>> >> >>> >>
> >> >>> >>> > >>> >> >>> >> 1) You can get the filesystem URI from a
> >> configuration
> >> >>> >>> using
> >> >>> >>> > >>> >> >>> >> "FileSystem fs = FileSystem.get(conf);". Of
> >> course,
> >> >>> the
> >> >>> >>> > >>> fs.defaultFS
> >> >>> >>> > >>> >> >>> >> property should be in hama-site.xml
> >> >>> >>> > >>> >> >>> >>
> >> >>> >>> > >>> >> >>> >>   <property>
> >> >>> >>> > >>> >> >>> >>     <name>fs.defaultFS</name>
> >> >>> >>> > >>> >> >>> >>     <value>hdfs://host1.mydomain.com:9000/
> >> </value>
> >> >>> >>> > >>> >> >>> >>     <description>
> >> >>> >>> > >>> >> >>> >>       The name of the default file system.
> Either
> >> the
> >> >>> >>> literal
> >> >>> >>> > >>> string
> >> >>> >>> > >>> >> >>> >>       "local" or a host:port for HDFS.
> >> >>> >>> > >>> >> >>> >>     </description>
> >> >>> >>> > >>> >> >>> >>   </property>
> >> >>> >>> > >>> >> >>> >>
> >> >>> >>> > >>> >> >>> >> 2) The 'bsp.tasks.maximum' is the number of
> tasks
> >> per
> >> >>> node.
> >> >>> >>> > It
> >> >>> >>> > >>> looks
> >> >>> >>> > >>> >> >>> >> cluster configuration issue. Please run Pi
> example
> >> >>> and look
> >> >>> >>> > at
> >> >>> >>> > >>> the
> >> >>> >>> > >>> >> >>> >> logs for more details. NOTE: you can not attach
> >> the
> >> >>> images
> >> >>> >>> to
> >> >>> >>> > >>> >> mailing
> >> >>> >>> > >>> >> >>> >> list so I can't see it.
> >> >>> >>> > >>> >> >>> >>
> >> >>> >>> > >>> >> >>> >> 3) You can use the BSPJob.setNumBspTask(int)
> >> method.
> >> >>> If
> >> >>> >>> input
> >> >>> >>> > >>> is
> >> >>> >>> > >>> >> >>> >> provided, the number of BSP tasks is basically
> >> driven
> >> >>> by
> >> >>> >>> the
> >> >>> >>> > >>> number
> >> >>> >>> > >>> >> of
> >> >>> >>> > >>> >> >>> >> DFS blocks. I'll fix it to be more flexible on
> >> >>> HAMA-956.
> >> >>> >>> > >>> >> >>> >>
> >> >>> >>> > >>> >> >>> >> Thanks!
> >> >>> >>> > >>> >> >>> >>
> >> >>> >>> > >>> >> >>> >>
> >> >>> >>> > >>> >> >>> >> On Tue, Jun 23, 2015 at 2:33 AM, Behroz
> Sikander <
> >> >>> >>> > >>> >> behroz89@gmail.com>
> >> >>> >>> > >>> >> >>> >> wrote:
> >> >>> >>> > >>> >> >>> >> > Hi,
> >> >>> >>> > >>> >> >>> >> > Recently, I moved from a single machine setup
> >> to a 2
> >> >>> >>> > machine
> >> >>> >>> > >>> >> setup.
> >> >>> >>> > >>> >> >>> I was
> >> >>> >>> > >>> >> >>> >> > successfully able to run my job that uses the
> >> HDFS
> >> >>> to get
> >> >>> >>> > >>> data. I
> >> >>> >>> > >>> >> >>> have 3
> >> >>> >>> > >>> >> >>> >> > trivial questions
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> > 1- To access HDFS, I have to manually give the
> >> IP
> >> >>> address
> >> >>> >>> > of
> >> >>> >>> > >>> >> server
> >> >>> >>> > >>> >> >>> >> running
> >> >>> >>> > >>> >> >>> >> > HDFS. I thought that Hama will automatically
> >> pick
> >> >>> from
> >> >>> >>> the
> >> >>> >>> > >>> >> >>> configurations
> >> >>> >>> > >>> >> >>> >> > but it does not. I am probably doing something
> >> >>> wrong.
> >> >>> >>> Right
> >> >>> >>> > >>> now my
> >> >>> >>> > >>> >> >>> code
> >> >>> >>> > >>> >> >>> >> work
> >> >>> >>> > >>> >> >>> >> > by using the following.
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> > FileSystem fs = FileSystem.get(new
> >> >>> >>> > >>> URI("hdfs://server_ip:port/"),
> >> >>> >>> > >>> >> >>> conf);
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> > 2- On my master server, when I start hama it
> >> >>> >>> automatically
> >> >>> >>> > >>> starts
> >> >>> >>> > >>> >> >>> hama in
> >> >>> >>> > >>> >> >>> >> > the slave machine (all good). Both master and
> >> slave
> >> >>> are
> >> >>> >>> set
> >> >>> >>> > >>> as
> >> >>> >>> > >>> >> >>> >> groomservers.
> >> >>> >>> > >>> >> >>> >> > This means that I have 2 servers to run my job
> >> which
> >> >>> >>> means
> >> >>> >>> > >>> that I
> >> >>> >>> > >>> >> can
> >> >>> >>> > >>> >> >>> >> open
> >> >>> >>> > >>> >> >>> >> > more BSPPeerChild processes. And if I submit
> my
> >> jar
> >> >>> with
> >> >>> >>> 3
> >> >>> >>> > >>> bsp
> >> >>> >>> > >>> >> tasks
> >> >>> >>> > >>> >> >>> then
> >> >>> >>> > >>> >> >>> >> > everything works fine. But when I move to 4
> >> tasks,
> >> >>> Hama
> >> >>> >>> > >>> freezes.
> >> >>> >>> > >>> >> >>> Here is
> >> >>> >>> > >>> >> >>> >> the
> >> >>> >>> > >>> >> >>> >> > result of JPS command on slave.
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> > Result of JPS command on Master
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> > You can see that it is only opening tasks on
> >> slaves
> >> >>> but
> >> >>> >>> not
> >> >>> >>> > >>> on
> >> >>> >>> > >>> >> >>> master.
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> > Note: I tried to change the bsp.tasks.maximum
> >> >>> property in
> >> >>> >>> > >>> >> >>> >> hama-default.xml
> >> >>> >>> > >>> >> >>> >> > to 4 but still same result.
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> > 3- I want my cluster to open as many
> >> BSPPeerChild
> >> >>> >>> processes
> >> >>> >>> > >>> as
> >> >>> >>> > >>> >> >>> possible.
> >> >>> >>> > >>> >> >>> >> Is
> >> >>> >>> > >>> >> >>> >> > there any setting that can I do to achieve
> that
> >> ?
> >> >>> Or hama
> >> >>> >>> > >>> picks up
> >> >>> >>> > >>> >> >>> the
> >> >>> >>> > >>> >> >>> >> > values from hama-default.xml to open tasks ?
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> > Regards,
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> > Behroz Sikander
> >> >>> >>> > >>> >> >>> >>
> >> >>> >>> > >>> >> >>> >>
> >> >>> >>> > >>> >> >>> >>
> >> >>> >>> > >>> >> >>> >> --
> >> >>> >>> > >>> >> >>> >> Best Regards, Edward J. Yoon
> >> >>> >>> > >>> >> >>> >>
> >> >>> >>> > >>> >> >>>
> >> >>> >>> > >>> >> >>>
> >> >>> >>> > >>> >> >>>
> >> >>> >>> > >>> >> >>> --
> >> >>> >>> > >>> >> >>> Best Regards, Edward J. Yoon
> >> >>> >>> > >>> >> >>>
> >> >>> >>> > >>> >> >>
> >> >>> >>> > >>> >> >>
> >> >>> >>> > >>> >>
> >> >>> >>> > >>> >>
> >> >>> >>> > >>> >>
> >> >>> >>> > >>> >> --
> >> >>> >>> > >>> >> Best Regards, Edward J. Yoon
> >> >>> >>> > >>> >>
> >> >>> >>> > >>>
> >> >>> >>> > >>>
> >> >>> >>> > >>>
> >> >>> >>> > >>> --
> >> >>> >>> > >>> Best Regards, Edward J. Yoon
> >> >>> >>> > >>>
> >> >>> >>> > >>
> >> >>> >>> > >>
> >> >>> >>> > >
> >> >>> >>> >
> >> >>> >>> >
> >> >>> >>> >
> >> >>> >>>
> >> >>> >>>
> >> >>> >>>
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> > --
> >> >>> > Best Regards, Edward J. Yoon
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Best Regards, Edward J. Yoon
> >> >>>
> >> >>
> >> >>
> >>
> >>
> >>
> >> --
> >> Best Regards, Edward J. Yoon
> >>
> >
> >
>
>
>
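
The configuration advice scattered through the quoted thread above comes down to a few hama-site.xml properties. A minimal sketch for a small cluster (the host names and ports here are illustrative, not taken from the thread):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Where grooms register; one BSPMaster per cluster -->
  <property>
    <name>bsp.master.address</name>
    <value>master-host:40000</value>
  </property>
  <!-- Default file system, so jobs can call FileSystem.get(conf)
       without hard-coding an HDFS URI -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master-host:9000/</value>
  </property>
  <!-- ZooKeeper quorum used for barrier synchronization -->
  <property>
    <name>hama.zookeeper.quorum</name>
    <value>master-host</value>
  </property>
  <!-- Maximum number of BSPPeerChild tasks per groom -->
  <property>
    <name>bsp.tasks.maximum</name>
    <value>3</value>
  </property>
</configuration>
```

With fs.defaultFS set, the FileSystem.get(new URI(...), conf) call from the original question reduces to FileSystem.get(conf), and the cluster-wide task ceiling reported by getMaxTasks() is (number of grooms) x (bsp.tasks.maximum).
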

RE: Groomserer BSPPeerChild limit

Posted by "Edward J. Yoon" <ed...@samsung.com>.
Hi,

Congratz! You can shut down the cluster with the following command:
$ bin/stop-bspd.sh

--
Best Regards, Edward J. Yoon

-----Original Message-----
From: Behroz Sikander [mailto:behroz89@gmail.com]
Sent: Sunday, August 02, 2015 11:27 PM
To: user@hama.apache.org
Subject: Re: Groomserer BSPPeerChild limit

Hi,
Yesterday, I got the fix for the /etc/hosts file and now I can modify it. I
tried to run the cluster with 3 machines and everything went fine.

Thanks :)

Btw, if I run a process using the following command, how can I stop it?
Right now I am using kill -9 <process_id>
% ./bin/hama bspmaster
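
Edward's answer above is bin/stop-bspd.sh for the whole cluster. For a single daemon started by hand, a gentler alternative to kill -9 is to send SIGTERM first and escalate only if the process survives. A sketch (the helper name stop_gracefully is made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: ask the process to exit (SIGTERM), wait up to
# five seconds for it to go away, and only then fall back to SIGKILL
# (which is what kill -9 sends).
stop_gracefully() {
  pid="$1"
  kill -TERM "$pid" 2>/dev/null || return 0   # process already gone
  for _ in 1 2 3 4 5; do
    kill -0 "$pid" 2>/dev/null || return 0    # exited cleanly
    sleep 1
  done
  kill -KILL "$pid" 2>/dev/null               # last resort
}
```

Combined with jps, something like stop_gracefully "$(jps | awk '/BSPMaster/ {print $1}')" would target the master by its process name instead of a hand-copied PID.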

On Mon, Jun 29, 2015 at 5:53 AM, Behroz Sikander <be...@gmail.com> wrote:

> Ok perfect. I do not have rights on /etc/hosts so that's why I was using
> the IP addresses. I will talk to the administrator.
>
> Btw, I am wondering how the PI example was able to communicate with the
> other servers. The PI example runs fine even with more than 3 tasks (works
> on both machines).
>
> On Mon, Jun 29, 2015 at 5:47 AM, Edward J. Yoon <ed...@apache.org>
> wrote:
>
>> Okay, almost done. I guess you need to add host names to your
>> /etc/hosts file. :-) Please see also
>>
>> http://stackoverflow.com/questions/4730148/unknownhostexception-on-tasktracker-in-hadoop-cluster
>>
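
The "unknown host: 8d4b512cf448" failure quoted below is exactly this: the master cannot resolve the groom's hostname. A sketch of the /etc/hosts entries involved (the names and addresses are the ones appearing in the thread's logs, but which hostname maps to which IP is an assumption):

```
172.17.0.3  b178b33b16cc
172.17.0.7  8d4b512cf448
```

Typically every node needs the full set of entries, since grooms and the master resolve each other by name when registering.
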
>> On Mon, Jun 29, 2015 at 12:41 PM, Behroz Sikander <be...@gmail.com>
>> wrote:
>> > Server 2 was showing the exception that I posted in the previous email.
>> > Server 1 is showing the following exception:
>> >
>> > 15/06/29 03:27:42 INFO ipc.Server: IPC Server handler 0 on 40000:
>> starting
>> > 15/06/29 03:28:53 INFO bsp.BSPMaster: groomd_b178b33b16cc_50000 is
>> added.
>> > 15/06/29 03:29:20 ERROR bsp.BSPMaster: Fail to register GroomServer
>> > groomd_8d4b512cf448_50000
>> > java.net.UnknownHostException: unknown host: 8d4b512cf448
>> > at org.apache.hama.ipc.Client$Connection.<init>(Client.java:225)
>> > at org.apache.hama.ipc.Client.getConnection(Client.java:1039)
>> > at org.apache.hama.ipc.Client.call(Client.java:888)
>> > at org.apache.hama.ipc.RPC$Invoker.invoke(RPC.java:239)
>> > at com.sun.proxy.$Proxy11.getProtocolVersion(Unknown Source)
>> >
>> > I am looking into this issue.
>> >
>> > On Mon, Jun 29, 2015 at 5:31 AM, Behroz Sikander <be...@gmail.com>
>> wrote:
>> >
>> >> Ok great. I was able to run the zk, groom, and bspmaster on server 1.
>> >> But when I ran the groom on server 2, I got the following exception:
>> >>
>> >> 15/06/29 03:29:20 ERROR bsp.GroomServer: There is a problem in
>> >> establishing communication link with BSPMaster
>> >> 15/06/29 03:29:20 ERROR bsp.GroomServer: Got fatal exception while
>> >> reinitializing GroomServer: java.io.IOException: There is a problem in
>> >> establishing communication link with BSPMaster.
>> >> at org.apache.hama.bsp.GroomServer.initialize(GroomServer.java:426)
>> >> at org.apache.hama.bsp.GroomServer.run(GroomServer.java:860)
>> >> at java.lang.Thread.run(Thread.java:745)
>> >>
>> >> On Mon, Jun 29, 2015 at 5:21 AM, Edward J. Yoon <edwardyoon@apache.org
>> >
>> >> wrote:
>> >>
>> >>> Here's my configurations:
>> >>>
>> >>> hama-site.xml:
>> >>>
>> >>>   <property>
>> >>>     <name>bsp.master.address</name>
>> >>>     <value>cluster-0:40000</value>
>> >>>   </property>
>> >>>
>> >>>   <property>
>> >>>     <name>fs.default.name</name>
>> >>>     <value>hdfs://cluster-0:9000/</value>
>> >>>   </property>
>> >>>
>> >>>   <property>
>> >>>     <name>hama.zookeeper.quorum</name>
>> >>>     <value>cluster-0</value>
>> >>>   </property>
>> >>>
>> >>>
>> >>> % bin/hama zookeeper
>> >>> 15/06/29 12:17:17 ERROR quorum.QuorumPeerConfig: Invalid
>> >>> configuration, only one server specified (ignoring)
>> >>>
>> >>> Then, open new terminal and run master with following command:
>> >>>
>> >>> % bin/hama bspmaster
>> >>> ...
>> >>> 15/06/29 12:17:40 INFO sync.ZKSyncBSPMasterClient: Initialized ZK
>> false
>> >>> 15/06/29 12:17:40 INFO sync.ZKSyncClient: Initializing ZK Sync Client
>> >>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server Responder: starting
>> >>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server listener on 40000:
>> starting
>> >>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server handler 0 on 40000:
>> starting
>> >>> 15/06/29 12:17:40 INFO bsp.BSPMaster: Starting RUNNING
>> >>>
>> >>>
>> >>>
>> >>> On Mon, Jun 29, 2015 at 12:17 PM, Edward J. Yoon <
>> edwardyoon@apache.org>
>> >>> wrote:
>> >>> > Hi,
>> >>> >
>> >>> > If you run the zk server too, the BSPMaster will connect to zk and
>> >>> > won't throw exceptions.
>> >>> >
>> >>> > On Mon, Jun 29, 2015 at 12:13 PM, Behroz Sikander <
>> behroz89@gmail.com>
>> >>> wrote:
>> >>> >> Hi,
>> >>> >> Thank you for the information. I moved to hama 0.7.0 and I still
>> >>> >> have the same problem.
>> >>> >> When I run % bin/hama bspmaster, I am getting the following
>> >>> >> exception:
>> >>> >>
>> >>> >> INFO http.HttpServer: Port returned by
>> >>> >> webServer.getConnectors()[0].getLocalPort() before open() is -1.
>> >>> Opening
>> >>> >> the listener on 40013
>> >>> >>  INFO http.HttpServer: listener.getLocalPort() returned 40013
>> >>> >> webServer.getConnectors()[0].getLocalPort() returned 40013
>> >>> >>  INFO http.HttpServer: Jetty bound to port 40013
>> >>> >>  INFO mortbay.log: jetty-6.1.14
>> >>> >>  INFO mortbay.log: Extract
>> >>> >>
>> >>>
>> jar:file:/home/behroz/Documents/Packages/hama-0.7.0/hama-core-0.7.0.jar!/webapp/bspmaster/
>> >>> >> to /tmp/Jetty_b178b33b16cc_40013_bspmaster____.cof30w/webapp
>> >>> >>  INFO mortbay.log: Started SelectChannelConnector@b178b33b16cc
>> :40013
>> >>> >>  INFO bsp.BSPMaster: Cleaning up the system directory
>> >>> >>  INFO bsp.BSPMaster: hdfs://
>> >>> 172.17.0.3:54310/tmp/hama-behroz/bsp/system
>> >>> >>  INFO sync.ZKSyncBSPMasterClient: Initialized ZK false
>> >>> >>  INFO sync.ZKSyncClient: Initializing ZK Sync Client
>> >>> >>  ERROR sync.ZKSyncBSPMasterClient:
>> >>> >> org.apache.zookeeper.KeeperException$ConnectionLossException:
>> >>> >> KeeperErrorCode = ConnectionLoss for /bsp
>> >>> >> at
>> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>> >>> >> at
>> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>> >>> >> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
>> >>> >> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
>> >>> >> at
>> >>> >>
>> >>>
>> org.apache.hama.bsp.sync.ZKSyncBSPMasterClient.init(ZKSyncBSPMasterClient.java:62)
>> >>> >> at org.apache.hama.bsp.BSPMaster.initZK(BSPMaster.java:534)
>> >>> >> at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:517)
>> >>> >> at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:500)
>> >>> >> at org.apache.hama.BSPMasterRunner.run(BSPMasterRunner.java:46)
>> >>> >> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>> >>> >> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>> >>> >> at org.apache.hama.BSPMasterRunner.main(BSPMasterRunner.java:56)
>> >>> >>  ERROR sync.ZKSyncBSPMasterClient:
>> >>> >> org.apache.zookeeper.KeeperException$ConnectionLossException:
>> >>> >> KeeperErrorCode = ConnectionLoss for /bsp
>> >>> >>
>> >>> >> *My zookeeper settings in hama-site.xml are as follows (right now,
>> >>> >> I am using just two servers, 172.17.0.3 and 172.17.0.7):*
>> >>> >> <property>
>> >>> >>                  <name>hama.zookeeper.quorum</name>
>> >>> >>                  <value>172.17.0.3,172.17.0.7</value>
>> >>> >>                  <description>Comma separated list of servers in
>> the
>> >>> >> ZooKeeper quorum.
>> >>> >>                  For example, "host1.mydomain.com,
>> host2.mydomain.com,
>> >>> >> host3.mydomain.com".
>> >>> >>                  By default this is set to localhost for local and
>> >>> >> pseudo-distributed modes
>> >>> >>                  of operation. For a fully-distributed setup, this
>> >>> should
>> >>> >> be set to a full
>> >>> >>                  list of ZooKeeper quorum servers. If
>> HAMA_MANAGES_ZK
>> >>> is
>> >>> >> set in hama-env.sh
>> >>> >>                  this is the list of servers which we will
>> start/stop
>> >>> >> ZooKeeper on.
>> >>> >>                  </description>
>> >>> >>         </property>
>> >>> >>        ......
>> >>> >>        <property>
>> >>> >>                  <name>hama.zookeeper.property.clientPort</name>
>> >>> >>                  <value>2181</value>
>> >>> >>          </property>
>> >>> >>
>> >>> >> Is something wrong with my settings?
>> >>> >>
>> >>> >> Regards,
>> >>> >> Behroz Sikander
>> >>> >>
>> >>> >> On Mon, Jun 29, 2015 at 1:44 AM, Edward J. Yoon <
>> >>> edward.yoon@samsung.com>
>> >>> >> wrote:
>> >>> >>
>> >>> >>> > (0.7.0) because I do not understand YARN yet. It adds extra
>> >>> >>> configurations
>> >>> >>>
>> >>> >>> Hama classic mode works on both Hadoop 1.x and Hadoop 2.x HDFS.
>> Yarn
>> >>> >>> configuration is only needed when you want to submit a BSP job to
>> Yarn
>> >>> >>> cluster
>> >>> >>> without Hama cluster. So you don't need to worry about it. :-)
>> >>> >>>
>> >>> >>> > distributed mode ? and is there any way to manage the server ? I
>> >>> mean
>> >>> >>> right
>> >>> >>> > now, I have 3 machines with alot of configurations files and log
>> >>> files.
>> >>> >>> It
>> >>> >>>
>> >>> >>> You can use web UI at
>> http://masterserver_address:40013/bspmaster.jsp
>> >>> >>>
>> >>> >>> To debug your program, please try like below:
>> >>> >>>
>> >>> >>> 1) Run a BSPMaster and Zookeeper at server1.
>> >>> >>> % bin/hama bspmaster
>> >>> >>> % bin/hama zookeeper
>> >>> >>>
>> >>> >>> 2) Run a Groom at server1 and server2.
>> >>> >>>
>> >>> >>> % bin/hama groom
>> >>> >>>
>> >>> >>> 3) Check whether deamons are running well. Then, run your program
>> >>> using jar
>> >>> >>> command at server1.
>> >>> >>>
>> >>> >>> % bin/hama jar .....
>> >>> >>>
>> >>> >>> > In hama_[user]_bspmaster_.....log file I get the following
>> >>> exception. But
>> >>> >>> > this occurs in both cases when I run my job with 3 tasks or
>> with 4
>> >>> tasks
>> >>> >>>
>> >>> >>> In fact, you should not see above initZK error log.
>> >>> >>>
>> >>> >>> --
>> >>> >>> Best Regards, Edward J. Yoon
>> >>> >>>
>> >>> >>>
>> >>> >>> -----Original Message-----
>> >>> >>> From: Behroz Sikander [mailto:behroz89@gmail.com]
>> >>> >>> Sent: Monday, June 29, 2015 8:18 AM
>> >>> >>> To: user@hama.apache.org
>> >>> >>> Subject: Re: Groomserer BSPPeerChild limit
>> >>> >>>
>> >>> >>> I will try the things that you mentioned. I am not using the
>> >>> >>> latest version (0.7.0) because I do not understand YARN yet. It
>> >>> >>> adds extra configuration, which makes it harder for me to
>> >>> >>> understand when things go wrong. Any suggestions?
>> >>> >>>
>> >>> >>> Further, are there any tools that you use for debugging while in
>> >>> >>> distributed mode? And is there any way to manage the servers? I
>> >>> >>> mean right now, I have 3 machines with a lot of configuration
>> >>> >>> files and log files. It takes a lot of time. This makes me wonder
>> >>> >>> how people who have 100s of machines debug and manage the cluster.
>> >>> >>>
>> >>> >>> Regards,
>> >>> >>> Behroz
>> >>> >>>
>> >>> >>> On Mon, Jun 29, 2015 at 12:53 AM, Edward J. Yoon <
>> >>> edward.yoon@samsung.com>
>> >>> >>> wrote:
>> >>> >>>
>> >>> >>> > Hi,
>> >>> >>> >
>> >>> >>> > It looks like a zookeeper connection problem. Please check
>> whether
>> >>> >>> > zookeeper
>> >>> >>> > is running and every tasks can connect to zookeeper.
>> >>> >>> >
>> >>> >>> > I would recommend you to stop the firewall during debugging, and
>> >>> please
>> >>> >>> use
>> >>> >>> > the 0.7.0 latest release.
>> >>> >>> >
>> >>> >>> >
>> >>> >>> > --
>> >>> >>> > Best Regards, Edward J. Yoon
>> >>> >>> >
>> >>> >>> > -----Original Message-----
>> >>> >>> > From: Behroz Sikander [mailto:behroz89@gmail.com]
>> >>> >>> > Sent: Monday, June 29, 2015 7:34 AM
>> >>> >>> > To: user@hama.apache.org
>> >>> >>> > Subject: Re: Groomserer BSPPeerChild limit
>> >>> >>> >
>> >>> >>> > To figure out the issue, I was trying something else and found
>> out
>> >>> >>> another
>> >>> >>> > wiered issue. Might be a bug of Hama but I am not sure. Both
>> >>> following
>> >>> >>> > lines give an exception.
>> >>> >>> >
>> >>> >>> > System.out.println( peer.getPeerName(0)); //Exception
>> >>> >>> >
>> >>> >>> > System.out.println( peer.getNumPeers()); //Exception
>> >>> >>> >
>> >>> >>> >
>> >>> >>> > [time] ERROR bsp.BSPTask: *Error running bsp setup and bsp
>> >>> function.*
>> >>> >>> >
>> >>> >>> > [time]java.lang.*RuntimeException: All peer names could not be
>> >>> >>> retrieved!*
>> >>> >>> >
>> >>> >>> > at
>> >>> >>> >
>> >>> >>> >
>> >>> >>>
>> >>>
>> org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.getAllPeerNames(ZooKeeperSyncClientImpl.java:305)
>> >>> >>> >
>> >>> >>> > at
>> >>> org.apache.hama.bsp.BSPPeerImpl.initPeerNames(BSPPeerImpl.java:544)
>> >>> >>> >
>> >>> >>> > at
>> org.apache.hama.bsp.BSPPeerImpl.getNumPeers(BSPPeerImpl.java:538)
>> >>> >>> >
>> >>> >>> > at testHDFS.EVADMMBsp.setup*(EVADMMBsp.java:58)*
>> >>> >>> >
>> >>> >>> > at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
>> >>> >>> >
>> >>> >>> > at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
>> >>> >>> >
>> >>> >>> > at
>> >>> >>>
>> >>>
>> org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
>> >>> >>> >
>> >>> >>> > On Sun, Jun 28, 2015 at 6:45 PM, Behroz Sikander <
>> >>> behroz89@gmail.com>
>> >>> >>> > wrote:
>> >>> >>> >
>> >>> >>> > > I think I have more information on the issue. I did some
>> >>> debugging and
>> >>> >>> > > found something quite strange.
>> >>> >>> > >
>> >>> >>> > > If I open my job with 6 tasks (3 tasks will run on MACHINE1
>> >>> >>> > > and 3 tasks will be opened on the other machine, MACHINE2),
>> >>> >>> > > will be opened on other MACHINE2),
>> >>> >>> > >
>> >>> >>> > >  -  3 tasks on Machine1 are frozen and the strange thing is
>> that
>> >>> the
>> >>> >>> > > processes do not even enter the SETUP function of BSP class. I
>> >>> have
>> >>> >>> print
>> >>> >>> > > statements in the setup function of BSP class and it doesn't
>> print
>> >>> >>> > > anything. I get empty files with zero size.
>> >>> >>> > >
>> >>> >>> > > drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:29 .
>> >>> >>> > > drwxrwxr-x 99 behroz behroz 4096 Jun 28 16:28 ..
>> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>> >>> >>> > > attempt_201506281624_0001_000000_0.err
>> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>> >>> >>> > > attempt_201506281624_0001_000000_0.log
>> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>> >>> >>> > > attempt_201506281624_0001_000001_0.err
>> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>> >>> >>> > > attempt_201506281624_0001_000001_0.log
>> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>> >>> >>> > > attempt_201506281624_0001_000002_0.err
>> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>> >>> >>> > > attempt_201506281624_0001_000002_0.log
>> >>> >>> > >
>> >>> >>> > > - On MACHINE2, the code enters the SETUP function of BSP
>> class and
>> >>> >>> prints
>> >>> >>> > > stuff. See the size of files generated on output. How is it
>> >>> possible
>> >>> >>> that
>> >>> >>> > > in 3 tasks the code can enter BSP and in others it cannot ?
>> >>> >>> > >
>> >>> >>> > > drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:39 .
>> >>> >>> > > drwxrwxr-x 82 behroz behroz 4096 Jun 28 16:39 ..
>> >>> >>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
>> >>> >>> > > attempt_201506281639_0001_000003_0.err
>> >>> >>> > > -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
>> >>> >>> > > attempt_201506281639_0001_000003_0.log
>> >>> >>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
>> >>> >>> > > attempt_201506281639_0001_000004_0.err
>> >>> >>> > > -rw-rw-r--  1 behroz behroz 1368 Jun 28 16:39
>> >>> >>> > > attempt_201506281639_0001_000004_0.log
>> >>> >>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
>> >>> >>> > > attempt_201506281639_0001_000005_0.err
>> >>> >>> > > -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
>> >>> >>> > > attempt_201506281639_0001_000005_0.log
>> >>> >>> > >
>> >>> >>> > > - Hama Groom log file on MACHINE1 (which is frozen) shows:
>> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>> >>> >>> > > 'attempt_201506281639_0001_000001_0' has started.
>> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>> >>> >>> > > 'attempt_201506281639_0001_000002_0' has started.
>> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>> >>> >>> > > 'attempt_201506281639_0001_000000_0' has started.
>> >>> >>> > >
>> >>> >>> > > - Hama Groom log file on MACHINE2 shows
>> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>> >>> >>> > > 'attempt_201506281639_0001_000003_0' has started.
>> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>> >>> >>> > > 'attempt_201506281639_0001_000004_0' has started.
>> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>> >>> >>> > > 'attempt_201506281639_0001_000005_0' has started.
>> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>> >>> >>> > > attempt_201506281639_0001_000004_0 is *done*.
>> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>> >>> >>> > > attempt_201506281639_0001_000003_0 is *done*.
>> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>> >>> >>> > > attempt_201506281639_0001_000005_0 is *done*.
>> >>> >>> > >
>> >>> >>> > > Any clue what might be going wrong ?
>> >>> >>> > >
>> >>> >>> > > Regards,
>> >>> >>> > > Behroz
>> >>> >>> > >
>> >>> >>> > >
>> >>> >>> > >
>> >>> >>> > > On Sat, Jun 27, 2015 at 1:13 PM, Behroz Sikander <
>> >>> behroz89@gmail.com>
>> >>> >>> > > wrote:
>> >>> >>> > >
>> >>> >>> > >> Here is the log file from that folder
>> >>> >>> > >>
>> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: Starting Socket Reader #1
>> for
>> >>> port
>> >>> >>> > >> 61001
>> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server Responder:
>> starting
>> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server listener on
>> 61001:
>> >>> >>> > starting
>> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 0 on
>> 61001:
>> >>> >>> > starting
>> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 1 on
>> 61001:
>> >>> >>> > starting
>> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 2 on
>> 61001:
>> >>> >>> > starting
>> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 3 on
>> 61001:
>> >>> >>> > starting
>> >>> >>> > >> 15/06/27 11:10:34 INFO message.HamaMessageManagerImpl:
>> BSPPeer
>> >>> >>> > >> address:b178b33b16cc port:61001
>> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 4 on
>> 61001:
>> >>> >>> > starting
>> >>> >>> > >> 15/06/27 11:10:34 INFO sync.ZKSyncClient: Initializing ZK
>> Sync
>> >>> Client
>> >>> >>> > >> 15/06/27 11:10:34 INFO sync.ZooKeeperSyncClientImpl: Start
>> >>> connecting
>> >>> >>> to
>> >>> >>> > >> Zookeeper! At b178b33b16cc/172.17.0.7:61001
>> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping server on 61001
>> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 0 on
>> 61001:
>> >>> >>> > exiting
>> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server
>> listener
>> >>> on
>> >>> >>> 61001
>> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 1 on
>> 61001:
>> >>> >>> > exiting
>> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 2 on
>> 61001:
>> >>> >>> > exiting
>> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server
>> Responder
>> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 3 on
>> 61001:
>> >>> >>> > exiting
>> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 4 on
>> 61001:
>> >>> >>> > exiting
>> >>> >>> > >>
>> >>> >>> > >>
>> >>> >>> > >> And my console shows the following ouptut. Hama is frozen
>> right
>> >>> now.
>> >>> >>> > >> 15/06/27 11:10:32 INFO bsp.BSPJobClient: Running job:
>> >>> >>> > >> job_201506262331_0003
>> >>> >>> > >> 15/06/27 11:10:35 INFO bsp.BSPJobClient: Current supersteps
>> >>> number: 0
>> >>> >>> > >> 15/06/27 11:10:38 INFO bsp.BSPJobClient: Current supersteps
>> >>> number: 2
>> >>> >>> > >>
>> >>> >>> > >> On Sat, Jun 27, 2015 at 1:07 PM, Edward J. Yoon <
>> >>> >>> edwardyoon@apache.org>
>> >>> >>> > >> wrote:
>> >>> >>> > >>
>> >>> >>> > >>> Please check the task logs in $HAMA_HOME/logs/tasklogs
>> folder.
>> >>> >>> > >>>
>> >>> >>> > >>> On Sat, Jun 27, 2015 at 8:03 PM, Behroz Sikander <
>> >>> behroz89@gmail.com
>> >>> >>> >
>> >>> >>> > >>> wrote:
>> >>> >>> > >>> > Yea. I also thought that. I ran the program through
>> eclipse
>> >>> with 20
>> >>> >>> > >>> tasks
>> >>> >>> > >>> > and it works fine.
>> >>> >>> > >>> >
>> >>> >>> > >>> > On Sat, Jun 27, 2015 at 1:00 PM, Edward J. Yoon <
>> >>> >>> > edwardyoon@apache.org
>> >>> >>> > >>> >
>> >>> >>> > >>> > wrote:
>> >>> >>> > >>> >
>> >>> >>> > >>> >> > When I run the PI example, it uses 9 tasks and runs
>> fine.
>> >>> When I
>> >>> >>> > >>> run my
>> >>> >>> > >>> >> > program with 3 tasks, everything runs fine. But when I
>> >>> increase
>> >>> >>> > the
>> >>> >>> > >>> tasks
>> >>> >>> > >>> >> > (to 4) by using "setNumBspTask". Hama freezes. I do not
>> >>> >>> understand
>> >>> >>> > >>> what
>> >>> >>> > >>> >> can
>> >>> >>> > >>> >> > go wrong.
>> >>> >>> > >>> >>
>> >>> >>> > >>> >> It looks like a program bug. Have you ran your program in
>> >>> local
>> >>> >>> > mode?
>> >>> >>> > >>> >>
>> >>> >>> > >>> >> On Sat, Jun 27, 2015 at 8:03 AM, Behroz Sikander <
>> >>> >>> > behroz89@gmail.com>
>> >>> >>> > >>> >> wrote:
>> >>> >>> > >>> >> > Hi,
>> >>> >>> > >>> >> > In the current thread, I mentioned 3 issues. Issue 1
>> and 3
>> >>> are
>> >>> >>> > >>> resolved
>> >>> >>> > >>> >> but
>> >>> >>> > >>> >> > issue number 2 is still giving me headaches.
>> >>> >>> > >>> >> >
>> >>> >>> > >>> >> > My problem:
>> >>> >>> > >>> >> > My cluster now consists of 3 machines. Each one of them
>> >>> properly
>> >>> >>> > >>> >> configured
>> >>> >>> > >>> >> > (Apparently). From my master machine when I start
>> Hadoop
>> >>> and
>> >>> >>> Hama,
>> >>> >>> > >>> I can
>> >>> >>> > >>> >> > see the processes started on other 2 machines. If I
>> check
>> >>> the
>> >>> >>> > >>> maximum
>> >>> >>> > >>> >> tasks
>> >>> >>> > >>> >> > that my cluster can support then I get 9 (3 tasks on
>> each
>> >>> >>> > machine).
>> >>> >>> > >>> >> >
>> >>> >>> > >>> >> > When I run the PI example, it uses 9 tasks and runs
>> fine.
>> >>> When I
>> >>> >>> > >>> run my
>> >>> >>> > >>> >> > program with 3 tasks, everything runs fine. But when I
>> >>> increase
>> >>> >>> > the
>> >>> >>> > >>> tasks
>> >>> >>> > >>> >> > (to 4) by using "setNumBspTask". Hama freezes. I do not
>> >>> >>> understand
>> >>> >>> > >>> what
>> >>> >>> > >>> >> can
>> >>> >>> > >>> >> > go wrong.
>> >>> >>> > >>> >> >
>> >>> >>> > >>> >> > I checked the log files and things look fine. I just
>> >>> sometimes
>> >>> >>> > get
>> >>> >>> > >>> an
>> >>> >>> > >>> >> > exception that hama was not able to delete the system
>> >>> directory
>> >>> >>> > >>> >> > (bsp.system.dir) defined in the hama-site.xml.
>> >>> >>> > >>> >> >
>> >>> >>> > >>> >> > Any help or clue would be great.
>> >>> >>> > >>> >> >
>> >>> >>> > >>> >> > Regards,
>> >>> >>> > >>> >> > Behroz Sikander
>> >>> >>> > >>> >> >
>> >>> >>> > >>> >> > On Thu, Jun 25, 2015 at 1:13 PM, Behroz Sikander <
>> >>> >>> > >>> behroz89@gmail.com>
>> >>> >>> > >>> >> wrote:
>> >>> >>> > >>> >> >
>> >>> >>> > >>> >> >> Thank you :)
>> >>> >>> > >>> >> >>
>> >>> >>> > >>> >> >> On Thu, Jun 25, 2015 at 12:14 AM, Edward J. Yoon <
>> >>> >>> > >>> edwardyoon@apache.org
>> >>> >>> > >>> >> >
>> >>> >>> > >>> >> >> wrote:
>> >>> >>> > >>> >> >>
>> >>> >>> > >>> >> >>> Hi,
>> >>> >>> > >>> >> >>>
>> >>> >>> > >>> >> >>> You can get the maximum number of available tasks
>> like
>> >>> >>> following
>> >>> >>> > >>> code:
>> >>> >>> > >>> >> >>>
>> >>> >>> > >>> >> >>>     BSPJobClient jobClient = new BSPJobClient(conf);
>> >>> >>> > >>> >> >>>     ClusterStatus cluster =
>> >>> jobClient.getClusterStatus(true);
>> >>> >>> > >>> >> >>>
>> >>> >>> > >>> >> >>>     // Set to maximum
>> >>> >>> > >>> >> >>>     bsp.setNumBspTask(cluster.getMaxTasks());
>> >>> >>> > >>> >> >>>
>> >>> >>> > >>> >> >>>
>> >>> >>> > >>> >> >>> On Wed, Jun 24, 2015 at 11:20 PM, Behroz Sikander <
>> >>> >>> > >>> behroz89@gmail.com>
>> >>> >>> > >>> >> >>> wrote:
>> >>> >>> > >>> >> >>> > Hi,
>> >>> >>> > >>> >> >>> > 1) Thank you for this.
>> >>> >>> > >>> >> >>> > 2) Here are the images. I will look into the log
>> files
>> >>> of PI
>> >>> >>> > >>> example
>> >>> >>> > >>> >> >>> >
>> >>> >>> > >>> >> >>> > *Result of JPS command on slave*
>> >>> >>> > >>> >> >>> >
>> >>> >>> > >>> >> >>>
>> >>> >>> > >>> >>
>> >>> >>> > >>>
>> >>> >>> >
>> >>> >>>
>> >>>
>> http://s17.postimg.org/gpwe2bbfj/Screen_Shot_2015_06_22_at_7_23_31_PM.png
>> >>> >>> > >>> >> >>> >
>> >>> >>> > >>> >> >>> > *Result of JPS command on Master*
>> >>> >>> > >>> >> >>> >
>> >>> >>> > >>> >> >>>
>> >>> >>> > >>> >>
>> >>> >>> > >>>
>> >>> >>> >
>> >>> >>>
>> >>>
>> http://s14.postimg.org/s9922em5p/Screen_Shot_2015_06_22_at_7_23_42_PM.png
>> >>> >>> > >>> >> >>> >
>> >>> >>> > >>> >> >>> > 3) In my current case, I do not have any input
>> >>> submitted to
>> >>> >>> > the
>> >>> >>> > >>> job.
>> >>> >>> > >>> >> >>> During
>> >>> >>> > >>> >> >>> > run time, I directly fetch data from HDFS. So, I am
>> >>> looking
>> >>> >>> > for
>> >>> >>> > >>> >> >>> something
>> >>> >>> > >>> >> >>> > like BSPJob.set*Max*NumBspTask().
>> >>> >>> > >>> >> >>> >
>> >>> >>> > >>> >> >>> > Regards,
>> >>> >>> > >>> >> >>> > Behroz
>> >>> >>> > >>> >> >>> >
>> >>> >>> > >>> >> >>> >
>> >>> >>> > >>> >> >>> >
>> >>> >>> > >>> >> >>> > On Tue, Jun 23, 2015 at 12:57 AM, Edward J. Yoon <
>> >>> >>> > >>> >> edwardyoon@apache.org
>> >>> >>> > >>> >> >>> >
>> >>> >>> > >>> >> >>> > wrote:
>> >>> >>> > >>> >> >>> >
>> >>> >>> > >>> >> >>> >> Hello,
>> >>> >>> > >>> >> >>> >>
>> >>> >>> > >>> >> >>> >> 1) You can get the filesystem URI from a
>> configuration
>> >>> >>> using
>> >>> >>> > >>> >> >>> >> "FileSystem fs = FileSystem.get(conf);". Of
>> course,
>> >>> the
>> >>> >>> > >>> fs.defaultFS
>> >>> >>> > >>> >> >>> >> property should be in hama-site.xml
>> >>> >>> > >>> >> >>> >>
>> >>> >>> > >>> >> >>> >>   <property>
>> >>> >>> > >>> >> >>> >>     <name>fs.defaultFS</name>
>> >>> >>> > >>> >> >>> >>     <value>hdfs://host1.mydomain.com:9000/
>> </value>
>> >>> >>> > >>> >> >>> >>     <description>
>> >>> >>> > >>> >> >>> >>       The name of the default file system. Either
>> the
>> >>> >>> literal
>> >>> >>> > >>> string
>> >>> >>> > >>> >> >>> >>       "local" or a host:port for HDFS.
>> >>> >>> > >>> >> >>> >>     </description>
>> >>> >>> > >>> >> >>> >>   </property>
>> >>> >>> > >>> >> >>> >>
>> >>> >>> > >>> >> >>> >> 2) The 'bsp.tasks.maximum' is the number of tasks
>> per
>> >>> node.
>> >>> >>> > It
>> >>> >>> > >>> looks
>> >>> >>> > >>> >> >>> >> cluster configuration issue. Please run Pi example
>> >>> and look
>> >>> >>> > at
>> >>> >>> > >>> the
>> >>> >>> > >>> >> >>> >> logs for more details. NOTE: you can not attach
>> the
>> >>> images
>> >>> >>> to
>> >>> >>> > >>> >> mailing
>> >>> >>> > >>> >> >>> >> list so I can't see it.
>> >>> >>> > >>> >> >>> >>
>> >>> >>> > >>> >> >>> >> 3) You can use the BSPJob.setNumBspTask(int)
>> method.
>> >>> If
>> >>> >>> input
>> >>> >>> > >>> is
>> >>> >>> > >>> >> >>> >> provided, the number of BSP tasks is basically
>> driven
>> >>> by
>> >>> >>> the
>> >>> >>> > >>> number
>> >>> >>> > >>> >> of
>> >>> >>> > >>> >> >>> >> DFS blocks. I'll fix it to be more flexible on
>> >>> HAMA-956.
>> >>> >>> > >>> >> >>> >>
>> >>> >>> > >>> >> >>> >> Thanks!
>> >>> >>> > >>> >> >>> >>
>> >>> >>> > >>> >> >>> >>
>> >>> >>> > >>> >> >>> >> On Tue, Jun 23, 2015 at 2:33 AM, Behroz Sikander <
>> >>> >>> > >>> >> behroz89@gmail.com>
>> >>> >>> > >>> >> >>> >> wrote:
>> >>> >>> > >>> >> >>> >> > Hi,
>> >>> >>> > >>> >> >>> >> > Recently, I moved from a single machine setup
>> to a 2
>> >>> >>> > machine
>> >>> >>> > >>> >> setup.
>> >>> >>> > >>> >> >>> I was
>> >>> >>> > >>> >> >>> >> > successfully able to run my job that uses the
>> HDFS
>> >>> to get
>> >>> >>> > >>> data. I
>> >>> >>> > >>> >> >>> have 3
>> >>> >>> > >>> >> >>> >> > trivial questions
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> > 1- To access HDFS, I have to manually give the
>> IP
>> >>> address
>> >>> >>> > of
>> >>> >>> > >>> >> server
>> >>> >>> > >>> >> >>> >> running
>> >>> >>> > >>> >> >>> >> > HDFS. I thought that Hama will automatically
>> pick
>> >>> from
>> >>> >>> the
>> >>> >>> > >>> >> >>> configurations
>> >>> >>> > >>> >> >>> >> > but it does not. I am probably doing something
>> >>> wrong.
>> >>> >>> Right
>> >>> >>> > >>> now my
>> >>> >>> > >>> >> >>> code
>> >>> >>> > >>> >> >>> >> work
>> >>> >>> > >>> >> >>> >> > by using the following.
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> > FileSystem fs = FileSystem.get(new
>> >>> >>> > >>> URI("hdfs://server_ip:port/"),
>> >>> >>> > >>> >> >>> conf);
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> > 2- On my master server, when I start hama it
>> >>> >>> automatically
>> >>> >>> > >>> starts
>> >>> >>> > >>> >> >>> hama in
>> >>> >>> > >>> >> >>> >> > the slave machine (all good). Both master and
>> slave
>> >>> are
>> >>> >>> set
>> >>> >>> > >>> as
>> >>> >>> > >>> >> >>> >> groomservers.
>> >>> >>> > >>> >> >>> >> > This means that I have 2 servers to run my job
>> which
>> >>> >>> means
>> >>> >>> > >>> that I
>> >>> >>> > >>> >> can
>> >>> >>> > >>> >> >>> >> open
>> >>> >>> > >>> >> >>> >> > more BSPPeerChild processes. And if I submit my
>> jar
>> >>> with
>> >>> >>> 3
>> >>> >>> > >>> bsp
>> >>> >>> > >>> >> tasks
>> >>> >>> > >>> >> >>> then
>> >>> >>> > >>> >> >>> >> > everything works fine. But when I move to 4
>> tasks,
>> >>> Hama
>> >>> >>> > >>> freezes.
>> >>> >>> > >>> >> >>> Here is
>> >>> >>> > >>> >> >>> >> the
>> >>> >>> > >>> >> >>> >> > result of JPS command on slave.
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> > Result of JPS command on Master
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> > You can see that it is only opening tasks on
>> slaves
>> >>> but
>> >>> >>> not
>> >>> >>> > >>> on
>> >>> >>> > >>> >> >>> master.
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> > Note: I tried to change the bsp.tasks.maximum
>> >>> property in
>> >>> >>> > >>> >> >>> >> hama-default.xml
>> >>> >>> > >>> >> >>> >> > to 4 but still same result.
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> > 3- I want my cluster to open as many
>> BSPPeerChild
>> >>> >>> processes
>> >>> >>> > >>> as
>> >>> >>> > >>> >> >>> possible.
>> >>> >>> > >>> >> >>> >> Is
>> >>> >>> > >>> >> >>> >> > there any setting that can I do to achieve that
>> ?
>> >>> Or hama
>> >>> >>> > >>> picks up
>> >>> >>> > >>> >> >>> the
>> >>> >>> > >>> >> >>> >> > values from hama-default.xml to open tasks ?
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> > Regards,
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> > Behroz Sikander
>> >>> >>> > >>> >> >>> >>
>> >>> >>> > >>> >> >>> >>
>> >>> >>> > >>> >> >>> >>
>> >>> >>> > >>> >> >>> >> --
>> >>> >>> > >>> >> >>> >> Best Regards, Edward J. Yoon
>> >>> >>> > >>> >> >>> >>
>> >>> >>> > >>> >> >>>
>> >>> >>> > >>> >> >>>
>> >>> >>> > >>> >> >>>
>> >>> >>> > >>> >> >>> --
>> >>> >>> > >>> >> >>> Best Regards, Edward J. Yoon
>> >>> >>> > >>> >> >>>
>> >>> >>> > >>> >> >>
>> >>> >>> > >>> >> >>
>> >>> >>> > >>> >>
>> >>> >>> > >>> >>
>> >>> >>> > >>> >>
>> >>> >>> > >>> >> --
>> >>> >>> > >>> >> Best Regards, Edward J. Yoon
>> >>> >>> > >>> >>
>> >>> >>> > >>>
>> >>> >>> > >>>
>> >>> >>> > >>>
>> >>> >>> > >>> --
>> >>> >>> > >>> Best Regards, Edward J. Yoon
>> >>> >>> > >>>
>> >>> >>> > >>
>> >>> >>> > >>
>> >>> >>> > >
>> >>> >>> >
>> >>> >>> >
>> >>> >>> >
>> >>> >>>
>> >>> >>>
>> >>> >>>
>> >>> >
>> >>> >
>> >>> >
>> >>> > --
>> >>> > Best Regards, Edward J. Yoon
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Best Regards, Edward J. Yoon
>> >>>
>> >>
>> >>
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>>
>
>



Re: Groomserer BSPPeerChild limit

Posted by Behroz Sikander <be...@gmail.com>.
Hi,
Yesterday I got the fix for the /etc/hosts file and now I can modify it. I
tried to run the cluster with 3 machines and everything ran fine.

Thanks :)

Btw, if I start a process with the following command, how can I stop it?
Right now I am using kill -9 <process_id>.
% ./bin/hama bspmaster
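On stopping it: kill -9 (SIGKILL) gives the JVM no chance to run its shutdown
hooks, while SIGTERM (plain kill, or Ctrl-C on a foreground process) lets it
exit cleanly. A generic TERM-first sketch, using a sleep process as a stand-in
for the daemon (the fallback logic here is illustrative, not a Hama utility):

```shell
# TERM-first stop: SIGTERM lets the JVM run shutdown hooks; SIGKILL skips them.
sleep 60 &                # stand-in for a daemon started in the foreground
pid=$!

kill -TERM "$pid"         # polite request to exit (same signal as plain `kill`)
wait "$pid" 2>/dev/null   # reap it; exit status 128+15=143 means SIGTERM worked
status=$?

if [ "$status" -eq 143 ]; then
  echo "stopped cleanly via SIGTERM"
else
  kill -KILL "$pid" 2>/dev/null   # last resort, equivalent to kill -9
fi
```

If the daemons were started with bin/start-bspd.sh, the matching
bin/stop-bspd.sh should be the cleaner way to shut them down.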

On Mon, Jun 29, 2015 at 5:53 AM, Behroz Sikander <be...@gmail.com> wrote:

> Ok perfect. I do not have rights on /etc/hosts so that's why I was using
> the IP addresses. I will talk to the administrator.
>
> Btw I am wondering how the PI example was able to communicate with the
> other servers. The PI example runs fine even with more than 3 tasks (it
> works on both machines).
>
> On Mon, Jun 29, 2015 at 5:47 AM, Edward J. Yoon <ed...@apache.org>
> wrote:
>
>> Okay, almost done. I guess you need to add host names to your
>> /etc/hosts file. :-) Please see also
>>
>> http://stackoverflow.com/questions/4730148/unknownhostexception-on-tasktracker-in-hadoop-cluster
>>
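The "java.net.UnknownHostException: unknown host: 8d4b512cf448" below is
exactly this: the master cannot resolve the groom's hostname. A sketch of the
/etc/hosts entries that would fix it, assuming 172.17.0.3 is the master
(b178b33b16cc) and 172.17.0.7 is the groom (8d4b512cf448); the hostname-to-IP
pairing is a guess from the logs in this thread, so verify it on your machines:

```
# /etc/hosts on every node in the cluster (pairing assumed, not confirmed)
172.17.0.3   b178b33b16cc
172.17.0.7   8d4b512cf448
```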
>> On Mon, Jun 29, 2015 at 12:41 PM, Behroz Sikander <be...@gmail.com>
>> wrote:
>> > Server 2 was showing the exception that I posted in the previous email.
>> > Server1 is showing the following exception
>> >
>> > 15/06/29 03:27:42 INFO ipc.Server: IPC Server handler 0 on 40000:
>> starting
>> > 15/06/29 03:28:53 INFO bsp.BSPMaster: groomd_b178b33b16cc_50000 is
>> added.
>> > 15/06/29 03:29:20 ERROR bsp.BSPMaster: Fail to register GroomServer
>> > groomd_8d4b512cf448_50000
>> > java.net.UnknownHostException: unknown host: 8d4b512cf448
>> > at org.apache.hama.ipc.Client$Connection.<init>(Client.java:225)
>> > at org.apache.hama.ipc.Client.getConnection(Client.java:1039)
>> > at org.apache.hama.ipc.Client.call(Client.java:888)
>> > at org.apache.hama.ipc.RPC$Invoker.invoke(RPC.java:239)
>> > at com.sun.proxy.$Proxy11.getProtocolVersion(Unknown Source)
>> >
>> > I am looking into this issue.
>> >
>> > On Mon, Jun 29, 2015 at 5:31 AM, Behroz Sikander <be...@gmail.com>
>> wrote:
>> >
>> >> Ok great. I was able to run the zk, groom and bspmaster on server 1.
>> But
>> >> when I ran the groom on server2 I got the following exception
>> >>
>> >> 15/06/29 03:29:20 ERROR bsp.GroomServer: There is a problem in
>> >> establishing communication link with BSPMaster
>> >> 15/06/29 03:29:20 ERROR bsp.GroomServer: Got fatal exception while
>> >> reinitializing GroomServer: java.io.IOException: There is a problem in
>> >> establishing communication link with BSPMaster.
>> >> at org.apache.hama.bsp.GroomServer.initialize(GroomServer.java:426)
>> >> at org.apache.hama.bsp.GroomServer.run(GroomServer.java:860)
>> >> at java.lang.Thread.run(Thread.java:745)
>> >>
>> >> On Mon, Jun 29, 2015 at 5:21 AM, Edward J. Yoon <edwardyoon@apache.org
>> >
>> >> wrote:
>> >>
>> >>> Here's my configurations:
>> >>>
>> >>> hama-site.xml:
>> >>>
>> >>>   <property>
>> >>>     <name>bsp.master.address</name>
>> >>>     <value>cluster-0:40000</value>
>> >>>   </property>
>> >>>
>> >>>   <property>
>> >>>     <name>fs.default.name</name>
>> >>>     <value>hdfs://cluster-0:9000/</value>
>> >>>   </property>
>> >>>
>> >>>   <property>
>> >>>     <name>hama.zookeeper.quorum</name>
>> >>>     <value>cluster-0</value>
>> >>>   </property>
>> >>>
>> >>>
>> >>> % bin/hama zookeeper
>> >>> 15/06/29 12:17:17 ERROR quorum.QuorumPeerConfig: Invalid
>> >>> configuration, only one server specified (ignoring)
>> >>>
>> >>> Then, open new terminal and run master with following command:
>> >>>
>> >>> % bin/hama bspmaster
>> >>> ...
>> >>> 15/06/29 12:17:40 INFO sync.ZKSyncBSPMasterClient: Initialized ZK
>> false
>> >>> 15/06/29 12:17:40 INFO sync.ZKSyncClient: Initializing ZK Sync Client
>> >>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server Responder: starting
>> >>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server listener on 40000:
>> starting
>> >>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server handler 0 on 40000:
>> starting
>> >>> 15/06/29 12:17:40 INFO bsp.BSPMaster: Starting RUNNING
>> >>>
>> >>>
>> >>>
>> >>> On Mon, Jun 29, 2015 at 12:17 PM, Edward J. Yoon <
>> edwardyoon@apache.org>
>> >>> wrote:
>> >>> > Hi,
>> >>> >
>> >>> > If you run zk server too, BSPmaster will be connected to zk and
>> won't
>> >>> > throw exceptions.
>> >>> >
>> >>> > On Mon, Jun 29, 2015 at 12:13 PM, Behroz Sikander <
>> behroz89@gmail.com>
>> >>> wrote:
>> >>> >> Hi,
>> >>> >> Thank you the information. I moved to hama 0.7.0 and I still have
>> the
>> >>> same
>> >>> >> problem.
>> >>> >> When I run % bin/hama bspmaster, I am getting the following
>> exception
>> >>> >>
>> >>> >> INFO http.HttpServer: Port returned by
>> >>> >> webServer.getConnectors()[0].getLocalPort() before open() is -1.
>> >>> Opening
>> >>> >> the listener on 40013
>> >>> >>  INFO http.HttpServer: listener.getLocalPort() returned 40013
>> >>> >> webServer.getConnectors()[0].getLocalPort() returned 40013
>> >>> >>  INFO http.HttpServer: Jetty bound to port 40013
>> >>> >>  INFO mortbay.log: jetty-6.1.14
>> >>> >>  INFO mortbay.log: Extract
>> >>> >>
>> >>>
>> jar:file:/home/behroz/Documents/Packages/hama-0.7.0/hama-core-0.7.0.jar!/webapp/bspmaster/
>> >>> >> to /tmp/Jetty_b178b33b16cc_40013_bspmaster____.cof30w/webapp
>> >>> >>  INFO mortbay.log: Started SelectChannelConnector@b178b33b16cc
>> :40013
>> >>> >>  INFO bsp.BSPMaster: Cleaning up the system directory
>> >>> >>  INFO bsp.BSPMaster: hdfs://
>> >>> 172.17.0.3:54310/tmp/hama-behroz/bsp/system
>> >>> >>  INFO sync.ZKSyncBSPMasterClient: Initialized ZK false
>> >>> >>  INFO sync.ZKSyncClient: Initializing ZK Sync Client
>> >>> >>  ERROR sync.ZKSyncBSPMasterClient:
>> >>> >> org.apache.zookeeper.KeeperException$ConnectionLossException:
>> >>> >> KeeperErrorCode = ConnectionLoss for /bsp
>> >>> >> at
>> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>> >>> >> at
>> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>> >>> >> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
>> >>> >> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
>> >>> >> at
>> >>> >>
>> >>>
>> org.apache.hama.bsp.sync.ZKSyncBSPMasterClient.init(ZKSyncBSPMasterClient.java:62)
>> >>> >> at org.apache.hama.bsp.BSPMaster.initZK(BSPMaster.java:534)
>> >>> >> at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:517)
>> >>> >> at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:500)
>> >>> >> at org.apache.hama.BSPMasterRunner.run(BSPMasterRunner.java:46)
>> >>> >> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>> >>> >> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>> >>> >> at org.apache.hama.BSPMasterRunner.main(BSPMasterRunner.java:56)
>> >>> >>  ERROR sync.ZKSyncBSPMasterClient:
>> >>> >> org.apache.zookeeper.KeeperException$ConnectionLossException:
>> >>> >> KeeperErrorCode = ConnectionLoss for /bsp
>> >>> >>
>> >>> >> *Why zookeeper settings in hama-site.xml are (right now, I am using
>> >>> just
>> >>> >> two servers 172.17.0.3 and 172.17.0.7)*
>> >>> >> <property>
>> >>> >>                  <name>hama.zookeeper.quorum</name>
>> >>> >>                  <value>172.17.0.3,172.17.0.7</value>
>> >>> >>                  <description>Comma separated list of servers in
>> the
>> >>> >> ZooKeeper quorum.
>> >>> >>                  For example, "host1.mydomain.com,
>> host2.mydomain.com,
>> >>> >> host3.mydomain.com".
>> >>> >>                  By default this is set to localhost for local and
>> >>> >> pseudo-distributed modes
>> >>> >>                  of operation. For a fully-distributed setup, this
>> >>> should
>> >>> >> be set to a full
>> >>> >>                  list of ZooKeeper quorum servers. If
>> HAMA_MANAGES_ZK
>> >>> is
>> >>> >> set in hama-env.sh
>> >>> >>                  this is the list of servers which we will
>> start/stop
>> >>> >> ZooKeeper on.
>> >>> >>                  </description>
>> >>> >>         </property>
>> >>> >>        ......
>> >>> >>        <property>
>> >>> >>                  <name>hama.zookeeper.property.clientPort</name>
>> >>> >>                  <value>2181</value>
>> >>> >>          </property>
>> >>> >>
>> >>> >> Is something wrong with my settings ?
>> >>> >>
>> >>> >> Regards,
>> >>> >> Behroz Sikander
>> >>> >>
>> >>> >> On Mon, Jun 29, 2015 at 1:44 AM, Edward J. Yoon <
>> >>> edward.yoon@samsung.com>
>> >>> >> wrote:
>> >>> >>
>> >>> >>> > (0.7.0) because I do not understand YARN yet. It adds extra
>> >>> >>> configurations
>> >>> >>>
>> >>> >>> Hama classic mode works on both Hadoop 1.x and Hadoop 2.x HDFS.
>> Yarn
>> >>> >>> configuration is only needed when you want to submit a BSP job to
>> Yarn
>> >>> >>> cluster
>> >>> >>> without Hama cluster. So you don't need to worry about it. :-)
>> >>> >>>
>> >>> >>> > distributed mode ? and is there any way to manage the server ? I
>> >>> mean
>> >>> >>> right
>> >>> >>> > now, I have 3 machines with alot of configurations files and log
>> >>> files.
>> >>> >>> It
>> >>> >>>
>> >>> >>> You can use web UI at
>> http://masterserver_address:40013/bspmaster.jsp
>> >>> >>>
>> >>> >>> To debug your program, please try like below:
>> >>> >>>
>> >>> >>> 1) Run a BSPMaster and Zookeeper at server1.
>> >>> >>> % bin/hama bspmaster
>> >>> >>> % bin/hama zookeeper
>> >>> >>>
>> >>> >>> 2) Run a Groom at server1 and server2.
>> >>> >>>
>> >>> >>> % bin/hama groom
>> >>> >>>
>> >>> >>> 3) Check whether deamons are running well. Then, run your program
>> >>> using jar
>> >>> >>> command at server1.
>> >>> >>>
>> >>> >>> % bin/hama jar .....
>> >>> >>>
>> >>> >>> > In hama_[user]_bspmaster_.....log file I get the following
>> >>> exception. But
>> >>> >>> > this occurs in both cases when I run my job with 3 tasks or
>> with 4
>> >>> tasks
>> >>> >>>
>> >>> >>> In fact, you should not see above initZK error log.
>> >>> >>>
>> >>> >>> --
>> >>> >>> Best Regards, Edward J. Yoon
>> >>> >>>
>> >>> >>>
>> >>> >>> -----Original Message-----
>> >>> >>> From: Behroz Sikander [mailto:behroz89@gmail.com]
>> >>> >>> Sent: Monday, June 29, 2015 8:18 AM
>> >>> >>> To: user@hama.apache.org
>> >>> >>> Subject: Re: Groomserer BSPPeerChild limit
>> >>> >>>
>> >>> >>> I will try the things that you mentioned. I am not using the
>> latest
>> >>> version
>> >>> >>> (0.7.0) because I do not understand YARN yet. It adds extra
>> >>> configurations
>> >>> >>> which makes it more harder for me to understand when things go
>> wrong.
>> >>> Any
>> >>> >>> suggestions ?
>> >>> >>>
>> >>> >>> Further, are there any tools that you use for debugging while in
>> >>> >>> distributed mode ? and is there any way to manage the server ? I
>> mean
>> >>> right
>> >>> >>> now, I have 3 machines with alot of configurations files and log
>> >>> files. It
>> >>> >>> takes alot of time. This makes me wonder how people who have 100s
>> of
>> >>> >>> machines debug and manage the cluster.
>> >>> >>>
>> >>> >>> Regards,
>> >>> >>> Behroz
>> >>> >>>
>> >>> >>> On Mon, Jun 29, 2015 at 12:53 AM, Edward J. Yoon <
>> >>> edward.yoon@samsung.com>
>> >>> >>> wrote:
>> >>> >>>
>> >>> >>> > Hi,
>> >>> >>> >
>> >>> >>> > It looks like a zookeeper connection problem. Please check
>> whether
>> >>> >>> > zookeeper
>> >>> >>> > is running and every tasks can connect to zookeeper.
>> >>> >>> >
>> >>> >>> > I would recommend you to stop the firewall during debugging, and
>> >>> please
>> >>> >>> use
>> >>> >>> > the 0.7.0 latest release.
>> >>> >>> >
>> >>> >>> >
>> >>> >>> > --
>> >>> >>> > Best Regards, Edward J. Yoon
>> >>> >>> >
>> >>> >>> > -----Original Message-----
>> >>> >>> > From: Behroz Sikander [mailto:behroz89@gmail.com]
>> >>> >>> > Sent: Monday, June 29, 2015 7:34 AM
>> >>> >>> > To: user@hama.apache.org
>> >>> >>> > Subject: Re: Groomserer BSPPeerChild limit
>> >>> >>> >
>> >>> >>> > To figure out the issue, I was trying something else and found
>> out
>> >>> >>> another
>> >>> >>> > weird issue. Might be a bug of Hama but I am not sure. Both
>> >>> following
>> >>> >>> > lines give an exception.
>> >>> >>> >
>> >>> >>> > System.out.println( peer.getPeerName(0)); //Exception
>> >>> >>> >
>> >>> >>> > System.out.println( peer.getNumPeers()); //Exception
>> >>> >>> >
>> >>> >>> >
>> >>> >>> > [time] ERROR bsp.BSPTask: *Error running bsp setup and bsp
>> >>> function.*
>> >>> >>> >
>> >>> >>> > [time]java.lang.*RuntimeException: All peer names could not be
>> >>> >>> retrieved!*
>> >>> >>> >
>> >>> >>> > at
>> >>> >>> >
>> >>> >>> >
>> >>> >>>
>> >>>
>> org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.getAllPeerNames(ZooKeeperSyncClientImpl.java:305)
>> >>> >>> >
>> >>> >>> > at
>> >>> org.apache.hama.bsp.BSPPeerImpl.initPeerNames(BSPPeerImpl.java:544)
>> >>> >>> >
>> >>> >>> > at
>> org.apache.hama.bsp.BSPPeerImpl.getNumPeers(BSPPeerImpl.java:538)
>> >>> >>> >
>> >>> >>> > at testHDFS.EVADMMBsp.setup*(EVADMMBsp.java:58)*
>> >>> >>> >
>> >>> >>> > at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
>> >>> >>> >
>> >>> >>> > at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
>> >>> >>> >
>> >>> >>> > at
>> >>> >>>
>> >>>
>> org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
>> >>> >>> >
>> >>> >>> > On Sun, Jun 28, 2015 at 6:45 PM, Behroz Sikander <
>> >>> behroz89@gmail.com>
>> >>> >>> > wrote:
>> >>> >>> >
>> >>> >>> > > I think I have more information on the issue. I did some
>> >>> debugging and
>> >>> >>> > > found something quite strange.
>> >>> >>> > >
>> >>> >>> > > If I open my job with 6 tasks ( 3 tasks will run on MACHINE1
>> and
>> >>> 3 task
>> >>> >>> > > will be opened on other MACHINE2),
>> >>> >>> > >
>> >>> >>> > >  -  3 tasks on Machine1 are frozen and the strange thing is
>> that
>> >>> the
>> >>> >>> > > processes do not even enter the SETUP function of BSP class. I
>> >>> have
>> >>> >>> print
>> >>> >>> > > statements in the setup function of BSP class and it doesn't
>> print
>> >>> >>> > > anything. I get empty files with zero size.
>> >>> >>> > >
>> >>> >>> > > drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:29 .
>> >>> >>> > > drwxrwxr-x 99 behroz behroz 4096 Jun 28 16:28 ..
>> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>> >>> >>> > > attempt_201506281624_0001_000000_0.err
>> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>> >>> >>> > > attempt_201506281624_0001_000000_0.log
>> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>> >>> >>> > > attempt_201506281624_0001_000001_0.err
>> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>> >>> >>> > > attempt_201506281624_0001_000001_0.log
>> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>> >>> >>> > > attempt_201506281624_0001_000002_0.err
>> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>> >>> >>> > > attempt_201506281624_0001_000002_0.log
>> >>> >>> > >
>> >>> >>> > > - On MACHINE2, the code enters the SETUP function of BSP
>> class and
>> >>> >>> prints
>> >>> >>> > > stuff. See the size of files generated on output. How is it
>> >>> possible
>> >>> >>> that
>> >>> >>> > > in 3 tasks the code can enter BSP and in others it cannot ?
>> >>> >>> > >
>> >>> >>> > > drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:39 .
>> >>> >>> > > drwxrwxr-x 82 behroz behroz 4096 Jun 28 16:39 ..
>> >>> >>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
>> >>> >>> > > attempt_201506281639_0001_000003_0.err
>> >>> >>> > > -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
>> >>> >>> > > attempt_201506281639_0001_000003_0.log
>> >>> >>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
>> >>> >>> > > attempt_201506281639_0001_000004_0.err
>> >>> >>> > > -rw-rw-r--  1 behroz behroz 1368 Jun 28 16:39
>> >>> >>> > > attempt_201506281639_0001_000004_0.log
>> >>> >>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
>> >>> >>> > > attempt_201506281639_0001_000005_0.err
>> >>> >>> > > -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
>> >>> >>> > > attempt_201506281639_0001_000005_0.log
>> >>> >>> > >
>> >>> >>> > > - Hama Groom log file on MACHINE2 (which is frozen) shows.
>> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>> >>> >>> > > 'attempt_201506281639_0001_000001_0' has started.
>> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>> >>> >>> > > 'attempt_201506281639_0001_000002_0' has started.
>> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>> >>> >>> > > 'attempt_201506281639_0001_000000_0' has started.
>> >>> >>> > >
>> >>> >>> > > - Hama Groom log file on MACHINE2 shows
>> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>> >>> >>> > > 'attempt_201506281639_0001_000003_0' has started.
>> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>> >>> >>> > > 'attempt_201506281639_0001_000004_0' has started.
>> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>> >>> >>> > > 'attempt_201506281639_0001_000005_0' has started.
>> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>> >>> >>> > > attempt_201506281639_0001_000004_0 is *done*.
>> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>> >>> >>> > > attempt_201506281639_0001_000003_0 is *done*.
>> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>> >>> >>> > > attempt_201506281639_0001_000005_0 is *done*.
>> >>> >>> > >
>> >>> >>> > > Any clue what might be going wrong ?
>> >>> >>> > >
>> >>> >>> > > Regards,
>> >>> >>> > > Behroz
>> >>> >>> > >
>> >>> >>> > >
>> >>> >>> > >
>> >>> >>> > > On Sat, Jun 27, 2015 at 1:13 PM, Behroz Sikander <
>> >>> behroz89@gmail.com>
>> >>> >>> > > wrote:
>> >>> >>> > >
>> >>> >>> > >> Here is the log file from that folder
>> >>> >>> > >>
>> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: Starting Socket Reader #1
>> for
>> >>> port
>> >>> >>> > >> 61001
>> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server Responder:
>> starting
>> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server listener on
>> 61001:
>> >>> >>> > starting
>> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 0 on
>> 61001:
>> >>> >>> > starting
>> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 1 on
>> 61001:
>> >>> >>> > starting
>> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 2 on
>> 61001:
>> >>> >>> > starting
>> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 3 on
>> 61001:
>> >>> >>> > starting
>> >>> >>> > >> 15/06/27 11:10:34 INFO message.HamaMessageManagerImpl:
>> BSPPeer
>> >>> >>> > >> address:b178b33b16cc port:61001
>> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 4 on
>> 61001:
>> >>> >>> > starting
>> >>> >>> > >> 15/06/27 11:10:34 INFO sync.ZKSyncClient: Initializing ZK
>> Sync
>> >>> Client
>> >>> >>> > >> 15/06/27 11:10:34 INFO sync.ZooKeeperSyncClientImpl: Start
>> >>> connecting
>> >>> >>> to
>> >>> >>> > >> Zookeeper! At b178b33b16cc/172.17.0.7:61001
>> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping server on 61001
>> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 0 on
>> 61001:
>> >>> >>> > exiting
>> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server
>> listener
>> >>> on
>> >>> >>> 61001
>> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 1 on
>> 61001:
>> >>> >>> > exiting
>> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 2 on
>> 61001:
>> >>> >>> > exiting
>> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server
>> Responder
>> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 3 on
>> 61001:
>> >>> >>> > exiting
>> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 4 on
>> 61001:
>> >>> >>> > exiting
>> >>> >>> > >>
>> >>> >>> > >>
>> >>> >>> > >> And my console shows the following ouptut. Hama is frozen
>> right
>> >>> now.
>> >>> >>> > >> 15/06/27 11:10:32 INFO bsp.BSPJobClient: Running job:
>> >>> >>> > >> job_201506262331_0003
>> >>> >>> > >> 15/06/27 11:10:35 INFO bsp.BSPJobClient: Current supersteps
>> >>> number: 0
>> >>> >>> > >> 15/06/27 11:10:38 INFO bsp.BSPJobClient: Current supersteps
>> >>> number: 2
>> >>> >>> > >>
>> >>> >>> > >> On Sat, Jun 27, 2015 at 1:07 PM, Edward J. Yoon <
>> >>> >>> edwardyoon@apache.org>
>> >>> >>> > >> wrote:
>> >>> >>> > >>
>> >>> >>> > >>> Please check the task logs in $HAMA_HOME/logs/tasklogs
>> folder.
>> >>> >>> > >>>
>> >>> >>> > >>> On Sat, Jun 27, 2015 at 8:03 PM, Behroz Sikander <
>> >>> behroz89@gmail.com
>> >>> >>> >
>> >>> >>> > >>> wrote:
>> >>> >>> > >>> > Yea. I also thought that. I ran the program through
>> eclipse
>> >>> with 20
>> >>> >>> > >>> tasks
>> >>> >>> > >>> > and it works fine.
>> >>> >>> > >>> >
>> >>> >>> > >>> > On Sat, Jun 27, 2015 at 1:00 PM, Edward J. Yoon <
>> >>> >>> > edwardyoon@apache.org
>> >>> >>> > >>> >
>> >>> >>> > >>> > wrote:
>> >>> >>> > >>> >
>> >>> >>> > >>> >> > When I run the PI example, it uses 9 tasks and runs
>> fine.
>> >>> When I
>> >>> >>> > >>> run my
>> >>> >>> > >>> >> > program with 3 tasks, everything runs fine. But when I
>> >>> increase
>> >>> >>> > the
>> >>> >>> > >>> tasks
>> >>> >>> > >>> >> > (to 4) by using "setNumBspTask". Hama freezes. I do not
>> >>> >>> understand
>> >>> >>> > >>> what
>> >>> >>> > >>> >> can
>> >>> >>> > >>> >> > go wrong.
>> >>> >>> > >>> >>
>> >>> >>> > >>> >> It looks like a program bug. Have you ran your program in
>> >>> local
>> >>> >>> > mode?
>> >>> >>> > >>> >>
>> >>> >>> > >>> >> On Sat, Jun 27, 2015 at 8:03 AM, Behroz Sikander <
>> >>> >>> > behroz89@gmail.com>
>> >>> >>> > >>> >> wrote:
>> >>> >>> > >>> >> > Hi,
>> >>> >>> > >>> >> > In the current thread, I mentioned 3 issues. Issue 1
>> and 3
>> >>> are
>> >>> >>> > >>> resolved
>> >>> >>> > >>> >> but
>> >>> >>> > >>> >> > issue number 2 is still giving me headaches.
>> >>> >>> > >>> >> >
>> >>> >>> > >>> >> > My problem:
>> >>> >>> > >>> >> > My cluster now consists of 3 machines. Each one of them
>> >>> properly
>> >>> >>> > >>> >> configured
>> >>> >>> > >>> >> > (Apparently). From my master machine when I start
>> Hadoop
>> >>> and
>> >>> >>> Hama,
>> >>> >>> > >>> I can
>> >>> >>> > >>> >> > see the processes started on other 2 machines. If I
>> check
>> >>> the
>> >>> >>> > >>> maximum
>> >>> >>> > >>> >> tasks
>> >>> >>> > >>> >> > that my cluster can support then I get 9 (3 tasks on
>> each
>> >>> >>> > machine).
>> >>> >>> > >>> >> >
>> >>> >>> > >>> >> > When I run the PI example, it uses 9 tasks and runs
>> fine.
>> >>> When I
>> >>> >>> > >>> run my
>> >>> >>> > >>> >> > program with 3 tasks, everything runs fine. But when I
>> >>> increase
>> >>> >>> > the
>> >>> >>> > >>> tasks
>> >>> >>> > >>> >> > (to 4) by using "setNumBspTask". Hama freezes. I do not
>> >>> >>> understand
>> >>> >>> > >>> what
>> >>> >>> > >>> >> can
>> >>> >>> > >>> >> > go wrong.
>> >>> >>> > >>> >> >
>> >>> >>> > >>> >> > I checked the log files and things look fine. I just
>> >>> sometimes
>> >>> >>> > get
>> >>> >>> > >>> an
>> >>> >>> > >>> >> > exception that hama was not able to delete the system
>> >>> directory
>> >>> >>> > >>> >> > (bsp.system.dir) defined in the hama-site.xml.
>> >>> >>> > >>> >> >
>> >>> >>> > >>> >> > Any help or clue would be great.
>> >>> >>> > >>> >> >
>> >>> >>> > >>> >> > Regards,
>> >>> >>> > >>> >> > Behroz Sikander
>> >>> >>> > >>> >> >
>> >>> >>> > >>> >> > On Thu, Jun 25, 2015 at 1:13 PM, Behroz Sikander <
>> >>> >>> > >>> behroz89@gmail.com>
>> >>> >>> > >>> >> wrote:
>> >>> >>> > >>> >> >
>> >>> >>> > >>> >> >> Thank you :)
>> >>> >>> > >>> >> >>
>> >>> >>> > >>> >> >> On Thu, Jun 25, 2015 at 12:14 AM, Edward J. Yoon <
>> >>> >>> > >>> edwardyoon@apache.org
>> >>> >>> > >>> >> >
>> >>> >>> > >>> >> >> wrote:
>> >>> >>> > >>> >> >>
>> >>> >>> > >>> >> >>> Hi,
>> >>> >>> > >>> >> >>>
>> >>> >>> > >>> >> >>> You can get the maximum number of available tasks
>> like
>> >>> >>> following
>> >>> >>> > >>> code:
>> >>> >>> > >>> >> >>>
>> >>> >>> > >>> >> >>>     BSPJobClient jobClient = new BSPJobClient(conf);
>> >>> >>> > >>> >> >>>     ClusterStatus cluster =
>> >>> jobClient.getClusterStatus(true);
>> >>> >>> > >>> >> >>>
>> >>> >>> > >>> >> >>>     // Set to maximum
>> >>> >>> > >>> >> >>>     bsp.setNumBspTask(cluster.getMaxTasks());
>> >>> >>> > >>> >> >>>
>> >>> >>> > >>> >> >>>
>> >>> >>> > >>> >> >>> On Wed, Jun 24, 2015 at 11:20 PM, Behroz Sikander <
>> >>> >>> > >>> behroz89@gmail.com>
>> >>> >>> > >>> >> >>> wrote:
>> >>> >>> > >>> >> >>> > Hi,
>> >>> >>> > >>> >> >>> > 1) Thank you for this.
>> >>> >>> > >>> >> >>> > 2) Here are the images. I will look into the log
>> files
>> >>> of PI
>> >>> >>> > >>> example
>> >>> >>> > >>> >> >>> >
>> >>> >>> > >>> >> >>> > *Result of JPS command on slave*
>> >>> >>> > >>> >> >>> >
>> >>> >>> > >>> >> >>>
>> >>> >>> > >>> >>
>> >>> >>> > >>>
>> >>> >>> >
>> >>> >>>
>> >>>
>> http://s17.postimg.org/gpwe2bbfj/Screen_Shot_2015_06_22_at_7_23_31_PM.png
>> >>> >>> > >>> >> >>> >
>> >>> >>> > >>> >> >>> > *Result of JPS command on Master*
>> >>> >>> > >>> >> >>> >
>> >>> >>> > >>> >> >>>
>> >>> >>> > >>> >>
>> >>> >>> > >>>
>> >>> >>> >
>> >>> >>>
>> >>>
>> http://s14.postimg.org/s9922em5p/Screen_Shot_2015_06_22_at_7_23_42_PM.png
>> >>> >>> > >>> >> >>> >
>> >>> >>> > >>> >> >>> > 3) In my current case, I do not have any input
>> >>> submitted to
>> >>> >>> > the
>> >>> >>> > >>> job.
>> >>> >>> > >>> >> >>> During
>> >>> >>> > >>> >> >>> > run time, I directly fetch data from HDFS. So, I am
>> >>> looking
>> >>> >>> > for
>> >>> >>> > >>> >> >>> something
>> >>> >>> > >>> >> >>> > like BSPJob.set*Max*NumBspTask().
>> >>> >>> > >>> >> >>> >
>> >>> >>> > >>> >> >>> > Regards,
>> >>> >>> > >>> >> >>> > Behroz
>> >>> >>> > >>> >> >>> >
>> >>> >>> > >>> >> >>> >
>> >>> >>> > >>> >> >>> >
>> >>> >>> > >>> >> >>> > On Tue, Jun 23, 2015 at 12:57 AM, Edward J. Yoon <
>> >>> >>> > >>> >> edwardyoon@apache.org
>> >>> >>> > >>> >> >>> >
>> >>> >>> > >>> >> >>> > wrote:
>> >>> >>> > >>> >> >>> >
>> >>> >>> > >>> >> >>> >> Hello,
>> >>> >>> > >>> >> >>> >>
>> >>> >>> > >>> >> >>> >> 1) You can get the filesystem URI from a
>> configuration
>> >>> >>> using
>> >>> >>> > >>> >> >>> >> "FileSystem fs = FileSystem.get(conf);". Of
>> course,
>> >>> the
>> >>> >>> > >>> fs.defaultFS
>> >>> >>> > >>> >> >>> >> property should be in hama-site.xml
>> >>> >>> > >>> >> >>> >>
>> >>> >>> > >>> >> >>> >>   <property>
>> >>> >>> > >>> >> >>> >>     <name>fs.defaultFS</name>
>> >>> >>> > >>> >> >>> >>     <value>hdfs://host1.mydomain.com:9000/
>> </value>
>> >>> >>> > >>> >> >>> >>     <description>
>> >>> >>> > >>> >> >>> >>       The name of the default file system. Either
>> the
>> >>> >>> literal
>> >>> >>> > >>> string
>> >>> >>> > >>> >> >>> >>       "local" or a host:port for HDFS.
>> >>> >>> > >>> >> >>> >>     </description>
>> >>> >>> > >>> >> >>> >>   </property>
>> >>> >>> > >>> >> >>> >>
>> >>> >>> > >>> >> >>> >> 2) The 'bsp.tasks.maximum' is the number of tasks
>> per
>> >>> node.
>> >>> >>> > It
>> >>> >>> > >>> looks
>> >>> >>> > >>> >> >>> >> cluster configuration issue. Please run Pi example
>> >>> and look
>> >>> >>> > at
>> >>> >>> > >>> the
>> >>> >>> > >>> >> >>> >> logs for more details. NOTE: you can not attach
>> the
>> >>> images
>> >>> >>> to
>> >>> >>> > >>> >> mailing
>> >>> >>> > >>> >> >>> >> list so I can't see it.
>> >>> >>> > >>> >> >>> >>
>> >>> >>> > >>> >> >>> >> 3) You can use the BSPJob.setNumBspTask(int)
>> method.
>> >>> If
>> >>> >>> input
>> >>> >>> > >>> is
>> >>> >>> > >>> >> >>> >> provided, the number of BSP tasks is basically
>> driven
>> >>> by
>> >>> >>> the
>> >>> >>> > >>> number
>> >>> >>> > >>> >> of
>> >>> >>> > >>> >> >>> >> DFS blocks. I'll fix it to be more flexible on
>> >>> HAMA-956.
>> >>> >>> > >>> >> >>> >>
>> >>> >>> > >>> >> >>> >> Thanks!
>> >>> >>> > >>> >> >>> >>
>> >>> >>> > >>> >> >>> >>
>> >>> >>> > >>> >> >>> >> On Tue, Jun 23, 2015 at 2:33 AM, Behroz Sikander <
>> >>> >>> > >>> >> behroz89@gmail.com>
>> >>> >>> > >>> >> >>> >> wrote:
>> >>> >>> > >>> >> >>> >> > Hi,
>> >>> >>> > >>> >> >>> >> > Recently, I moved from a single machine setup
>> to a 2
>> >>> >>> > machine
>> >>> >>> > >>> >> setup.
>> >>> >>> > >>> >> >>> I was
>> >>> >>> > >>> >> >>> >> > successfully able to run my job that uses the
>> HDFS
>> >>> to get
>> >>> >>> > >>> data. I
>> >>> >>> > >>> >> >>> have 3
>> >>> >>> > >>> >> >>> >> > trivial questions
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> > 1- To access HDFS, I have to manually give the
>> IP
>> >>> address
>> >>> >>> > of
>> >>> >>> > >>> >> server
>> >>> >>> > >>> >> >>> >> running
>> >>> >>> > >>> >> >>> >> > HDFS. I thought that Hama will automatically
>> pick
>> >>> from
>> >>> >>> the
>> >>> >>> > >>> >> >>> configurations
>> >>> >>> > >>> >> >>> >> > but it does not. I am probably doing something
>> >>> wrong.
>> >>> >>> Right
>> >>> >>> > >>> now my
>> >>> >>> > >>> >> >>> code
>> >>> >>> > >>> >> >>> >> works
>> >>> >>> > >>> >> >>> >> > by using the following.
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> > FileSystem fs = FileSystem.get(new
>> >>> >>> > >>> URI("hdfs://server_ip:port/"),
>> >>> >>> > >>> >> >>> conf);
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> > 2- On my master server, when I start hama it
>> >>> >>> automatically
>> >>> >>> > >>> starts
>> >>> >>> > >>> >> >>> hama in
>> >>> >>> > >>> >> >>> >> > the slave machine (all good). Both master and
>> slave
>> >>> are
>> >>> >>> set
>> >>> >>> > >>> as
>> >>> >>> > >>> >> >>> >> groomservers.
>> >>> >>> > >>> >> >>> >> > This means that I have 2 servers to run my job
>> which
>> >>> >>> means
>> >>> >>> > >>> that I
>> >>> >>> > >>> >> can
>> >>> >>> > >>> >> >>> >> open
>> >>> >>> > >>> >> >>> >> > more BSPPeerChild processes. And if I submit my
>> jar
>> >>> with
>> >>> >>> 3
>> >>> >>> > >>> bsp
>> >>> >>> > >>> >> tasks
>> >>> >>> > >>> >> >>> then
>> >>> >>> > >>> >> >>> >> > everything works fine. But when I move to 4
>> tasks,
>> >>> Hama
>> >>> >>> > >>> freezes.
>> >>> >>> > >>> >> >>> Here is
>> >>> >>> > >>> >> >>> >> the
>> >>> >>> > >>> >> >>> >> > result of JPS command on slave.
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> > Result of JPS command on Master
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> > You can see that it is only opening tasks on
>> slaves
>> >>> but
>> >>> >>> not
>> >>> >>> > >>> on
>> >>> >>> > >>> >> >>> master.
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> > Note: I tried to change the bsp.tasks.maximum
>> >>> property in
>> >>> >>> > >>> >> >>> >> hama-default.xml
>> >>> >>> > >>> >> >>> >> > to 4 but still same result.
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> > 3- I want my cluster to open as many
>> BSPPeerChild
>> >>> >>> processes
>> >>> >>> > >>> as
>> >>> >>> > >>> >> >>> possible.
>> >>> >>> > >>> >> >>> >> Is
>> >>> >>> > >>> >> >>> >> > there any setting that I can do to achieve that
>> ?
>> >>> Or hama
>> >>> >>> > >>> picks up
>> >>> >>> > >>> >> >>> the
>> >>> >>> > >>> >> >>> >> > values from hama-default.xml to open tasks ?
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> > Regards,
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> > Behroz Sikander
>> >>> >>> > >>> >> >>> >>
>> >>> >>> > >>> >> >>> >>
>> >>> >>> > >>> >> >>> >>
>> >>> >>> > >>> >> >>> >> --
>> >>> >>> > >>> >> >>> >> Best Regards, Edward J. Yoon
>> >>> >>> > >>> >> >>> >>
>> >>> >>> > >>> >> >>>
>> >>> >>> > >>> >> >>>
>> >>> >>> > >>> >> >>>
>> >>> >>> > >>> >> >>> --
>> >>> >>> > >>> >> >>> Best Regards, Edward J. Yoon
>> >>> >>> > >>> >> >>>
>> >>> >>> > >>> >> >>
>> >>> >>> > >>> >> >>
>> >>> >>> > >>> >>
>> >>> >>> > >>> >>
>> >>> >>> > >>> >>
>> >>> >>> > >>> >> --
>> >>> >>> > >>> >> Best Regards, Edward J. Yoon
>> >>> >>> > >>> >>
>> >>> >>> > >>>
>> >>> >>> > >>>
>> >>> >>> > >>>
>> >>> >>> > >>> --
>> >>> >>> > >>> Best Regards, Edward J. Yoon
>> >>> >>> > >>>
>> >>> >>> > >>
>> >>> >>> > >>
>> >>> >>> > >
>> >>> >>> >
>> >>> >>> >
>> >>> >>> >
>> >>> >>>
>> >>> >>>
>> >>> >>>
>> >>> >
>> >>> >
>> >>> >
>> >>> > --
>> >>> > Best Regards, Edward J. Yoon
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Best Regards, Edward J. Yoon
>> >>>
>> >>
>> >>
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>>
>
>

Re: Groomserer BSPPeerChild limit

Posted by Behroz Sikander <be...@gmail.com>.
Ok, perfect. I do not have write access to /etc/hosts, which is why I was
using IP addresses. I will talk to the administrator.

Btw, I am wondering how the PI example was able to communicate with the other
servers. The PI example runs fine even with more than 3 tasks (it works on
both machines).
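For reference, a minimal sketch of the /etc/hosts entries involved (the
hostnames below are illustrative placeholders, not taken from this cluster;
the IPs are the two node addresses that appear elsewhere in this thread).
Every machine in the cluster needs the same entries so that the BSPMaster
and each groom can resolve one another by name:

```
# Example /etc/hosts entries -- hostnames are placeholders; the IPs
# match the two nodes mentioned elsewhere in this thread.
# The same entries must exist on every machine in the cluster.
172.17.0.3   cluster-0    # BSPMaster + ZooKeeper + groom
172.17.0.7   cluster-1    # groom
```

A quick sanity check from each node is `getent hosts cluster-0` (or
`ping -c1 cluster-0`); when the master cannot resolve a groom's hostname,
registration fails with the UnknownHostException shown below in this thread.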

On Mon, Jun 29, 2015 at 5:47 AM, Edward J. Yoon <ed...@apache.org>
wrote:

> OKay almost done. I guess you need to add host names to your
> /etc/hosts file. :-) Please see also
>
> http://stackoverflow.com/questions/4730148/unknownhostexception-on-tasktracker-in-hadoop-cluster
>
> On Mon, Jun 29, 2015 at 12:41 PM, Behroz Sikander <be...@gmail.com>
> wrote:
> > Server 2 was showing the exception that I posted in the previous email.
> > Server1 is showing the following exception
> >
> > 15/06/29 03:27:42 INFO ipc.Server: IPC Server handler 0 on 40000:
> starting
> > 15/06/29 03:28:53 INFO bsp.BSPMaster: groomd_b178b33b16cc_50000 is added.
> > 15/06/29 03:29:20 ERROR bsp.BSPMaster: Fail to register GroomServer
> > groomd_8d4b512cf448_50000
> > java.net.UnknownHostException: unknown host: 8d4b512cf448
> > at org.apache.hama.ipc.Client$Connection.<init>(Client.java:225)
> > at org.apache.hama.ipc.Client.getConnection(Client.java:1039)
> > at org.apache.hama.ipc.Client.call(Client.java:888)
> > at org.apache.hama.ipc.RPC$Invoker.invoke(RPC.java:239)
> > at com.sun.proxy.$Proxy11.getProtocolVersion(Unknown Source)
> >
> > I am looking into this issue.
> >
> > On Mon, Jun 29, 2015 at 5:31 AM, Behroz Sikander <be...@gmail.com>
> wrote:
> >
> >> Ok great. I was able to run the zk, groom and bspmaster on server 1. But
> >> when I ran the groom on server2 I got the following exception
> >>
> >> 15/06/29 03:29:20 ERROR bsp.GroomServer: There is a problem in
> >> establishing communication link with BSPMaster
> >> 15/06/29 03:29:20 ERROR bsp.GroomServer: Got fatal exception while
> >> reinitializing GroomServer: java.io.IOException: There is a problem in
> >> establishing communication link with BSPMaster.
> >> at org.apache.hama.bsp.GroomServer.initialize(GroomServer.java:426)
> >> at org.apache.hama.bsp.GroomServer.run(GroomServer.java:860)
> >> at java.lang.Thread.run(Thread.java:745)
> >>
> >> On Mon, Jun 29, 2015 at 5:21 AM, Edward J. Yoon <ed...@apache.org>
> >> wrote:
> >>
> >>> Here's my configurations:
> >>>
> >>> hama-site.xml:
> >>>
> >>>   <property>
> >>>     <name>bsp.master.address</name>
> >>>     <value>cluster-0:40000</value>
> >>>   </property>
> >>>
> >>>   <property>
> >>>     <name>fs.default.name</name>
> >>>     <value>hdfs://cluster-0:9000/</value>
> >>>   </property>
> >>>
> >>>   <property>
> >>>     <name>hama.zookeeper.quorum</name>
> >>>     <value>cluster-0</value>
> >>>   </property>
> >>>
> >>>
> >>> % bin/hama zookeeper
> >>> 15/06/29 12:17:17 ERROR quorum.QuorumPeerConfig: Invalid
> >>> configuration, only one server specified (ignoring)
> >>>
> >>> Then, open new terminal and run master with following command:
> >>>
> >>> % bin/hama bspmaster
> >>> ...
> >>> 15/06/29 12:17:40 INFO sync.ZKSyncBSPMasterClient: Initialized ZK false
> >>> 15/06/29 12:17:40 INFO sync.ZKSyncClient: Initializing ZK Sync Client
> >>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server Responder: starting
> >>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server listener on 40000:
> starting
> >>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server handler 0 on 40000:
> starting
> >>> 15/06/29 12:17:40 INFO bsp.BSPMaster: Starting RUNNING
> >>>
> >>>
> >>>
> >>> On Mon, Jun 29, 2015 at 12:17 PM, Edward J. Yoon <
> edwardyoon@apache.org>
> >>> wrote:
> >>> > Hi,
> >>> >
> >>> > If you run zk server too, BSPmaster will be connected to zk and won't
> >>> > throw exceptions.
> >>> >
> >>> > On Mon, Jun 29, 2015 at 12:13 PM, Behroz Sikander <
> behroz89@gmail.com>
> >>> wrote:
> >>> >> Hi,
> >>> >> Thank you the information. I moved to hama 0.7.0 and I still have
> the
> >>> same
> >>> >> problem.
> >>> >> When I run % bin/hama bspmaster, I am getting the following
> exception
> >>> >>
> >>> >> INFO http.HttpServer: Port returned by
> >>> >> webServer.getConnectors()[0].getLocalPort() before open() is -1.
> >>> Opening
> >>> >> the listener on 40013
> >>> >>  INFO http.HttpServer: listener.getLocalPort() returned 40013
> >>> >> webServer.getConnectors()[0].getLocalPort() returned 40013
> >>> >>  INFO http.HttpServer: Jetty bound to port 40013
> >>> >>  INFO mortbay.log: jetty-6.1.14
> >>> >>  INFO mortbay.log: Extract
> >>> >>
> >>>
> jar:file:/home/behroz/Documents/Packages/hama-0.7.0/hama-core-0.7.0.jar!/webapp/bspmaster/
> >>> >> to /tmp/Jetty_b178b33b16cc_40013_bspmaster____.cof30w/webapp
> >>> >>  INFO mortbay.log: Started SelectChannelConnector@b178b33b16cc
> :40013
> >>> >>  INFO bsp.BSPMaster: Cleaning up the system directory
> >>> >>  INFO bsp.BSPMaster: hdfs://
> >>> 172.17.0.3:54310/tmp/hama-behroz/bsp/system
> >>> >>  INFO sync.ZKSyncBSPMasterClient: Initialized ZK false
> >>> >>  INFO sync.ZKSyncClient: Initializing ZK Sync Client
> >>> >>  ERROR sync.ZKSyncBSPMasterClient:
> >>> >> org.apache.zookeeper.KeeperException$ConnectionLossException:
> >>> >> KeeperErrorCode = ConnectionLoss for /bsp
> >>> >> at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> >>> >> at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> >>> >> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
> >>> >> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
> >>> >> at
> >>> >>
> >>>
> org.apache.hama.bsp.sync.ZKSyncBSPMasterClient.init(ZKSyncBSPMasterClient.java:62)
> >>> >> at org.apache.hama.bsp.BSPMaster.initZK(BSPMaster.java:534)
> >>> >> at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:517)
> >>> >> at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:500)
> >>> >> at org.apache.hama.BSPMasterRunner.run(BSPMasterRunner.java:46)
> >>> >> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> >>> >> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> >>> >> at org.apache.hama.BSPMasterRunner.main(BSPMasterRunner.java:56)
> >>> >>  ERROR sync.ZKSyncBSPMasterClient:
> >>> >> org.apache.zookeeper.KeeperException$ConnectionLossException:
> >>> >> KeeperErrorCode = ConnectionLoss for /bsp
> >>> >>
> >>> >> *Why zookeeper settings in hama-site.xml are (right now, I am using
> >>> just
> >>> >> two servers 172.17.0.3 and 172.17.0.7)*
> >>> >> <property>
> >>> >>                  <name>hama.zookeeper.quorum</name>
> >>> >>                  <value>172.17.0.3,172.17.0.7</value>
> >>> >>                  <description>Comma separated list of servers in the
> >>> >> ZooKeeper quorum.
> >>> >>                  For example, "host1.mydomain.com,
> host2.mydomain.com,
> >>> >> host3.mydomain.com".
> >>> >>                  By default this is set to localhost for local and
> >>> >> pseudo-distributed modes
> >>> >>                  of operation. For a fully-distributed setup, this
> >>> should
> >>> >> be set to a full
> >>> >>                  list of ZooKeeper quorum servers. If
> HAMA_MANAGES_ZK
> >>> is
> >>> >> set in hama-env.sh
> >>> >>                  this is the list of servers which we will
> start/stop
> >>> >> ZooKeeper on.
> >>> >>                  </description>
> >>> >>         </property>
> >>> >>        ......
> >>> >>        <property>
> >>> >>                  <name>hama.zookeeper.property.clientPort</name>
> >>> >>                  <value>2181</value>
> >>> >>          </property>
> >>> >>
> >>> >> Is something wrong with my settings ?
> >>> >>
> >>> >> Regards,
> >>> >> Behroz Sikander
> >>> >>
> >>> >> On Mon, Jun 29, 2015 at 1:44 AM, Edward J. Yoon <
> >>> edward.yoon@samsung.com>
> >>> >> wrote:
> >>> >>
> >>> >>> > (0.7.0) because I do not understand YARN yet. It adds extra
> >>> >>> configurations
> >>> >>>
> >>> >>> Hama classic mode works on both Hadoop 1.x and Hadoop 2.x HDFS.
> Yarn
> >>> >>> configuration is only needed when you want to submit a BSP job to
> Yarn
> >>> >>> cluster
> >>> >>> without Hama cluster. So you don't need to worry about it. :-)
> >>> >>>
> >>> >>> > distributed mode ? and is there any way to manage the server ? I
> >>> mean
> >>> >>> right
> >>> >>> > now, I have 3 machines with alot of configurations files and log
> >>> files.
> >>> >>> It
> >>> >>>
> >>> >>> You can use web UI at
> http://masterserver_address:40013/bspmaster.jsp
> >>> >>>
> >>> >>> To debug your program, please try like below:
> >>> >>>
> >>> >>> 1) Run a BSPMaster and Zookeeper at server1.
> >>> >>> % bin/hama bspmaster
> >>> >>> % bin/hama zookeeper
> >>> >>>
> >>> >>> 2) Run a Groom at server1 and server2.
> >>> >>>
> >>> >>> % bin/hama groom
> >>> >>>
> >>> >>> 3) Check whether deamons are running well. Then, run your program
> >>> using jar
> >>> >>> command at server1.
> >>> >>>
> >>> >>> % bin/hama jar .....
> >>> >>>
> >>> >>> > In hama_[user]_bspmaster_.....log file I get the following
> >>> exception. But
> >>> >>> > this occurs in both cases when I run my job with 3 tasks or with
> 4
> >>> tasks
> >>> >>>
> >>> >>> In fact, you should not see above initZK error log.
> >>> >>>
> >>> >>> --
> >>> >>> Best Regards, Edward J. Yoon
> >>> >>>
> >>> >>>
> >>> >>> -----Original Message-----
> >>> >>> From: Behroz Sikander [mailto:behroz89@gmail.com]
> >>> >>> Sent: Monday, June 29, 2015 8:18 AM
> >>> >>> To: user@hama.apache.org
> >>> >>> Subject: Re: Groomserer BSPPeerChild limit
> >>> >>>
> >>> >>> I will try the things that you mentioned. I am not using the latest
> >>> version
> >>> >>> (0.7.0) because I do not understand YARN yet. It adds extra
> >>> configurations
> >>> >>> which makes it harder for me to understand when things go
> wrong.
> >>> Any
> >>> >>> suggestions ?
> >>> >>>
> >>> >>> Further, are there any tools that you use for debugging while in
> >>> >>> distributed mode ? and is there any way to manage the server ? I
> mean
> >>> right
> >>> >>> now, I have 3 machines with alot of configurations files and log
> >>> files. It
> >>> >>> takes alot of time. This makes me wonder how people who have 100s
> of
> >>> >>> machines debug and manage the cluster.
> >>> >>>
> >>> >>> Regards,
> >>> >>> Behroz
> >>> >>>
> >>> >>> On Mon, Jun 29, 2015 at 12:53 AM, Edward J. Yoon <
> >>> edward.yoon@samsung.com>
> >>> >>> wrote:
> >>> >>>
> >>> >>> > Hi,
> >>> >>> >
> >>> >>> > It looks like a zookeeper connection problem. Please check
> whether
> >>> >>> > zookeeper
> >>> >>> > is running and every tasks can connect to zookeeper.
> >>> >>> >
> >>> >>> > I would recommend you to stop the firewall during debugging, and
> >>> please
> >>> >>> use
> >>> >>> > the 0.7.0 latest release.
> >>> >>> >
> >>> >>> >
> >>> >>> > --
> >>> >>> > Best Regards, Edward J. Yoon
> >>> >>> >
> >>> >>> > -----Original Message-----
> >>> >>> > From: Behroz Sikander [mailto:behroz89@gmail.com]
> >>> >>> > Sent: Monday, June 29, 2015 7:34 AM
> >>> >>> > To: user@hama.apache.org
> >>> >>> > Subject: Re: Groomserer BSPPeerChild limit
> >>> >>> >
> >>> >>> > To figure out the issue, I was trying something else and found
> out
> >>> >>> another
> >>> >>> > wiered issue. Might be a bug of Hama but I am not sure. Both
> >>> following
> >>> >>> > lines give an exception.
> >>> >>> >
> >>> >>> > System.out.println( peer.getPeerName(0)); //Exception
> >>> >>> >
> >>> >>> > System.out.println( peer.getNumPeers()); //Exception
> >>> >>> >
> >>> >>> >
> >>> >>> > [time] ERROR bsp.BSPTask: *Error running bsp setup and bsp
> >>> function.*
> >>> >>> >
> >>> >>> > [time]java.lang.*RuntimeException: All peer names could not be
> >>> >>> retrieved!*
> >>> >>> >
> >>> >>> > at
> >>> >>> >
> >>> >>> >
> >>> >>>
> >>>
> org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.getAllPeerNames(ZooKeeperSyncClientImpl.java:305)
> >>> >>> >
> >>> >>> > at
> >>> org.apache.hama.bsp.BSPPeerImpl.initPeerNames(BSPPeerImpl.java:544)
> >>> >>> >
> >>> >>> > at
> org.apache.hama.bsp.BSPPeerImpl.getNumPeers(BSPPeerImpl.java:538)
> >>> >>> >
> >>> >>> > at testHDFS.EVADMMBsp.setup*(EVADMMBsp.java:58)*
> >>> >>> >
> >>> >>> > at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
> >>> >>> >
> >>> >>> > at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
> >>> >>> >
> >>> >>> > at
> >>> >>>
> >>>
> org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
> >>> >>> >
> >>> >>> > On Sun, Jun 28, 2015 at 6:45 PM, Behroz Sikander <
> >>> behroz89@gmail.com>
> >>> >>> > wrote:
> >>> >>> >
> >>> >>> > > I think I have more information on the issue. I did some
> >>> debugging and
> >>> >>> > > found something quite strange.
> >>> >>> > >
> >>> >>> > > If I open my job with 6 tasks ( 3 tasks will run on MACHINE1
> and
> >>> 3 task
> >>> >>> > > will be opened on other MACHINE2),
> >>> >>> > >
> >>> >>> > >  -  3 tasks on Machine1 are frozen and the strange thing is
> that
> >>> the
> >>> >>> > > processes do not even enter the SETUP function of BSP class. I
> >>> have
> >>> >>> print
> >>> >>> > > statements in the setup function of BSP class and it doesn't
> print
> >>> >>> > > anything. I get empty files with zero size.
> >>> >>> > >
> >>> >>> > > drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:29 .
> >>> >>> > > drwxrwxr-x 99 behroz behroz 4096 Jun 28 16:28 ..
> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> >>> >>> > > attempt_201506281624_0001_000000_0.err
> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> >>> >>> > > attempt_201506281624_0001_000000_0.log
> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> >>> >>> > > attempt_201506281624_0001_000001_0.err
> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> >>> >>> > > attempt_201506281624_0001_000001_0.log
> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> >>> >>> > > attempt_201506281624_0001_000002_0.err
> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> >>> >>> > > attempt_201506281624_0001_000002_0.log
> >>> >>> > >
> >>> >>> > > - On MACHINE2, the code enters the SETUP function of BSP class
> and
> >>> >>> prints
> >>> >>> > > stuff. See the size of files generated on output. How is it
> >>> possible
> >>> >>> that
> >>> >>> > > in 3 tasks the code can enter BSP and in others it cannot ?
> >>> >>> > >
> >>> >>> > > drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:39 .
> >>> >>> > > drwxrwxr-x 82 behroz behroz 4096 Jun 28 16:39 ..
> >>> >>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
> >>> >>> > > attempt_201506281639_0001_000003_0.err
> >>> >>> > > -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
> >>> >>> > > attempt_201506281639_0001_000003_0.log
> >>> >>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
> >>> >>> > > attempt_201506281639_0001_000004_0.err
> >>> >>> > > -rw-rw-r--  1 behroz behroz 1368 Jun 28 16:39
> >>> >>> > > attempt_201506281639_0001_000004_0.log
> >>> >>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
> >>> >>> > > attempt_201506281639_0001_000005_0.err
> >>> >>> > > -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
> >>> >>> > > attempt_201506281639_0001_000005_0.log
> >>> >>> > >
> >>> >>> > > - Hama Groom log file on MACHINE2 (which is frozen) shows.
> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >>> >>> > > 'attempt_201506281639_0001_000001_0' has started.
> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >>> >>> > > 'attempt_201506281639_0001_000002_0' has started.
> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >>> >>> > > 'attempt_201506281639_0001_000000_0' has started.
> >>> >>> > >
> >>> >>> > > - Hama Groom log file on MACHINE2 shows
> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >>> >>> > > 'attempt_201506281639_0001_000003_0' has started.
> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >>> >>> > > 'attempt_201506281639_0001_000004_0' has started.
> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >>> >>> > > 'attempt_201506281639_0001_000005_0' has started.
> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >>> >>> > > attempt_201506281639_0001_000004_0 is *done*.
> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >>> >>> > > attempt_201506281639_0001_000003_0 is *done*.
> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >>> >>> > > attempt_201506281639_0001_000005_0 is *done*.
> >>> >>> > >
> >>> >>> > > Any clue what might be going wrong ?
> >>> >>> > >
> >>> >>> > > Regards,
> >>> >>> > > Behroz
> >>> >>> > >
> >>> >>> > >
> >>> >>> > >
> >>> >>> > > On Sat, Jun 27, 2015 at 1:13 PM, Behroz Sikander <
> >>> behroz89@gmail.com>
> >>> >>> > > wrote:
> >>> >>> > >
> >>> >>> > >> Here is the log file from that folder
> >>> >>> > >>
> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: Starting Socket Reader #1
> for
> >>> port
> >>> >>> > >> 61001
> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server Responder:
> starting
> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server listener on
> 61001:
> >>> >>> > starting
> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 0 on
> 61001:
> >>> >>> > starting
> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 1 on
> 61001:
> >>> >>> > starting
> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 2 on
> 61001:
> >>> >>> > starting
> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 3 on
> 61001:
> >>> >>> > starting
> >>> >>> > >> 15/06/27 11:10:34 INFO message.HamaMessageManagerImpl: BSPPeer
> >>> >>> > >> address:b178b33b16cc port:61001
> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 4 on
> 61001:
> >>> >>> > starting
> >>> >>> > >> 15/06/27 11:10:34 INFO sync.ZKSyncClient: Initializing ZK Sync
> >>> Client
> >>> >>> > >> 15/06/27 11:10:34 INFO sync.ZooKeeperSyncClientImpl: Start
> >>> connecting
> >>> >>> to
> >>> >>> > >> Zookeeper! At b178b33b16cc/172.17.0.7:61001
> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping server on 61001
> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 0 on
> 61001:
> >>> >>> > exiting
> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server
> listener
> >>> on
> >>> >>> 61001
> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 1 on
> 61001:
> >>> >>> > exiting
> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 2 on
> 61001:
> >>> >>> > exiting
> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server
> Responder
> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 3 on
> 61001:
> >>> >>> > exiting
> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 4 on
> 61001:
> >>> >>> > exiting
> >>> >>> > >>
> >>> >>> > >>
> >>> >>> > >> And my console shows the following ouptut. Hama is frozen
> right
> >>> now.
> >>> >>> > >> 15/06/27 11:10:32 INFO bsp.BSPJobClient: Running job:
> >>> >>> > >> job_201506262331_0003
> >>> >>> > >> 15/06/27 11:10:35 INFO bsp.BSPJobClient: Current supersteps
> >>> number: 0
> >>> >>> > >> 15/06/27 11:10:38 INFO bsp.BSPJobClient: Current supersteps
> >>> number: 2
> >>> >>> > >>
> >>> >>> > >> On Sat, Jun 27, 2015 at 1:07 PM, Edward J. Yoon <
> >>> >>> edwardyoon@apache.org>
> >>> >>> > >> wrote:
> >>> >>> > >>
> >>> >>> > >>> Please check the task logs in $HAMA_HOME/logs/tasklogs
> folder.
> >>> >>> > >>>
> >>> >>> > >>> On Sat, Jun 27, 2015 at 8:03 PM, Behroz Sikander <
> >>> behroz89@gmail.com
> >>> >>> >
> >>> >>> > >>> wrote:
> >>> >>> > >>> > Yea. I also thought that. I ran the program through eclipse
> >>> with 20
> >>> >>> > >>> tasks
> >>> >>> > >>> > and it works fine.
> >>> >>> > >>> >
> >>> >>> > >>> > On Sat, Jun 27, 2015 at 1:00 PM, Edward J. Yoon <
> >>> >>> > edwardyoon@apache.org
> >>> >>> > >>> >
> >>> >>> > >>> > wrote:
> >>> >>> > >>> >
> >>> >>> > >>> >> > When I run the PI example, it uses 9 tasks and runs
> fine.
> >>> When I
> >>> >>> > >>> run my
> >>> >>> > >>> >> > program with 3 tasks, everything runs fine. But when I
> >>> increase
> >>> >>> > the
> >>> >>> > >>> tasks
> >>> >>> > >>> >> > (to 4) by using "setNumBspTask". Hama freezes. I do not
> >>> >>> understand
> >>> >>> > >>> what
> >>> >>> > >>> >> can
> >>> >>> > >>> >> > go wrong.
> >>> >>> > >>> >>
> >>> >>> > >>> >> It looks like a program bug. Have you run your program in
> >>> local
> >>> >>> > mode?
> >>> >>> > >>> >>
> >>> >>> > >>> >> On Sat, Jun 27, 2015 at 8:03 AM, Behroz Sikander <
> >>> >>> > behroz89@gmail.com>
> >>> >>> > >>> >> wrote:
> >>> >>> > >>> >> > Hi,
> >>> >>> > >>> >> > In the current thread, I mentioned 3 issues. Issue 1
> and 3
> >>> are
> >>> >>> > >>> resolved
> >>> >>> > >>> >> but
> >>> >>> > >>> >> > issue number 2 is still giving me headaches.
> >>> >>> > >>> >> >
> >>> >>> > >>> >> > My problem:
> >>> >>> > >>> >> > My cluster now consists of 3 machines. Each one of them
> >>> properly
> >>> >>> > >>> >> configured
> >>> >>> > >>> >> > (Apparently). From my master machine when I start Hadoop
> >>> and
> >>> >>> Hama,
> >>> >>> > >>> I can
> >>> >>> > >>> >> > see the processes started on other 2 machines. If I
> check
> >>> the
> >>> >>> > >>> maximum
> >>> >>> > >>> >> tasks
> >>> >>> > >>> >> > that my cluster can support then I get 9 (3 tasks on
> each
> >>> >>> > machine).
> >>> >>> > >>> >> >
> >>> >>> > >>> >> > When I run the PI example, it uses 9 tasks and runs
> fine.
> >>> When I
> >>> >>> > >>> run my
> >>> >>> > >>> >> > program with 3 tasks, everything runs fine. But when I
> >>> increase
> >>> >>> > the
> >>> >>> > >>> tasks
> >>> >>> > >>> >> > (to 4) by using "setNumBspTask". Hama freezes. I do not
> >>> >>> understand
> >>> >>> > >>> what
> >>> >>> > >>> >> can
> >>> >>> > >>> >> > go wrong.
> >>> >>> > >>> >> >
> >>> >>> > >>> >> > I checked the logs files and things look fine. I just
> >>> sometimes
> >>> >>> > get
> >>> >>> > >>> an
> >>> >>> > >>> >> > exception that hama was not able to delete the system
> >>> directory
> >>> >>> > >>> >> > (bsp.system.dir) defined in the hama-site.xml.
> >>> >>> > >>> >> >
> >>> >>> > >>> >> > Any help or clue would be great.
> >>> >>> > >>> >> >
> >>> >>> > >>> >> > Regards,
> >>> >>> > >>> >> > Behroz Sikander
> >>> >>> > >>> >> >
> >>> >>> > >>> >> > On Thu, Jun 25, 2015 at 1:13 PM, Behroz Sikander <
> >>> >>> > >>> behroz89@gmail.com>
> >>> >>> > >>> >> wrote:
> >>> >>> > >>> >> >
> >>> >>> > >>> >> >> Thank you :)
> >>> >>> > >>> >> >>
> >>> >>> > >>> >> >> On Thu, Jun 25, 2015 at 12:14 AM, Edward J. Yoon <
> >>> >>> > >>> edwardyoon@apache.org
> >>> >>> > >>> >> >
> >>> >>> > >>> >> >> wrote:
> >>> >>> > >>> >> >>
> >>> >>> > >>> >> >>> Hi,
> >>> >>> > >>> >> >>>
> >>> >>> > >>> >> >>> You can get the maximum number of available tasks like
> >>> >>> following
> >>> >>> > >>> code:
> >>> >>> > >>> >> >>>
> >>> >>> > >>> >> >>>     BSPJobClient jobClient = new BSPJobClient(conf);
> >>> >>> > >>> >> >>>     ClusterStatus cluster =
> >>> jobClient.getClusterStatus(true);
> >>> >>> > >>> >> >>>
> >>> >>> > >>> >> >>>     // Set to maximum
> >>> >>> > >>> >> >>>     bsp.setNumBspTask(cluster.getMaxTasks());
> >>> >>> > >>> >> >>>
> >>> >>> > >>> >> >>>
> >>> >>> > >>> >> >>> On Wed, Jun 24, 2015 at 11:20 PM, Behroz Sikander <
> >>> >>> > >>> behroz89@gmail.com>
> >>> >>> > >>> >> >>> wrote:
> >>> >>> > >>> >> >>> > Hi,
> >>> >>> > >>> >> >>> > 1) Thank you for this.
> >>> >>> > >>> >> >>> > 2) Here are the images. I will look into the log
> files
> >>> of PI
> >>> >>> > >>> example
> >>> >>> > >>> >> >>> >
> >>> >>> > >>> >> >>> > *Result of JPS command on slave*
> >>> >>> > >>> >> >>> >
> >>> >>> > >>> >> >>>
> >>> >>> > >>> >>
> >>> >>> > >>>
> >>> >>> >
> >>> >>>
> >>>
> http://s17.postimg.org/gpwe2bbfj/Screen_Shot_2015_06_22_at_7_23_31_PM.png
> >>> >>> > >>> >> >>> >
> >>> >>> > >>> >> >>> > *Result of JPS command on Master*
> >>> >>> > >>> >> >>> >
> >>> >>> > >>> >> >>>
> >>> >>> > >>> >>
> >>> >>> > >>>
> >>> >>> >
> >>> >>>
> >>>
> http://s14.postimg.org/s9922em5p/Screen_Shot_2015_06_22_at_7_23_42_PM.png
> >>> >>> > >>> >> >>> >
> >>> >>> > >>> >> >>> > 3) In my current case, I do not have any input
> >>> submitted to
> >>> >>> > the
> >>> >>> > >>> job.
> >>> >>> > >>> >> >>> During
> >>> >>> > >>> >> >>> > run time, I directly fetch data from HDFS. So, I am
> >>> looking
> >>> >>> > for
> >>> >>> > >>> >> >>> something
> >>> >>> > >>> >> >>> > like BSPJob.set*Max*NumBspTask().
> >>> >>> > >>> >> >>> >
> >>> >>> > >>> >> >>> > Regards,
> >>> >>> > >>> >> >>> > Behroz
> >>> >>> > >>> >> >>> >
> >>> >>> > >>> >> >>> >
> >>> >>> > >>> >> >>> >
> >>> >>> > >>> >> >>> > On Tue, Jun 23, 2015 at 12:57 AM, Edward J. Yoon <
> >>> >>> > >>> >> edwardyoon@apache.org
> >>> >>> > >>> >> >>> >
> >>> >>> > >>> >> >>> > wrote:
> >>> >>> > >>> >> >>> >
> >>> >>> > >>> >> >>> >> Hello,
> >>> >>> > >>> >> >>> >>
> >>> >>> > >>> >> >>> >> 1) You can get the filesystem URI from a
> configuration
> >>> >>> using
> >>> >>> > >>> >> >>> >> "FileSystem fs = FileSystem.get(conf);". Of course,
> >>> the
> >>> >>> > >>> fs.defaultFS
> >>> >>> > >>> >> >>> >> property should be in hama-site.xml
> >>> >>> > >>> >> >>> >>
> >>> >>> > >>> >> >>> >>   <property>
> >>> >>> > >>> >> >>> >>     <name>fs.defaultFS</name>
> >>> >>> > >>> >> >>> >>     <value>hdfs://host1.mydomain.com:9000/</value>
> >>> >>> > >>> >> >>> >>     <description>
> >>> >>> > >>> >> >>> >>       The name of the default file system. Either
> the
> >>> >>> literal
> >>> >>> > >>> string
> >>> >>> > >>> >> >>> >>       "local" or a host:port for HDFS.
> >>> >>> > >>> >> >>> >>     </description>
> >>> >>> > >>> >> >>> >>   </property>
> >>> >>> > >>> >> >>> >>
> >>> >>> > >>> >> >>> >> 2) The 'bsp.tasks.maximum' is the number of tasks
> per
> >>> node.
> >>> >>> > It
> >>> >>> > >>> looks
> >>> >>> > >>> >> >>> >> cluster configuration issue. Please run Pi example
> >>> and look
> >>> >>> > at
> >>> >>> > >>> the
> >>> >>> > >>> >> >>> >> logs for more details. NOTE: you can not attach the
> >>> images
> >>> >>> to
> >>> >>> > >>> >> mailing
> >>> >>> > >>> >> >>> >> list so I can't see it.
> >>> >>> > >>> >> >>> >>
> >>> >>> > >>> >> >>> >> 3) You can use the BSPJob.setNumBspTask(int)
> method.
> >>> If
> >>> >>> input
> >>> >>> > >>> is
> >>> >>> > >>> >> >>> >> provided, the number of BSP tasks is basically
> driven
> >>> by
> >>> >>> the
> >>> >>> > >>> number
> >>> >>> > >>> >> of
> >>> >>> > >>> >> >>> >> DFS blocks. I'll fix it to be more flexible on
> >>> HAMA-956.
> >>> >>> > >>> >> >>> >>
> >>> >>> > >>> >> >>> >> Thanks!
> >>> >>> > >>> >> >>> >>
> >>> >>> > >>> >> >>> >>
> >>> >>> > >>> >> >>> >> On Tue, Jun 23, 2015 at 2:33 AM, Behroz Sikander <
> >>> >>> > >>> >> behroz89@gmail.com>
> >>> >>> > >>> >> >>> >> wrote:
> >>> >>> > >>> >> >>> >> > Hi,
> >>> >>> > >>> >> >>> >> > Recently, I moved from a single machine setup to
> a 2
> >>> >>> > machine
> >>> >>> > >>> >> setup.
> >>> >>> > >>> >> >>> I was
> >>> >>> > >>> >> >>> >> > successfully able to run my job that uses the
> HDFS
> >>> to get
> >>> >>> > >>> data. I
> >>> >>> > >>> >> >>> have 3
> >>> >>> > >>> >> >>> >> > trivial questions
> >>> >>> > >>> >> >>> >> >
> >>> >>> > >>> >> >>> >> > 1- To access HDFS, I have to manually give the IP
> >>> address
> >>> >>> > of
> >>> >>> > >>> >> server
> >>> >>> > >>> >> >>> >> running
> >>> >>> > >>> >> >>> >> > HDFS. I thought that Hama will automatically pick
> >>> from
> >>> >>> the
> >>> >>> > >>> >> >>> configurations
> >>> >>> > >>> >> >>> >> > but it does not. I am probably doing something
> >>> wrong.
> >>> >>> Right
> >>> >>> > >>> now my
> >>> >>> > >>> >> >>> code
> >>> >>> > >>> >> >>> >> work
> >>> >>> > >>> >> >>> >> > by using the following.
> >>> >>> > >>> >> >>> >> >
> >>> >>> > >>> >> >>> >> > FileSystem fs = FileSystem.get(new
> >>> >>> > >>> URI("hdfs://server_ip:port/"),
> >>> >>> > >>> >> >>> conf);
> >>> >>> > >>> >> >>> >> >
> >>> >>> > >>> >> >>> >> > 2- On my master server, when I start hama it
> >>> >>> automatically
> >>> >>> > >>> starts
> >>> >>> > >>> >> >>> hama in
> >>> >>> > >>> >> >>> >> > the slave machine (all good). Both master and
> slave
> >>> are
> >>> >>> set
> >>> >>> > >>> as
> >>> >>> > >>> >> >>> >> groomservers.
> >>> >>> > >>> >> >>> >> > This means that I have 2 servers to run my job
> which
> >>> >>> means
> >>> >>> > >>> that I
> >>> >>> > >>> >> can
> >>> >>> > >>> >> >>> >> open
> >>> >>> > >>> >> >>> >> > more BSPPeerChild processes. And if I submit my
> jar
> >>> with
> >>> >>> 3
> >>> >>> > >>> bsp
> >>> >>> > >>> >> tasks
> >>> >>> > >>> >> >>> then
> >>> >>> > >>> >> >>> >> > everything works fine. But when I move to 4
> tasks,
> >>> Hama
> >>> >>> > >>> freezes.
> >>> >>> > >>> >> >>> Here is
> >>> >>> > >>> >> >>> >> the
> >>> >>> > >>> >> >>> >> > result of JPS command on slave.
> >>> >>> > >>> >> >>> >> >
> >>> >>> > >>> >> >>> >> >
> >>> >>> > >>> >> >>> >> > Result of JPS command on Master
> >>> >>> > >>> >> >>> >> >
> >>> >>> > >>> >> >>> >> >
> >>> >>> > >>> >> >>> >> >
> >>> >>> > >>> >> >>> >> > You can see that it is only opening tasks on
> slaves
> >>> but
> >>> >>> not
> >>> >>> > >>> on
> >>> >>> > >>> >> >>> master.
> >>> >>> > >>> >> >>> >> >
> >>> >>> > >>> >> >>> >> > Note: I tried to change the bsp.tasks.maximum
> >>> property in
> >>> >>> > >>> >> >>> >> hama-default.xml
> >>> >>> > >>> >> >>> >> > to 4 but still same result.
> >>> >>> > >>> >> >>> >> >
> >>> >>> > >>> >> >>> >> > 3- I want my cluster to open as many BSPPeerChild
> >>> >>> processes
> >>> >>> > >>> as
> >>> >>> > >>> >> >>> possible.
> >>> >>> > >>> >> >>> >> Is
> >>> >>> > >>> >> >>> >> > there any setting that can I do to achieve that ?
> >>> Or hama
> >>> >>> > >>> picks up
> >>> >>> > >>> >> >>> the
> >>> >>> > >>> >> >>> >> > values from hama-default.xml to open tasks ?
> >>> >>> > >>> >> >>> >> >
> >>> >>> > >>> >> >>> >> >
> >>> >>> > >>> >> >>> >> > Regards,
> >>> >>> > >>> >> >>> >> >
> >>> >>> > >>> >> >>> >> > Behroz Sikander
> >>> >>> > >>> >> >>> >>
> >>> >>> > >>> >> >>> >>
> >>> >>> > >>> >> >>> >>
> >>> >>> > >>> >> >>> >> --
> >>> >>> > >>> >> >>> >> Best Regards, Edward J. Yoon
> >>> >>> > >>> >> >>> >>
> >>> >>> > >>> >> >>>
> >>> >>> > >>> >> >>>
> >>> >>> > >>> >> >>>
> >>> >>> > >>> >> >>> --
> >>> >>> > >>> >> >>> Best Regards, Edward J. Yoon
> >>> >>> > >>> >> >>>
> >>> >>> > >>> >> >>
> >>> >>> > >>> >> >>
> >>> >>> > >>> >>
> >>> >>> > >>> >>
> >>> >>> > >>> >>
> >>> >>> > >>> >> --
> >>> >>> > >>> >> Best Regards, Edward J. Yoon
> >>> >>> > >>> >>
> >>> >>> > >>>
> >>> >>> > >>>
> >>> >>> > >>>
> >>> >>> > >>> --
> >>> >>> > >>> Best Regards, Edward J. Yoon
> >>> >>> > >>>
> >>> >>> > >>
> >>> >>> > >>
> >>> >>> > >
> >>> >>> >
> >>> >>> >
> >>> >>> >
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >
> >>> >
> >>> >
> >>> > --
> >>> > Best Regards, Edward J. Yoon
> >>>
> >>>
> >>>
> >>> --
> >>> Best Regards, Edward J. Yoon
> >>>
> >>
> >>
>
>
>
> --
> Best Regards, Edward J. Yoon
>

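The capacity question running through this thread reduces to simple arithmetic: bsp.tasks.maximum is a per-groom setting, so the cluster-wide ceiling is the number of grooms times that value, and asking for more BSP tasks than the ceiling leaves a job waiting for slots that never free up. A minimal JDK-only sketch of that bookkeeping (ClusterCapacity and its methods are illustrative names, not Hama API):

```java
// Back-of-the-envelope model of how cluster capacity is derived:
// per-groom slots come from bsp.tasks.maximum, and the cluster-wide
// maximum is (number of grooms) * (slots per groom). A job requesting
// more tasks than that ceiling cannot be scheduled and appears to hang.
public class ClusterCapacity {
    static int maxClusterTasks(int groomCount, int tasksPerGroom) {
        return groomCount * tasksPerGroom;
    }

    static boolean jobFits(int requestedTasks, int groomCount, int tasksPerGroom) {
        return requestedTasks <= maxClusterTasks(groomCount, tasksPerGroom);
    }

    public static void main(String[] args) {
        // Two grooms with the default of 3 tasks each, as in the thread.
        System.out.println(maxClusterTasks(2, 3)); // 6
        System.out.println(jobFits(4, 2, 3));      // true: 4 tasks fit in 6 slots
        System.out.println(jobFits(9, 2, 3));      // false: 9 tasks exceed 6 slots
    }
}
```

This is why the 3-machine cluster above reports a maximum of 9 tasks (3 grooms x 3 slots), and why requesting more than the registered grooms can supply freezes the job rather than failing fast.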
Re: Groomserer BSPPeerChild limit

Posted by "Edward J. Yoon" <ed...@apache.org>.
Okay, almost done. I guess you need to add host names to your
/etc/hosts file. :-) Please see also
http://stackoverflow.com/questions/4730148/unknownhostexception-on-tasktracker-in-hadoop-cluster

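The registration failure quoted below comes down to hostname resolution: the master cannot resolve the groom's hostname (a Docker-style container ID) because it has no /etc/hosts entry. A quick JDK-only way to check what the master will see (HostCheck and resolves are illustrative names, not part of Hama):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Checks whether a hostname resolves from this machine, the same lookup
// the BSPMaster performs before accepting a GroomServer registration.
public class HostCheck {
    static boolean resolves(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "8d4b512cf448" is the container hostname from the stack trace in
        // this thread; it will fail unless mapped in /etc/hosts or DNS.
        System.out.println(resolves("localhost"));
        System.out.println(resolves("8d4b512cf448"));
    }
}
```

Running this on the master for each groom hostname pinpoints which /etc/hosts entries are missing before touching any Hama configuration.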
On Mon, Jun 29, 2015 at 12:41 PM, Behroz Sikander <be...@gmail.com> wrote:
> Server 2 was showing the exception that I posted in the previous email.
> Server 1 is showing the following exception:
>
> 15/06/29 03:27:42 INFO ipc.Server: IPC Server handler 0 on 40000: starting
> 15/06/29 03:28:53 INFO bsp.BSPMaster: groomd_b178b33b16cc_50000 is added.
> 15/06/29 03:29:20 ERROR bsp.BSPMaster: Fail to register GroomServer
> groomd_8d4b512cf448_50000
> java.net.UnknownHostException: unknown host: 8d4b512cf448
> at org.apache.hama.ipc.Client$Connection.<init>(Client.java:225)
> at org.apache.hama.ipc.Client.getConnection(Client.java:1039)
> at org.apache.hama.ipc.Client.call(Client.java:888)
> at org.apache.hama.ipc.RPC$Invoker.invoke(RPC.java:239)
> at com.sun.proxy.$Proxy11.getProtocolVersion(Unknown Source)
>
> I am looking into this issue.
>
> On Mon, Jun 29, 2015 at 5:31 AM, Behroz Sikander <be...@gmail.com> wrote:
>
>> Ok great. I was able to run the zk, groom and bspmaster on server 1. But
>> when I ran the groom on server2 I got the following exception
>>
>> 15/06/29 03:29:20 ERROR bsp.GroomServer: There is a problem in
>> establishing communication link with BSPMaster
>> 15/06/29 03:29:20 ERROR bsp.GroomServer: Got fatal exception while
>> reinitializing GroomServer: java.io.IOException: There is a problem in
>> establishing communication link with BSPMaster.
>> at org.apache.hama.bsp.GroomServer.initialize(GroomServer.java:426)
>> at org.apache.hama.bsp.GroomServer.run(GroomServer.java:860)
>> at java.lang.Thread.run(Thread.java:745)
>>
>> On Mon, Jun 29, 2015 at 5:21 AM, Edward J. Yoon <ed...@apache.org>
>> wrote:
>>
>>> Here's my configurations:
>>>
>>> hama-site.xml:
>>>
>>>   <property>
>>>     <name>bsp.master.address</name>
>>>     <value>cluster-0:40000</value>
>>>   </property>
>>>
>>>   <property>
>>>     <name>fs.default.name</name>
>>>     <value>hdfs://cluster-0:9000/</value>
>>>   </property>
>>>
>>>   <property>
>>>     <name>hama.zookeeper.quorum</name>
>>>     <value>cluster-0</value>
>>>   </property>
>>>
>>>
>>> % bin/hama zookeeper
>>> 15/06/29 12:17:17 ERROR quorum.QuorumPeerConfig: Invalid
>>> configuration, only one server specified (ignoring)
>>>
>>> Then, open new terminal and run master with following command:
>>>
>>> % bin/hama bspmaster
>>> ...
>>> 15/06/29 12:17:40 INFO sync.ZKSyncBSPMasterClient: Initialized ZK false
>>> 15/06/29 12:17:40 INFO sync.ZKSyncClient: Initializing ZK Sync Client
>>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server Responder: starting
>>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server listener on 40000: starting
>>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server handler 0 on 40000: starting
>>> 15/06/29 12:17:40 INFO bsp.BSPMaster: Starting RUNNING
>>>
>>>
>>>
>>> On Mon, Jun 29, 2015 at 12:17 PM, Edward J. Yoon <ed...@apache.org>
>>> wrote:
>>> > Hi,
>>> >
>>> > If you run zk server too, BSPmaster will be connected to zk and won't
>>> > throw exceptions.
>>> >
>>> > On Mon, Jun 29, 2015 at 12:13 PM, Behroz Sikander <be...@gmail.com>
>>> wrote:
>>> >> Hi,
>>> >> Thank you for the information. I moved to hama 0.7.0 and I still have the
>>> same
>>> >> problem.
>>> >> When I run % bin/hama bspmaster, I am getting the following exception
>>> >>
>>> >> INFO http.HttpServer: Port returned by
>>> >> webServer.getConnectors()[0].getLocalPort() before open() is -1.
>>> Opening
>>> >> the listener on 40013
>>> >>  INFO http.HttpServer: listener.getLocalPort() returned 40013
>>> >> webServer.getConnectors()[0].getLocalPort() returned 40013
>>> >>  INFO http.HttpServer: Jetty bound to port 40013
>>> >>  INFO mortbay.log: jetty-6.1.14
>>> >>  INFO mortbay.log: Extract
>>> >>
>>> jar:file:/home/behroz/Documents/Packages/hama-0.7.0/hama-core-0.7.0.jar!/webapp/bspmaster/
>>> >> to /tmp/Jetty_b178b33b16cc_40013_bspmaster____.cof30w/webapp
>>> >>  INFO mortbay.log: Started SelectChannelConnector@b178b33b16cc:40013
>>> >>  INFO bsp.BSPMaster: Cleaning up the system directory
>>> >>  INFO bsp.BSPMaster: hdfs://
>>> 172.17.0.3:54310/tmp/hama-behroz/bsp/system
>>> >>  INFO sync.ZKSyncBSPMasterClient: Initialized ZK false
>>> >>  INFO sync.ZKSyncClient: Initializing ZK Sync Client
>>> >>  ERROR sync.ZKSyncBSPMasterClient:
>>> >> org.apache.zookeeper.KeeperException$ConnectionLossException:
>>> >> KeeperErrorCode = ConnectionLoss for /bsp
>>> >> at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>>> >> at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>>> >> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
>>> >> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
>>> >> at
>>> >>
>>> org.apache.hama.bsp.sync.ZKSyncBSPMasterClient.init(ZKSyncBSPMasterClient.java:62)
>>> >> at org.apache.hama.bsp.BSPMaster.initZK(BSPMaster.java:534)
>>> >> at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:517)
>>> >> at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:500)
>>> >> at org.apache.hama.BSPMasterRunner.run(BSPMasterRunner.java:46)
>>> >> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>> >> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>>> >> at org.apache.hama.BSPMasterRunner.main(BSPMasterRunner.java:56)
>>> >>  ERROR sync.ZKSyncBSPMasterClient:
>>> >> org.apache.zookeeper.KeeperException$ConnectionLossException:
>>> >> KeeperErrorCode = ConnectionLoss for /bsp
>>> >>
>>> >> *My zookeeper settings in hama-site.xml are (right now, I am using
>>> just
>>> >> two servers 172.17.0.3 and 172.17.0.7)*
>>> >> <property>
>>> >>                  <name>hama.zookeeper.quorum</name>
>>> >>                  <value>172.17.0.3,172.17.0.7</value>
>>> >>                  <description>Comma separated list of servers in the
>>> >> ZooKeeper quorum.
>>> >>                  For example, "host1.mydomain.com,host2.mydomain.com,
>>> >> host3.mydomain.com".
>>> >>                  By default this is set to localhost for local and
>>> >> pseudo-distributed modes
>>> >>                  of operation. For a fully-distributed setup, this
>>> should
>>> >> be set to a full
>>> >>                  list of ZooKeeper quorum servers. If HAMA_MANAGES_ZK
>>> is
>>> >> set in hama-env.sh
>>> >>                  this is the list of servers which we will start/stop
>>> >> ZooKeeper on.
>>> >>                  </description>
>>> >>         </property>
>>> >>        ......
>>> >>        <property>
>>> >>                  <name>hama.zookeeper.property.clientPort</name>
>>> >>                  <value>2181</value>
>>> >>          </property>
>>> >>
>>> >> Is something wrong with my settings ?
>>> >>
>>> >> Regards,
>>> >> Behroz Sikander
>>> >>
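A ConnectionLoss from the ZK sync client, as in the trace above, usually means the hosts listed in hama.zookeeper.quorum are not reachable on the client port. A small JDK-only probe to verify plain TCP connectivity before digging into Hama itself (ZkProbe is an illustrative name; substitute your own quorum hosts):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Attempts a plain TCP connect to each ZooKeeper quorum host on the
// configured client port; a failure here explains a ConnectionLoss
// before any Hama-level debugging is needed.
public class ZkProbe {
    static boolean reachable(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hosts from the hama.zookeeper.quorum value in this thread.
        for (String host : new String[] {"172.17.0.3", "172.17.0.7"}) {
            System.out.println(host + " -> " + reachable(host, 2181, 2000));
        }
    }
}
```

If a quorum host prints false here, fix the network, firewall, or the quorum list itself; no hama-site.xml change will help until the port answers.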
>>> >> On Mon, Jun 29, 2015 at 1:44 AM, Edward J. Yoon <
>>> edward.yoon@samsung.com>
>>> >> wrote:
>>> >>
>>> >>> > (0.7.0) because I do not understand YARN yet. It adds extra
>>> >>> configurations
>>> >>>
>>> >>> Hama classic mode works on both Hadoop 1.x and Hadoop 2.x HDFS. Yarn
>>> >>> configuration is only needed when you want to submit a BSP job to Yarn
>>> >>> cluster
>>> >>> without Hama cluster. So you don't need to worry about it. :-)
>>> >>>
>>> >>> > distributed mode ? and is there any way to manage the server ? I
>>> mean
>>> >>> right
>>> >>> > now, I have 3 machines with alot of configurations files and log
>>> files.
>>> >>> It
>>> >>>
>>> >>> You can use web UI at http://masterserver_address:40013/bspmaster.jsp
>>> >>>
>>> >>> To debug your program, please try like below:
>>> >>>
>>> >>> 1) Run a BSPMaster and Zookeeper at server1.
>>> >>> % bin/hama bspmaster
>>> >>> % bin/hama zookeeper
>>> >>>
>>> >>> 2) Run a Groom at server1 and server2.
>>> >>>
>>> >>> % bin/hama groom
>>> >>>
>>> >>> 3) Check whether deamons are running well. Then, run your program
>>> using jar
>>> >>> command at server1.
>>> >>>
>>> >>> % bin/hama jar .....
>>> >>>
>>> >>> > In hama_[user]_bspmaster_.....log file I get the following
>>> exception. But
>>> >>> > this occurs in both cases when I run my job with 3 tasks or with 4
>>> tasks
>>> >>>
>>> >>> In fact, you should not see above initZK error log.
>>> >>>
>>> >>> --
>>> >>> Best Regards, Edward J. Yoon
>>> >>>
>>> >>>
>>> >>> -----Original Message-----
>>> >>> From: Behroz Sikander [mailto:behroz89@gmail.com]
>>> >>> Sent: Monday, June 29, 2015 8:18 AM
>>> >>> To: user@hama.apache.org
>>> >>> Subject: Re: Groomserer BSPPeerChild limit
>>> >>>
>>> >>> I will try the things that you mentioned. I am not using the latest
>>> version
>>> >>> (0.7.0) because I do not understand YARN yet. It adds extra
>>> configurations
>>> >>> which makes it harder for me to understand when things go wrong.
>>> Any
>>> >>> suggestions ?
>>> >>>
>>> >>> Further, are there any tools that you use for debugging while in
>>> >>> distributed mode ? and is there any way to manage the server ? I mean
>>> right
>>> >>> now, I have 3 machines with a lot of configuration files and log
>>> files. It
>>> >>> takes a lot of time. This makes me wonder how people who have 100s of
>>> >>> machines debug and manage the cluster.
>>> >>>
>>> >>> Regards,
>>> >>> Behroz
>>> >>>
>>> >>> On Mon, Jun 29, 2015 at 12:53 AM, Edward J. Yoon <
>>> edward.yoon@samsung.com>
>>> >>> wrote:
>>> >>>
>>> >>> > Hi,
>>> >>> >
>>> >>> > It looks like a zookeeper connection problem. Please check whether
>>> >>> > zookeeper
>>> >>> > is running and every tasks can connect to zookeeper.
>>> >>> >
>>> >>> > I would recommend you to stop the firewall during debugging, and
>>> please
>>> >>> use
>>> >>> > the 0.7.0 latest release.
>>> >>> >
>>> >>> >
>>> >>> > --
>>> >>> > Best Regards, Edward J. Yoon
>>> >>> >
>>> >>> > -----Original Message-----
>>> >>> > From: Behroz Sikander [mailto:behroz89@gmail.com]
>>> >>> > Sent: Monday, June 29, 2015 7:34 AM
>>> >>> > To: user@hama.apache.org
>>> >>> > Subject: Re: Groomserer BSPPeerChild limit
>>> >>> >
>>> >>> > To figure out the issue, I was trying something else and found out
>>> >>> another
>>> >>> > wiered issue. Might be a bug of Hama but I am not sure. Both
>>> following
>>> >>> > lines give an exception.
>>> >>> >
>>> >>> > System.out.println( peer.getPeerName(0)); //Exception
>>> >>> >
>>> >>> > System.out.println( peer.getNumPeers()); //Exception
>>> >>> >
>>> >>> >
>>> >>> > [time] ERROR bsp.BSPTask: *Error running bsp setup and bsp
>>> function.*
>>> >>> >
>>> >>> > [time]java.lang.*RuntimeException: All peer names could not be
>>> >>> retrieved!*
>>> >>> >
>>> >>> > at
>>> >>> >
>>> >>> >
>>> >>>
>>> org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.getAllPeerNames(ZooKeeperSyncClientImpl.java:305)
>>> >>> >
>>> >>> > at
>>> org.apache.hama.bsp.BSPPeerImpl.initPeerNames(BSPPeerImpl.java:544)
>>> >>> >
>>> >>> > at org.apache.hama.bsp.BSPPeerImpl.getNumPeers(BSPPeerImpl.java:538)
>>> >>> >
>>> >>> > at testHDFS.EVADMMBsp.setup*(EVADMMBsp.java:58)*
>>> >>> >
>>> >>> > at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
>>> >>> >
>>> >>> > at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
>>> >>> >
>>> >>> > at
>>> >>>
>>> org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
>>> >>> >
>>> >>> > On Sun, Jun 28, 2015 at 6:45 PM, Behroz Sikander <
>>> behroz89@gmail.com>
>>> >>> > wrote:
>>> >>> >
>>> >>> > > I think I have more information on the issue. I did some
>>> debugging and
>>> >>> > > found something quite strange.
>>> >>> > >
>>> >>> > > If I open my job with 6 tasks ( 3 tasks will run on MACHINE1 and
>>> 3 task
>>> >>> > > will be opened on other MACHINE2),
>>> >>> > >
>>> >>> > >  -  3 tasks on Machine1 are frozen and the strange thing is that
>>> the
>>> >>> > > processes do not even enter the SETUP function of BSP class. I
>>> have
>>> >>> print
>>> >>> > > statements in the setup function of BSP class and it doesn't print
>>> >>> > > anything. I get empty files with zero size.
>>> >>> > >
>>> >>> > > drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:29 .
>>> >>> > > drwxrwxr-x 99 behroz behroz 4096 Jun 28 16:28 ..
>>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>>> >>> > > attempt_201506281624_0001_000000_0.err
>>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>>> >>> > > attempt_201506281624_0001_000000_0.log
>>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>>> >>> > > attempt_201506281624_0001_000001_0.err
>>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>>> >>> > > attempt_201506281624_0001_000001_0.log
>>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>>> >>> > > attempt_201506281624_0001_000002_0.err
>>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>>> >>> > > attempt_201506281624_0001_000002_0.log
>>> >>> > >
>>> >>> > > - On MACHINE2, the code enters the SETUP function of BSP class and
>>> >>> prints
>>> >>> > > stuff. See the size of files generated on output. How is it
>>> possible
>>> >>> that
>>> >>> > > in 3 tasks the code can enter BSP and in others it cannot ?
>>> >>> > >
>>> >>> > > drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:39 .
>>> >>> > > drwxrwxr-x 82 behroz behroz 4096 Jun 28 16:39 ..
>>> >>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
>>> >>> > > attempt_201506281639_0001_000003_0.err
>>> >>> > > -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
>>> >>> > > attempt_201506281639_0001_000003_0.log
>>> >>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
>>> >>> > > attempt_201506281639_0001_000004_0.err
>>> >>> > > -rw-rw-r--  1 behroz behroz 1368 Jun 28 16:39
>>> >>> > > attempt_201506281639_0001_000004_0.log
>>> >>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
>>> >>> > > attempt_201506281639_0001_000005_0.err
>>> >>> > > -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
>>> >>> > > attempt_201506281639_0001_000005_0.log
>>> >>> > >
>>> >>> > > - Hama Groom log file on MACHINE2 (which is frozen) shows.
>>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>>> >>> > > 'attempt_201506281639_0001_000001_0' has started.
>>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>>> >>> > > 'attempt_201506281639_0001_000002_0' has started.
>>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>>> >>> > > 'attempt_201506281639_0001_000000_0' has started.
>>> >>> > >
>>> >>> > > - Hama Groom log file on MACHINE2 shows
>>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>>> >>> > > 'attempt_201506281639_0001_000003_0' has started.
>>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>>> >>> > > 'attempt_201506281639_0001_000004_0' has started.
>>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>>> >>> > > 'attempt_201506281639_0001_000005_0' has started.
>>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>>> >>> > > attempt_201506281639_0001_000004_0 is *done*.
>>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>>> >>> > > attempt_201506281639_0001_000003_0 is *done*.
>>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>>> >>> > > attempt_201506281639_0001_000005_0 is *done*.
>>> >>> > >
>>> >>> > > Any clue what might be going wrong ?
>>> >>> > >
>>> >>> > > Regards,
>>> >>> > > Behroz
>>> >>> > >
>>> >>> > >
>>> >>> > >
>>> >>> > > On Sat, Jun 27, 2015 at 1:13 PM, Behroz Sikander <
>>> behroz89@gmail.com>
>>> >>> > > wrote:
>>> >>> > >
>>> >>> > >> Here is the log file from that folder
>>> >>> > >>
>>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: Starting Socket Reader #1 for
>>> port
>>> >>> > >> 61001
>>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server Responder: starting
>>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server listener on 61001:
>>> >>> > starting
>>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 0 on 61001:
>>> >>> > starting
>>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 1 on 61001:
>>> >>> > starting
>>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 2 on 61001:
>>> >>> > starting
>>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 3 on 61001:
>>> >>> > starting
>>> >>> > >> 15/06/27 11:10:34 INFO message.HamaMessageManagerImpl: BSPPeer
>>> >>> > >> address:b178b33b16cc port:61001
>>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 4 on 61001:
>>> >>> > starting
>>> >>> > >> 15/06/27 11:10:34 INFO sync.ZKSyncClient: Initializing ZK Sync
>>> Client
>>> >>> > >> 15/06/27 11:10:34 INFO sync.ZooKeeperSyncClientImpl: Start
>>> connecting
>>> >>> to
>>> >>> > >> Zookeeper! At b178b33b16cc/172.17.0.7:61001
>>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping server on 61001
>>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 0 on 61001:
>>> >>> > exiting
>>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server listener
>>> on
>>> >>> 61001
>>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 1 on 61001:
>>> >>> > exiting
>>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 2 on 61001:
>>> >>> > exiting
>>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server Responder
>>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 3 on 61001:
>>> >>> > exiting
>>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 4 on 61001:
>>> >>> > exiting
>>> >>> > >>
>>> >>> > >>
>>> >>> > >> And my console shows the following ouptut. Hama is frozen right
>>> now.
>>> >>> > >> 15/06/27 11:10:32 INFO bsp.BSPJobClient: Running job:
>>> >>> > >> job_201506262331_0003
>>> >>> > >> 15/06/27 11:10:35 INFO bsp.BSPJobClient: Current supersteps
>>> number: 0
>>> >>> > >> 15/06/27 11:10:38 INFO bsp.BSPJobClient: Current supersteps
>>> number: 2
>>> >>> > >>
>>> >>> > >> On Sat, Jun 27, 2015 at 1:07 PM, Edward J. Yoon <
>>> >>> edwardyoon@apache.org>
>>> >>> > >> wrote:
>>> >>> > >>
>>> >>> > >>> Please check the task logs in $HAMA_HOME/logs/tasklogs folder.
>>> >>> > >>>
>>> >>> > >>> On Sat, Jun 27, 2015 at 8:03 PM, Behroz Sikander <
>>> behroz89@gmail.com
>>> >>> >
>>> >>> > >>> wrote:
>>> >>> > >>> > Yea. I also thought that. I ran the program through eclipse
>>> with 20
>>> >>> > >>> tasks
>>> >>> > >>> > and it works fine.
>>> >>> > >>> >
>>> >>> > >>> > On Sat, Jun 27, 2015 at 1:00 PM, Edward J. Yoon <
>>> >>> > edwardyoon@apache.org
>>> >>> > >>> >
>>> >>> > >>> > wrote:
>>> >>> > >>> >
>>> >>> > >>> >> > When I run the PI example, it uses 9 tasks and runs fine.
>>> When I
>>> >>> > >>> run my
>>> >>> > >>> >> > program with 3 tasks, everything runs fine. But when I
>>> increase
>>> >>> > the
>>> >>> > >>> tasks
>>> >>> > >>> >> > (to 4) by using "setNumBspTask". Hama freezes. I do not
>>> >>> understand
>>> >>> > >>> what
>>> >>> > >>> >> can
>>> >>> > >>> >> > go wrong.
>>> >>> > >>> >>
>>> >>> > >>> >> It looks like a program bug. Have you ran your program in
>>> local
>>> >>> > mode?
>>> >>> > >>> >>
>>> >>> > >>> >> On Sat, Jun 27, 2015 at 8:03 AM, Behroz Sikander <
>>> >>> > behroz89@gmail.com>
>>> >>> > >>> >> wrote:
>>> >>> > >>> >> > Hi,
>>> >>> > >>> >> > In the current thread, I mentioned 3 issues. Issue 1 and 3
>>> are
>>> >>> > >>> resolved
>>> >>> > >>> >> but
>>> >>> > >>> >> > issue number 2 is still giving me headaches.
>>> >>> > >>> >> >
>>> >>> > >>> >> > My problem:
>>> >>> > >>> >> > My cluster now consists of 3 machines. Each one of them
>>> properly
>>> >>> > >>> >> configured
>>> >>> > >>> >> > (Apparently). From my master machine when I start Hadoop
>>> and
>>> >>> Hama,
>>> >>> > >>> I can
>>> >>> > >>> >> > see the processes started on other 2 machines. If I check
>>> the
>>> >>> > >>> maximum
>>> >>> > >>> >> tasks
>>> >>> > >>> >> > that my cluster can support then I get 9 (3 tasks on each
>>> >>> > machine).
>>> >>> > >>> >> >
>>> >>> > >>> >> > When I run the PI example, it uses 9 tasks and runs fine.
>>> When I
>>> >>> > >>> run my
>>> >>> > >>> >> > program with 3 tasks, everything runs fine. But when I
>>> increase
>>> >>> > the
>>> >>> > >>> tasks
>>> >>> > >>> >> > (to 4) by using "setNumBspTask". Hama freezes. I do not
>>> >>> understand
>>> >>> > >>> what
>>> >>> > >>> >> can
>>> >>> > >>> >> > go wrong.
>>> >>> > >>> >> >
>>> >>> > >>> >> > I checked the logs files and things look fine. I just
>>> sometimes
>>> >>> > get
>>> >>> > >>> an
>>> >>> > >>> >> > exception that hama was not able to delete the sytem
>>> directory
>>> >>> > >>> >> > (bsp.system.dir) defined in the hama-site.xml.
>>> >>> > >>> >> >
>>> >>> > >>> >> > Any help or clue would be great.
>>> >>> > >>> >> >
>>> >>> > >>> >> > Regards,
>>> >>> > >>> >> > Behroz Sikander
>>> >>> > >>> >> >
>>> >>> > >>> >> > On Thu, Jun 25, 2015 at 1:13 PM, Behroz Sikander <
>>> >>> > >>> behroz89@gmail.com>
>>> >>> > >>> >> wrote:
>>> >>> > >>> >> >
>>> >>> > >>> >> >> Thank you :)
>>> >>> > >>> >> >>
>>> >>> > >>> >> >> On Thu, Jun 25, 2015 at 12:14 AM, Edward J. Yoon <
>>> >>> > >>> edwardyoon@apache.org
>>> >>> > >>> >> >
>>> >>> > >>> >> >> wrote:
>>> >>> > >>> >> >>
>>> >>> > >>> >> >>> Hi,
>>> >>> > >>> >> >>>
>>> >>> > >>> >> >>> You can get the maximum number of available tasks like
>>> >>> following
>>> >>> > >>> code:
>>> >>> > >>> >> >>>
>>> >>> > >>> >> >>>     BSPJobClient jobClient = new BSPJobClient(conf);
>>> >>> > >>> >> >>>     ClusterStatus cluster =
>>> jobClient.getClusterStatus(true);
>>> >>> > >>> >> >>>
>>> >>> > >>> >> >>>     // Set to maximum
>>> >>> > >>> >> >>>     bsp.setNumBspTask(cluster.getMaxTasks());
>>> >>> > >>> >> >>>
>>> >>> > >>> >> >>>
>>> >>> > >>> >> >>> On Wed, Jun 24, 2015 at 11:20 PM, Behroz Sikander <
>>> >>> > >>> behroz89@gmail.com>
>>> >>> > >>> >> >>> wrote:
>>> >>> > >>> >> >>> > Hi,
>>> >>> > >>> >> >>> > 1) Thank you for this.
>>> >>> > >>> >> >>> > 2) Here are the images. I will look into the log files
>>> of PI
>>> >>> > >>> example
>>> >>> > >>> >> >>> >
>>> >>> > >>> >> >>> > *Result of JPS command on slave*
>>> >>> > >>> >> >>> >
>>> >>> > >>> >> >>>
>>> >>> > >>> >>
>>> >>> > >>>
>>> >>> >
>>> >>>
>>> http://s17.postimg.org/gpwe2bbfj/Screen_Shot_2015_06_22_at_7_23_31_PM.png
>>> >>> > >>> >> >>> >
>>> >>> > >>> >> >>> > *Result of JPS command on Master*
>>> >>> > >>> >> >>> >
>>> >>> > >>> >> >>>
>>> >>> > >>> >>
>>> >>> > >>>
>>> >>> >
>>> >>>
>>> http://s14.postimg.org/s9922em5p/Screen_Shot_2015_06_22_at_7_23_42_PM.png
>>> >>> > >>> >> >>> >
>>> >>> > >>> >> >>> > 3) In my current case, I do not have any input
>>> submitted to
>>> >>> > the
>>> >>> > >>> job.
>>> >>> > >>> >> >>> During
>>> >>> > >>> >> >>> > run time, I directly fetch data from HDFS. So, I am
>>> looking
>>> >>> > for
>>> >>> > >>> >> >>> something
>>> >>> > >>> >> >>> > like BSPJob.set*Max*NumBspTask().
>>> >>> > >>> >> >>> >
>>> >>> > >>> >> >>> > Regards,
>>> >>> > >>> >> >>> > Behroz
>>> >>> > >>> >> >>> >
>>> >>> > >>> >> >>> >
>>> >>> > >>> >> >>> >
>>> >>> > >>> >> >>> > On Tue, Jun 23, 2015 at 12:57 AM, Edward J. Yoon <
>>> >>> > >>> >> edwardyoon@apache.org
>>> >>> > >>> >> >>> >
>>> >>> > >>> >> >>> > wrote:
>>> >>> > >>> >> >>> >
>>> >>> > >>> >> >>> >> Hello,
>>> >>> > >>> >> >>> >>
>>> >>> > >>> >> >>> >> 1) You can get the filesystem URI from a configuration
>>> >>> using
>>> >>> > >>> >> >>> >> "FileSystem fs = FileSystem.get(conf);". Of course,
>>> the
>>> >>> > >>> fs.defaultFS
>>> >>> > >>> >> >>> >> property should be in hama-site.xml
>>> >>> > >>> >> >>> >>
>>> >>> > >>> >> >>> >>   <property>
>>> >>> > >>> >> >>> >>     <name>fs.defaultFS</name>
>>> >>> > >>> >> >>> >>     <value>hdfs://host1.mydomain.com:9000/</value>
>>> >>> > >>> >> >>> >>     <description>
>>> >>> > >>> >> >>> >>       The name of the default file system. Either the
>>> >>> literal
>>> >>> > >>> string
>>> >>> > >>> >> >>> >>       "local" or a host:port for HDFS.
>>> >>> > >>> >> >>> >>     </description>
>>> >>> > >>> >> >>> >>   </property>
>>> >>> > >>> >> >>> >>
>>> >>> > >>> >> >>> >> 2) The 'bsp.tasks.maximum' is the number of tasks per
>>> node.
>>> >>> > It
>>> >>> > >>> looks
>>> >>> > >>> >> >>> >> cluster configuration issue. Please run Pi example
>>> and look
>>> >>> > at
>>> >>> > >>> the
>>> >>> > >>> >> >>> >> logs for more details. NOTE: you can not attach the
>>> images
>>> >>> to
>>> >>> > >>> >> mailing
>>> >>> > >>> >> >>> >> list so I can't see it.
>>> >>> > >>> >> >>> >>
>>> >>> > >>> >> >>> >> 3) You can use the BSPJob.setNumBspTask(int) method.
>>> If
>>> >>> input
>>> >>> > >>> is
>>> >>> > >>> >> >>> >> provided, the number of BSP tasks is basically driven
>>> by
>>> >>> the
>>> >>> > >>> number
>>> >>> > >>> >> of
>>> >>> > >>> >> >>> >> DFS blocks. I'll fix it to be more flexible on
>>> HAMA-956.
>>> >>> > >>> >> >>> >>
>>> >>> > >>> >> >>> >> Thanks!
>>> >>> > >>> >> >>> >>
>>> >>> > >>> >> >>> >>
>>> >>> > >>> >> >>> >> On Tue, Jun 23, 2015 at 2:33 AM, Behroz Sikander <
>>> >>> > >>> >> behroz89@gmail.com>
>>> >>> > >>> >> >>> >> wrote:
>>> >>> > >>> >> >>> >> > Hi,
>>> >>> > >>> >> >>> >> > Recently, I moved from a single machine setup to a 2
>>> >>> > machine
>>> >>> > >>> >> setup.
>>> >>> > >>> >> >>> I was
>>> >>> > >>> >> >>> >> > successfully able to run my job that uses the HDFS
>>> to get
>>> >>> > >>> data. I
>>> >>> > >>> >> >>> have 3
>>> >>> > >>> >> >>> >> > trivial questions
>>> >>> > >>> >> >>> >> >
>>> >>> > >>> >> >>> >> > 1- To access HDFS, I have to manually give the IP
>>> address
>>> >>> > of
>>> >>> > >>> >> server
>>> >>> > >>> >> >>> >> running
>>> >>> > >>> >> >>> >> > HDFS. I thought that Hama will automatically pick
>>> from
>>> >>> the
>>> >>> > >>> >> >>> configurations
>>> >>> > >>> >> >>> >> > but it does not. I am probably doing something
>>> wrong.
>>> >>> Right
>>> >>> > >>> now my
>>> >>> > >>> >> >>> code
>>> >>> > >>> >> >>> >> work
>>> >>> > >>> >> >>> >> > by using the following.
>>> >>> > >>> >> >>> >> >
>>> >>> > >>> >> >>> >> > FileSystem fs = FileSystem.get(new
>>> >>> > >>> URI("hdfs://server_ip:port/"),
>>> >>> > >>> >> >>> conf);
>>> >>> > >>> >> >>> >> >
>>> >>> > >>> >> >>> >> > 2- On my master server, when I start hama it
>>> >>> automatically
>>> >>> > >>> starts
>>> >>> > >>> >> >>> hama in
>>> >>> > >>> >> >>> >> > the slave machine (all good). Both master and slave
>>> are
>>> >>> set
>>> >>> > >>> as
>>> >>> > >>> >> >>> >> groomservers.
>>> >>> > >>> >> >>> >> > This means that I have 2 servers to run my job which
>>> >>> means
>>> >>> > >>> that I
>>> >>> > >>> >> can
>>> >>> > >>> >> >>> >> open
>>> >>> > >>> >> >>> >> > more BSPPeerChild processes. And if I submit my jar
>>> with
>>> >>> 3
>>> >>> > >>> bsp
>>> >>> > >>> >> tasks
>>> >>> > >>> >> >>> then
>>> >>> > >>> >> >>> >> > everything works fine. But when I move to 4 tasks,
>>> Hama
>>> >>> > >>> freezes.
>>> >>> > >>> >> >>> Here is
>>> >>> > >>> >> >>> >> the
>>> >>> > >>> >> >>> >> > result of JPS command on slave.
>>> >>> > >>> >> >>> >> >
>>> >>> > >>> >> >>> >> >
>>> >>> > >>> >> >>> >> > Result of JPS command on Master
>>> >>> > >>> >> >>> >> >
>>> >>> > >>> >> >>> >> >
>>> >>> > >>> >> >>> >> >
>>> >>> > >>> >> >>> >> > You can see that it is only opening tasks on slaves
>>> but
>>> >>> not
>>> >>> > >>> on
>>> >>> > >>> >> >>> master.
>>> >>> > >>> >> >>> >> >
>>> >>> > >>> >> >>> >> > Note: I tried to change the bsp.tasks.maximum
>>> property in
>>> >>> > >>> >> >>> >> hama-default.xml
>>> >>> > >>> >> >>> >> > to 4 but still same result.
>>> >>> > >>> >> >>> >> >
>>> >>> > >>> >> >>> >> > 3- I want my cluster to open as many BSPPeerChild
>>> >>> processes
>>> >>> > >>> as
>>> >>> > >>> >> >>> possible.
>>> >>> > >>> >> >>> >> Is
>>> >>> > >>> >> >>> >> > there any setting that can I do to achieve that ?
>>> Or hama
>>> >>> > >>> picks up
>>> >>> > >>> >> >>> the
>>> >>> > >>> >> >>> >> > values from hama-default.xml to open tasks ?
>>> >>> > >>> >> >>> >> >
>>> >>> > >>> >> >>> >> >
>>> >>> > >>> >> >>> >> > Regards,
>>> >>> > >>> >> >>> >> >
>>> >>> > >>> >> >>> >> > Behroz Sikander
>>> >>> > >>> >> >>> >>
>>> >>> > >>> >> >>> >>
>>> >>> > >>> >> >>> >>
>>> >>> > >>> >> >>> >> --
>>> >>> > >>> >> >>> >> Best Regards, Edward J. Yoon
>>> >>> > >>> >> >>> >>
>>> >>> > >>> >> >>>
>>> >>> > >>> >> >>>
>>> >>> > >>> >> >>>
>>> >>> > >>> >> >>> --
>>> >>> > >>> >> >>> Best Regards, Edward J. Yoon
>>> >>> > >>> >> >>>
>>> >>> > >>> >> >>
>>> >>> > >>> >> >>
>>> >>> > >>> >>
>>> >>> > >>> >>
>>> >>> > >>> >>
>>> >>> > >>> >> --
>>> >>> > >>> >> Best Regards, Edward J. Yoon
>>> >>> > >>> >>
>>> >>> > >>>
>>> >>> > >>>
>>> >>> > >>>
>>> >>> > >>> --
>>> >>> > >>> Best Regards, Edward J. Yoon
>>> >>> > >>>
>>> >>> > >>
>>> >>> > >>
>>> >>> > >
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>>
>>> >>>
>>> >>>
>>> >
>>> >
>>> >
>>> > --
>>> > Best Regards, Edward J. Yoon
>>>
>>>
>>>
>>> --
>>> Best Regards, Edward J. Yoon
>>>
>>
>>



-- 
Best Regards, Edward J. Yoon
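
[Archive note] Pulling the fragments together from the quoted replies, a minimal hama-site.xml for a small cluster looks roughly like the sketch below. Host names and ports are placeholders, not values from this cluster, and the quoted messages use both fs.default.name and fs.defaultFS for the default filesystem depending on the Hadoop version in use:

```xml
<!-- Sketch only: assembled from the configuration fragments quoted in this
     thread. "master-host" is a placeholder for a hostname that every node
     can resolve. -->
<configuration>
  <property>
    <name>bsp.master.address</name>
    <value>master-host:40000</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master-host:9000/</value>
  </property>
  <property>
    <name>hama.zookeeper.quorum</name>
    <value>master-host</value>
  </property>
  <property>
    <name>hama.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>
```

The same file must be present on every groom node, and the ZooKeeper quorum host must be reachable from all of them, or the grooms fail to register as seen later in this thread.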

Re: Groomserer BSPPeerChild limit

Posted by Behroz Sikander <be...@gmail.com>.
Server 2 was showing the exception that I posted in the previous email.
Server 1 is showing the following exception:

15/06/29 03:27:42 INFO ipc.Server: IPC Server handler 0 on 40000: starting
15/06/29 03:28:53 INFO bsp.BSPMaster: groomd_b178b33b16cc_50000 is added.
15/06/29 03:29:20 ERROR bsp.BSPMaster: Fail to register GroomServer
groomd_8d4b512cf448_50000
java.net.UnknownHostException: unknown host: 8d4b512cf448
at org.apache.hama.ipc.Client$Connection.<init>(Client.java:225)
at org.apache.hama.ipc.Client.getConnection(Client.java:1039)
at org.apache.hama.ipc.Client.call(Client.java:888)
at org.apache.hama.ipc.RPC$Invoker.invoke(RPC.java:239)
at com.sun.proxy.$Proxy11.getProtocolVersion(Unknown Source)

I am looking into this issue.
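The "unknown host" failure above means the BSPMaster cannot resolve the hostname embedded in the registering groom's name (here a Docker-style container ID). A quick way to reproduce that check outside Hama is a small stand-alone diagnostic; the groomd_<host>_<port> name format is assumed from the log line, and this class is hypothetical, not part of Hama:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Diagnostic sketch (not part of Hama): pull the host portion out of a
// groom server name such as "groomd_8d4b512cf448_50000" and check whether
// this machine can resolve it via DNS or /etc/hosts.
public class GroomHostCheck {

  // "groomd_<hostname>_<port>" -> "<hostname>" (format assumed from the log)
  static String hostOf(String groomName) {
    String[] parts = groomName.split("_");
    return parts[1];
  }

  // true if the name resolves to an address on this machine
  static boolean resolves(String host) {
    try {
      InetAddress.getByName(host);
      return true;
    } catch (UnknownHostException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    String groom = args.length > 0 ? args[0] : "groomd_8d4b512cf448_50000";
    String host = hostOf(groom);
    System.out.println(host + (resolves(host) ? " resolves" : " does not resolve"));
  }
}
```

If the host does not resolve on the master, the usual fix is adding an /etc/hosts entry for the container ID on the master (or giving the groom node a hostname that the master can resolve).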

On Mon, Jun 29, 2015 at 5:31 AM, Behroz Sikander <be...@gmail.com> wrote:

> Ok great. I was able to run the zk, groom and bspmaster on server 1. But
> when I ran the groom on server2 I got the following exception
>
> 15/06/29 03:29:20 ERROR bsp.GroomServer: There is a problem in
> establishing communication link with BSPMaster
> 15/06/29 03:29:20 ERROR bsp.GroomServer: Got fatal exception while
> reinitializing GroomServer: java.io.IOException: There is a problem in
> establishing communication link with BSPMaster.
> at org.apache.hama.bsp.GroomServer.initialize(GroomServer.java:426)
> at org.apache.hama.bsp.GroomServer.run(GroomServer.java:860)
> at java.lang.Thread.run(Thread.java:745)
>
> On Mon, Jun 29, 2015 at 5:21 AM, Edward J. Yoon <ed...@apache.org>
> wrote:
>
>> Here's my configurations:
>>
>> hama-site.xml:
>>
>>   <property>
>>     <name>bsp.master.address</name>
>>     <value>cluster-0:40000</value>
>>   </property>
>>
>>   <property>
>>     <name>fs.default.name</name>
>>     <value>hdfs://cluster-0:9000/</value>
>>   </property>
>>
>>   <property>
>>     <name>hama.zookeeper.quorum</name>
>>     <value>cluster-0</value>
>>   </property>
>>
>>
>> % bin/hama zookeeper
>> 15/06/29 12:17:17 ERROR quorum.QuorumPeerConfig: Invalid
>> configuration, only one server specified (ignoring)
>>
>> Then, open new terminal and run master with following command:
>>
>> % bin/hama bspmaster
>> ...
>> 15/06/29 12:17:40 INFO sync.ZKSyncBSPMasterClient: Initialized ZK false
>> 15/06/29 12:17:40 INFO sync.ZKSyncClient: Initializing ZK Sync Client
>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server Responder: starting
>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server listener on 40000: starting
>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server handler 0 on 40000: starting
>> 15/06/29 12:17:40 INFO bsp.BSPMaster: Starting RUNNING
>>
>>
>>
>> On Mon, Jun 29, 2015 at 12:17 PM, Edward J. Yoon <ed...@apache.org>
>> wrote:
>> > Hi,
>> >
>> > If you run zk server too, BSPmaster will be connected to zk and won't
>> > throw exceptions.
>> >
>> > On Mon, Jun 29, 2015 at 12:13 PM, Behroz Sikander <be...@gmail.com>
>> wrote:
>> >> Hi,
>> >> Thank you for the information. I moved to hama 0.7.0 and I still have the
>> same
>> >> problem.
>> >> When I run % bin/hama bspmaster, I am getting the following exception
>> >>
>> >> INFO http.HttpServer: Port returned by
>> >> webServer.getConnectors()[0].getLocalPort() before open() is -1.
>> Opening
>> >> the listener on 40013
>> >>  INFO http.HttpServer: listener.getLocalPort() returned 40013
>> >> webServer.getConnectors()[0].getLocalPort() returned 40013
>> >>  INFO http.HttpServer: Jetty bound to port 40013
>> >>  INFO mortbay.log: jetty-6.1.14
>> >>  INFO mortbay.log: Extract
>> >>
>> jar:file:/home/behroz/Documents/Packages/hama-0.7.0/hama-core-0.7.0.jar!/webapp/bspmaster/
>> >> to /tmp/Jetty_b178b33b16cc_40013_bspmaster____.cof30w/webapp
>> >>  INFO mortbay.log: Started SelectChannelConnector@b178b33b16cc:40013
>> >>  INFO bsp.BSPMaster: Cleaning up the system directory
>> >>  INFO bsp.BSPMaster: hdfs://
>> 172.17.0.3:54310/tmp/hama-behroz/bsp/system
>> >>  INFO sync.ZKSyncBSPMasterClient: Initialized ZK false
>> >>  INFO sync.ZKSyncClient: Initializing ZK Sync Client
>> >>  ERROR sync.ZKSyncBSPMasterClient:
>> >> org.apache.zookeeper.KeeperException$ConnectionLossException:
>> >> KeeperErrorCode = ConnectionLoss for /bsp
>> >> at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>> >> at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>> >> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
>> >> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
>> >> at
>> >>
>> org.apache.hama.bsp.sync.ZKSyncBSPMasterClient.init(ZKSyncBSPMasterClient.java:62)
>> >> at org.apache.hama.bsp.BSPMaster.initZK(BSPMaster.java:534)
>> >> at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:517)
>> >> at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:500)
>> >> at org.apache.hama.BSPMasterRunner.run(BSPMasterRunner.java:46)
>> >> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>> >> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>> >> at org.apache.hama.BSPMasterRunner.main(BSPMasterRunner.java:56)
>> >>  ERROR sync.ZKSyncBSPMasterClient:
>> >> org.apache.zookeeper.KeeperException$ConnectionLossException:
>> >> KeeperErrorCode = ConnectionLoss for /bsp
>> >>
>> >> *Why zookeeper settings in hama-site.xml are (right now, I am using
>> just
>> >> two servers 172.17.0.3 and 172.17.0.7)*
>> >> <property>
>> >>                  <name>hama.zookeeper.quorum</name>
>> >>                  <value>172.17.0.3,172.17.0.7</value>
>> >>                  <description>Comma separated list of servers in the
>> >> ZooKeeper quorum.
>> >>                  For example, "host1.mydomain.com,host2.mydomain.com,
>> >> host3.mydomain.com".
>> >>                  By default this is set to localhost for local and
>> >> pseudo-distributed modes
>> >>                  of operation. For a fully-distributed setup, this
>> should
>> >> be set to a full
>> >>                  list of ZooKeeper quorum servers. If HAMA_MANAGES_ZK
>> is
>> >> set in hama-env.sh
>> >>                  this is the list of servers which we will start/stop
>> >> ZooKeeper on.
>> >>                  </description>
>> >>         </property>
>> >>        ......
>> >>        <property>
>> >>                  <name>hama.zookeeper.property.clientPort</name>
>> >>                  <value>2181</value>
>> >>          </property>
>> >>
>> >> Is something wrong with my settings ?
>> >>
>> >> Regards,
>> >> Behroz Sikander
>> >>
>> >> On Mon, Jun 29, 2015 at 1:44 AM, Edward J. Yoon <
>> edward.yoon@samsung.com>
>> >> wrote:
>> >>
>> >>> > (0.7.0) because I do not understand YARN yet. It adds extra
>> >>> configurations
>> >>>
>> >>> Hama classic mode works on both Hadoop 1.x and Hadoop 2.x HDFS. Yarn
>> >>> configuration is only needed when you want to submit a BSP job to Yarn
>> >>> cluster
>> >>> without Hama cluster. So you don't need to worry about it. :-)
>> >>>
>> >>> > distributed mode ? and is there any way to manage the server ? I
>> mean
>> >>> right
>> >>> > now, I have 3 machines with alot of configurations files and log
>> files.
>> >>> It
>> >>>
>> >>> You can use web UI at http://masterserver_address:40013/bspmaster.jsp
>> >>>
>> >>> To debug your program, please try like below:
>> >>>
>> >>> 1) Run a BSPMaster and Zookeeper at server1.
>> >>> % bin/hama bspmaster
>> >>> % bin/hama zookeeper
>> >>>
>> >>> 2) Run a Groom at server1 and server2.
>> >>>
>> >>> % bin/hama groom
>> >>>
>> >>> 3) Check whether deamons are running well. Then, run your program
>> using jar
>> >>> command at server1.
>> >>>
>> >>> % bin/hama jar .....
>> >>>
>> >>> > In hama_[user]_bspmaster_.....log file I get the following
>> exception. But
>> >>> > this occurs in both cases when I run my job with 3 tasks or with 4
>> tasks
>> >>>
>> >>> In fact, you should not see above initZK error log.
>> >>>
>> >>> --
>> >>> Best Regards, Edward J. Yoon
>> >>>
>> >>>
>> >>> -----Original Message-----
>> >>> From: Behroz Sikander [mailto:behroz89@gmail.com]
>> >>> Sent: Monday, June 29, 2015 8:18 AM
>> >>> To: user@hama.apache.org
>> >>> Subject: Re: Groomserer BSPPeerChild limit
>> >>>
>> >>> I will try the things that you mentioned. I am not using the latest
>> version
>> >>> (0.7.0) because I do not understand YARN yet. It adds extra
>> configurations
>> >>> which makes it more harder for me to understand when things go wrong.
>> Any
>> >>> suggestions ?
>> >>>
>> >>> Further, are there any tools that you use for debugging while in
>> >>> distributed mode ? and is there any way to manage the server ? I mean
>> right
>> >>> now, I have 3 machines with alot of configurations files and log
>> files. It
>> >>> takes alot of time. This makes me wonder how people who have 100s of
>> >>> machines debug and manage the cluster.
>> >>>
>> >>> Regards,
>> >>> Behroz
>> >>>
>> >>> On Mon, Jun 29, 2015 at 12:53 AM, Edward J. Yoon <
>> edward.yoon@samsung.com>
>> >>> wrote:
>> >>>
>> >>> > Hi,
>> >>> >
>> >>> > It looks like a zookeeper connection problem. Please check whether
>> >>> > zookeeper
>> >>> > is running and every tasks can connect to zookeeper.
>> >>> >
>> >>> > I would recommend you to stop the firewall during debugging, and
>> please
>> >>> use
>> >>> > the 0.7.0 latest release.
>> >>> >
>> >>> >
>> >>> > --
>> >>> > Best Regards, Edward J. Yoon
>> >>> >
>> >>> > -----Original Message-----
>> >>> > From: Behroz Sikander [mailto:behroz89@gmail.com]
>> >>> > Sent: Monday, June 29, 2015 7:34 AM
>> >>> > To: user@hama.apache.org
>> >>> > Subject: Re: Groomserer BSPPeerChild limit
>> >>> >
>> >>> > To figure out the issue, I was trying something else and found out
>> >>> another
>> >>> > wiered issue. Might be a bug of Hama but I am not sure. Both
>> following
>> >>> > lines give an exception.
>> >>> >
>> >>> > System.out.println( peer.getPeerName(0)); //Exception
>> >>> >
>> >>> > System.out.println( peer.getNumPeers()); //Exception
>> >>> >
>> >>> >
>> >>> > [time] ERROR bsp.BSPTask: *Error running bsp setup and bsp
>> function.*
>> >>> >
>> >>> > [time]java.lang.*RuntimeException: All peer names could not be
>> >>> retrieved!*
>> >>> >
>> >>> > at
>> >>> >
>> >>> >
>> >>>
>> org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.getAllPeerNames(ZooKeeperSyncClientImpl.java:305)
>> >>> >
>> >>> > at
>> org.apache.hama.bsp.BSPPeerImpl.initPeerNames(BSPPeerImpl.java:544)
>> >>> >
>> >>> > at org.apache.hama.bsp.BSPPeerImpl.getNumPeers(BSPPeerImpl.java:538)
>> >>> >
>> >>> > at testHDFS.EVADMMBsp.setup*(EVADMMBsp.java:58)*
>> >>> >
>> >>> > at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
>> >>> >
>> >>> > at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
>> >>> >
>> >>> > at
>> >>>
>> org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
>> >>> >
>> >>> > On Sun, Jun 28, 2015 at 6:45 PM, Behroz Sikander <
>> behroz89@gmail.com>
>> >>> > wrote:
>> >>> >
>> >>> > > I think I have more information on the issue. I did some
>> debugging and
>> >>> > > found something quite strange.
>> >>> > >
>> >>> > > If I open my job with 6 tasks ( 3 tasks will run on MACHINE1 and
>> 3 task
>> >>> > > will be opened on other MACHINE2),
>> >>> > >
>> >>> > >  -  3 tasks on Machine1 are frozen and the strange thing is that
>> the
>> >>> > > processes do not even enter the SETUP function of BSP class. I
>> have
>> >>> print
>> >>> > > statements in the setup function of BSP class and it doesn't print
>> >>> > > anything. I get empty files with zero size.
>> >>> > >
>> >>> > > drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:29 .
>> >>> > > drwxrwxr-x 99 behroz behroz 4096 Jun 28 16:28 ..
>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>> >>> > > attempt_201506281624_0001_000000_0.err
>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>> >>> > > attempt_201506281624_0001_000000_0.log
>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>> >>> > > attempt_201506281624_0001_000001_0.err
>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>> >>> > > attempt_201506281624_0001_000001_0.log
>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>> >>> > > attempt_201506281624_0001_000002_0.err
>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>> >>> > > attempt_201506281624_0001_000002_0.log
>> >>> > >
>> >>> > > - On MACHINE2, the code enters the SETUP function of BSP class and
>> >>> prints
>> >>> > > stuff. See the size of files generated on output. How is it
>> possible
>> >>> that
>> >>> > > in 3 tasks the code can enter BSP and in others it cannot ?
>> >>> > >
>> >>> > > drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:39 .
>> >>> > > drwxrwxr-x 82 behroz behroz 4096 Jun 28 16:39 ..
>> >>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
>> >>> > > attempt_201506281639_0001_000003_0.err
>> >>> > > -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
>> >>> > > attempt_201506281639_0001_000003_0.log
>> >>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
>> >>> > > attempt_201506281639_0001_000004_0.err
>> >>> > > -rw-rw-r--  1 behroz behroz 1368 Jun 28 16:39
>> >>> > > attempt_201506281639_0001_000004_0.log
>> >>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
>> >>> > > attempt_201506281639_0001_000005_0.err
>> >>> > > -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
>> >>> > > attempt_201506281639_0001_000005_0.log
>> >>> > >
>> >>> > > - Hama Groom log file on MACHINE2 (which is frozen) shows.
>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>> >>> > > 'attempt_201506281639_0001_000001_0' has started.
>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>> >>> > > 'attempt_201506281639_0001_000002_0' has started.
>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>> >>> > > 'attempt_201506281639_0001_000000_0' has started.
>> >>> > >
>> >>> > > - Hama Groom log file on MACHINE2 shows
>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>> >>> > > 'attempt_201506281639_0001_000003_0' has started.
>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>> >>> > > 'attempt_201506281639_0001_000004_0' has started.
>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>> >>> > > 'attempt_201506281639_0001_000005_0' has started.
>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>> >>> > > attempt_201506281639_0001_000004_0 is *done*.
>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>> >>> > > attempt_201506281639_0001_000003_0 is *done*.
>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>> >>> > > attempt_201506281639_0001_000005_0 is *done*.
>> >>> > >
>> >>> > > Any clue what might be going wrong ?
>> >>> > >
>> >>> > > Regards,
>> >>> > > Behroz
>> >>> > >
>> >>> > >
>> >>> > >
>> >>> > > On Sat, Jun 27, 2015 at 1:13 PM, Behroz Sikander <
>> behroz89@gmail.com>
>> >>> > > wrote:
>> >>> > >
>> >>> > >> Here is the log file from that folder
>> >>> > >>
>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: Starting Socket Reader #1 for
>> port
>> >>> > >> 61001
>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server Responder: starting
>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server listener on 61001:
>> >>> > starting
>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 0 on 61001:
>> >>> > starting
>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 1 on 61001:
>> >>> > starting
>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 2 on 61001:
>> >>> > starting
>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 3 on 61001:
>> >>> > starting
>> >>> > >> 15/06/27 11:10:34 INFO message.HamaMessageManagerImpl: BSPPeer
>> >>> > >> address:b178b33b16cc port:61001
>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 4 on 61001:
>> >>> > starting
>> >>> > >> 15/06/27 11:10:34 INFO sync.ZKSyncClient: Initializing ZK Sync
>> Client
>> >>> > >> 15/06/27 11:10:34 INFO sync.ZooKeeperSyncClientImpl: Start
>> connecting
>> >>> to
>> >>> > >> Zookeeper! At b178b33b16cc/172.17.0.7:61001
>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping server on 61001
>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 0 on 61001:
>> >>> > exiting
>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server listener
>> on
>> >>> 61001
>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 1 on 61001:
>> >>> > exiting
>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 2 on 61001:
>> >>> > exiting
>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server Responder
>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 3 on 61001:
>> >>> > exiting
>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 4 on 61001:
>> >>> > exiting
>> >>> > >>
>> >>> > >>
>> >>> > >> And my console shows the following ouptut. Hama is frozen right
>> now.
>> >>> > >> 15/06/27 11:10:32 INFO bsp.BSPJobClient: Running job:
>> >>> > >> job_201506262331_0003
>> >>> > >> 15/06/27 11:10:35 INFO bsp.BSPJobClient: Current supersteps
>> number: 0
>> >>> > >> 15/06/27 11:10:38 INFO bsp.BSPJobClient: Current supersteps
>> number: 2
>> >>> > >>
>> >>> > >> On Sat, Jun 27, 2015 at 1:07 PM, Edward J. Yoon <
>> >>> edwardyoon@apache.org>
>> >>> > >> wrote:
>> >>> > >>
>> >>> > >>> Please check the task logs in $HAMA_HOME/logs/tasklogs folder.
>> >>> > >>>
>> >>> > >>> On Sat, Jun 27, 2015 at 8:03 PM, Behroz Sikander <
>> behroz89@gmail.com
>> >>> >
>> >>> > >>> wrote:
>> >>> > >>> > Yea. I also thought that. I ran the program through eclipse
>> with 20
>> >>> > >>> tasks
>> >>> > >>> > and it works fine.
>> >>> > >>> >
>> >>> > >>> > On Sat, Jun 27, 2015 at 1:00 PM, Edward J. Yoon <
>> >>> > edwardyoon@apache.org
>> >>> > >>> >
>> >>> > >>> > wrote:
>> >>> > >>> >
>> >>> > >>> >> > When I run the PI example, it uses 9 tasks and runs fine.
>> When I
>> >>> > >>> run my
>> >>> > >>> >> > program with 3 tasks, everything runs fine. But when I
>> increase
>> >>> > the
>> >>> > >>> tasks
>> >>> > >>> >> > (to 4) by using "setNumBspTask". Hama freezes. I do not
>> >>> understand
>> >>> > >>> what
>> >>> > >>> >> can
>> >>> > >>> >> > go wrong.
>> >>> > >>> >>
>> >>> > >>> >> It looks like a program bug. Have you ran your program in
>> local
>> >>> > mode?
>> >>> > >>> >>
>> >>> > >>> >> On Sat, Jun 27, 2015 at 8:03 AM, Behroz Sikander <
>> >>> > behroz89@gmail.com>
>> >>> > >>> >> wrote:
>> >>> > >>> >> > Hi,
>> >>> > >>> >> > In the current thread, I mentioned 3 issues. Issue 1 and 3
>> are
>> >>> > >>> resolved
>> >>> > >>> >> but
>> >>> > >>> >> > issue number 2 is still giving me headaches.
>> >>> > >>> >> >
>> >>> > >>> >> > My problem:
>> >>> > >>> >> > My cluster now consists of 3 machines. Each one of them
>> properly
>> >>> > >>> >> configured
>> >>> > >>> >> > (Apparently). From my master machine when I start Hadoop
>> and
>> >>> Hama,
>> >>> > >>> I can
>> >>> > >>> >> > see the processes started on other 2 machines. If I check
>> the
>> >>> > >>> maximum
>> >>> > >>> >> tasks
>> >>> > >>> >> > that my cluster can support then I get 9 (3 tasks on each
>> >>> > machine).
>> >>> > >>> >> >
>> >>> > >>> >> > When I run the PI example, it uses 9 tasks and runs fine.
>> When I
>> >>> > >>> run my
>> >>> > >>> >> > program with 3 tasks, everything runs fine. But when I
>> increase
>> >>> > the
>> >>> > >>> tasks
>> >>> > >>> >> > (to 4) by using "setNumBspTask". Hama freezes. I do not
>> >>> understand
>> >>> > >>> what
>> >>> > >>> >> can
>> >>> > >>> >> > go wrong.
>> >>> > >>> >> >
>> >>> > >>> >> > I checked the logs files and things look fine. I just
>> sometimes
>> >>> > get
>> >>> > >>> an
>> >>> > >>> >> > exception that hama was not able to delete the sytem
>> directory
>> >>> > >>> >> > (bsp.system.dir) defined in the hama-site.xml.
>> >>> > >>> >> >
>> >>> > >>> >> > Any help or clue would be great.
>> >>> > >>> >> >
>> >>> > >>> >> > Regards,
>> >>> > >>> >> > Behroz Sikander
>> >>> > >>> >> >
>> >>> > >>> >> > On Thu, Jun 25, 2015 at 1:13 PM, Behroz Sikander <
>> >>> > >>> behroz89@gmail.com>
>> >>> > >>> >> wrote:
>> >>> > >>> >> >
>> >>> > >>> >> >> Thank you :)
>> >>> > >>> >> >>
>> >>> > >>> >> >> On Thu, Jun 25, 2015 at 12:14 AM, Edward J. Yoon <
>> >>> > >>> edwardyoon@apache.org
>> >>> > >>> >> >
>> >>> > >>> >> >> wrote:
>> >>> > >>> >> >>
>> >>> > >>> >> >>> Hi,
>> >>> > >>> >> >>>
>> >>> > >>> >> >>> You can get the maximum number of available tasks like
>> >>> following
>> >>> > >>> code:
>> >>> > >>> >> >>>
>> >>> > >>> >> >>>     BSPJobClient jobClient = new BSPJobClient(conf);
>> >>> > >>> >> >>>     ClusterStatus cluster =
>> jobClient.getClusterStatus(true);
>> >>> > >>> >> >>>
>> >>> > >>> >> >>>     // Set to maximum
>> >>> > >>> >> >>>     bsp.setNumBspTask(cluster.getMaxTasks());
>> >>> > >>> >> >>>
>> >>> > >>> >> >>>
>> >>> > >>> >> >>> On Wed, Jun 24, 2015 at 11:20 PM, Behroz Sikander <
>> >>> > >>> behroz89@gmail.com>
>> >>> > >>> >> >>> wrote:
>> >>> > >>> >> >>> > Hi,
>> >>> > >>> >> >>> > 1) Thank you for this.
>> >>> > >>> >> >>> > 2) Here are the images. I will look into the log files
>> of PI
>> >>> > >>> example
>> >>> > >>> >> >>> >
>> >>> > >>> >> >>> > *Result of JPS command on slave*
>> >>> > >>> >> >>> >
>> >>> > >>> >> >>>
>> >>> > >>> >>
>> >>> > >>>
>> >>> >
>> >>>
>> http://s17.postimg.org/gpwe2bbfj/Screen_Shot_2015_06_22_at_7_23_31_PM.png
>> >>> > >>> >> >>> >
>> >>> > >>> >> >>> > *Result of JPS command on Master*
>> >>> > >>> >> >>> >
>> >>> > >>> >> >>>
>> >>> > >>> >>
>> >>> > >>>
>> >>> >
>> >>>
>> http://s14.postimg.org/s9922em5p/Screen_Shot_2015_06_22_at_7_23_42_PM.png
>> >>> > >>> >> >>> >
>> >>> > >>> >> >>> > 3) In my current case, I do not have any input
>> submitted to
>> >>> > the
>> >>> > >>> job.
>> >>> > >>> >> >>> During
>> >>> > >>> >> >>> > run time, I directly fetch data from HDFS. So, I am
>> looking
>> >>> > for
>> >>> > >>> >> >>> something
>> >>> > >>> >> >>> > like BSPJob.set*Max*NumBspTask().
>> >>> > >>> >> >>> >
>> >>> > >>> >> >>> > Regards,
>> >>> > >>> >> >>> > Behroz
>> >>> > >>> >> >>> >
>> >>> > >>> >> >>> >
>> >>> > >>> >> >>> >
>> >>> > >>> >> >>> > On Tue, Jun 23, 2015 at 12:57 AM, Edward J. Yoon <
>> >>> > >>> >> edwardyoon@apache.org
>> >>> > >>> >> >>> >
>> >>> > >>> >> >>> > wrote:
>> >>> > >>> >> >>> >
>> >>> > >>> >> >>> >> Hello,
>> >>> > >>> >> >>> >>
>> >>> > >>> >> >>> >> 1) You can get the filesystem URI from a configuration
>> >>> using
>> >>> > >>> >> >>> >> "FileSystem fs = FileSystem.get(conf);". Of course,
>> the
>> >>> > >>> fs.defaultFS
>> >>> > >>> >> >>> >> property should be in hama-site.xml
>> >>> > >>> >> >>> >>
>> >>> > >>> >> >>> >>   <property>
>> >>> > >>> >> >>> >>     <name>fs.defaultFS</name>
>> >>> > >>> >> >>> >>     <value>hdfs://host1.mydomain.com:9000/</value>
>> >>> > >>> >> >>> >>     <description>
>> >>> > >>> >> >>> >>       The name of the default file system. Either the
>> >>> literal
>> >>> > >>> string
>> >>> > >>> >> >>> >>       "local" or a host:port for HDFS.
>> >>> > >>> >> >>> >>     </description>
>> >>> > >>> >> >>> >>   </property>
>> >>> > >>> >> >>> >>
>> >>> > >>> >> >>> >> 2) The 'bsp.tasks.maximum' is the number of tasks per
>> node.
>> >>> > It
>> >>> > >>> looks
>> >>> > >>> >> >>> >> cluster configuration issue. Please run Pi example
>> and look
>> >>> > at
>> >>> > >>> the
>> >>> > >>> >> >>> >> logs for more details. NOTE: you can not attach the
>> images
>> >>> to
>> >>> > >>> >> mailing
>> >>> > >>> >> >>> >> list so I can't see it.
>> >>> > >>> >> >>> >>
>> >>> > >>> >> >>> >> 3) You can use the BSPJob.setNumBspTask(int) method.
>> If
>> >>> input
>> >>> > >>> is
>> >>> > >>> >> >>> >> provided, the number of BSP tasks is basically driven
>> by
>> >>> the
>> >>> > >>> number
>> >>> > >>> >> of
>> >>> > >>> >> >>> >> DFS blocks. I'll fix it to be more flexible on
>> HAMA-956.
>> >>> > >>> >> >>> >>
>> >>> > >>> >> >>> >> Thanks!
>> >>> > >>> >> >>> >>
>> >>> > >>> >> >>> >>
>> >>> > >>> >> >>> >> On Tue, Jun 23, 2015 at 2:33 AM, Behroz Sikander <
>> >>> > >>> >> behroz89@gmail.com>
>> >>> > >>> >> >>> >> wrote:
>> >>> > >>> >> >>> >> > Hi,
>> >>> > >>> >> >>> >> > Recently, I moved from a single machine setup to a 2
>> >>> > machine
>> >>> > >>> >> setup.
>> >>> > >>> >> >>> I was
>> >>> > >>> >> >>> >> > successfully able to run my job that uses the HDFS
>> to get
>> >>> > >>> data. I
>> >>> > >>> >> >>> have 3
>> >>> > >>> >> >>> >> > trivial questions
>> >>> > >>> >> >>> >> >
>> >>> > >>> >> >>> >> > 1- To access HDFS, I have to manually give the IP
>> address
>> >>> > of
>> >>> > >>> >> server
>> >>> > >>> >> >>> >> running
>> >>> > >>> >> >>> >> > HDFS. I thought that Hama will automatically pick
>> from
>> >>> the
>> >>> > >>> >> >>> configurations
>> >>> > >>> >> >>> >> > but it does not. I am probably doing something
>> wrong.
>> >>> Right
>> >>> > >>> now my
>> >>> > >>> >> >>> code
>> >>> > >>> >> >>> >> work
>> >>> > >>> >> >>> >> > by using the following.
>> >>> > >>> >> >>> >> >
>> >>> > >>> >> >>> >> > FileSystem fs = FileSystem.get(new
>> >>> > >>> URI("hdfs://server_ip:port/"),
>> >>> > >>> >> >>> conf);
>> >>> > >>> >> >>> >> >
>> >>> > >>> >> >>> >> > 2- On my master server, when I start hama it
>> >>> automatically
>> >>> > >>> starts
>> >>> > >>> >> >>> hama in
>> >>> > >>> >> >>> >> > the slave machine (all good). Both master and slave
>> are
>> >>> set
>> >>> > >>> as
>> >>> > >>> >> >>> >> groomservers.
>> >>> > >>> >> >>> >> > This means that I have 2 servers to run my job which
>> >>> means
>> >>> > >>> that I
>> >>> > >>> >> can
>> >>> > >>> >> >>> >> open
>> >>> > >>> >> >>> >> > more BSPPeerChild processes. And if I submit my jar
>> with
>> >>> 3
>> >>> > >>> bsp
>> >>> > >>> >> tasks
>> >>> > >>> >> >>> then
>> >>> > >>> >> >>> >> > everything works fine. But when I move to 4 tasks,
>> Hama
>> >>> > >>> freezes.
>> >>> > >>> >> >>> Here is
>> >>> > >>> >> >>> >> the
>> >>> > >>> >> >>> >> > result of JPS command on slave.
>> >>> > >>> >> >>> >> >
>> >>> > >>> >> >>> >> >
>> >>> > >>> >> >>> >> > Result of JPS command on Master
>> >>> > >>> >> >>> >> >
>> >>> > >>> >> >>> >> >
>> >>> > >>> >> >>> >> >
>> >>> > >>> >> >>> >> > You can see that it is only opening tasks on slaves
>> but
>> >>> not
>> >>> > >>> on
>> >>> > >>> >> >>> master.
>> >>> > >>> >> >>> >> >
>> >>> > >>> >> >>> >> > Note: I tried to change the bsp.tasks.maximum
>> property in
>> >>> > >>> >> >>> >> hama-default.xml
>> >>> > >>> >> >>> >> > to 4 but still same result.
>> >>> > >>> >> >>> >> >
>> >>> > >>> >> >>> >> > 3- I want my cluster to open as many BSPPeerChild
>> >>> processes
>> >>> > >>> as
>> >>> > >>> >> >>> possible.
>> >>> > >>> >> >>> >> Is
>> >>> > >>> >> >>> >> > there any setting that can I do to achieve that ?
>> Or hama
>> >>> > >>> picks up
>> >>> > >>> >> >>> the
>> >>> > >>> >> >>> >> > values from hama-default.xml to open tasks ?
>> >>> > >>> >> >>> >> >
>> >>> > >>> >> >>> >> >
>> >>> > >>> >> >>> >> > Regards,
>> >>> > >>> >> >>> >> >
>> >>> > >>> >> >>> >> > Behroz Sikander
>> >>> > >>> >> >>> >>
>> >>> > >>> >> >>> >>
>> >>> > >>> >> >>> >>
>> >>> > >>> >> >>> >> --
>> >>> > >>> >> >>> >> Best Regards, Edward J. Yoon
>> >>> > >>> >> >>> >>
>> >>> > >>> >> >>>
>> >>> > >>> >> >>>
>> >>> > >>> >> >>>
>> >>> > >>> >> >>> --
>> >>> > >>> >> >>> Best Regards, Edward J. Yoon
>> >>> > >>> >> >>>
>> >>> > >>> >> >>
>> >>> > >>> >> >>
>> >>> > >>> >>
>> >>> > >>> >>
>> >>> > >>> >>
>> >>> > >>> >> --
>> >>> > >>> >> Best Regards, Edward J. Yoon
>> >>> > >>> >>
>> >>> > >>>
>> >>> > >>>
>> >>> > >>>
>> >>> > >>> --
>> >>> > >>> Best Regards, Edward J. Yoon
>> >>> > >>>
>> >>> > >>
>> >>> > >>
>> >>> > >
>> >>> >
>> >>> >
>> >>> >
>> >>>
>> >>>
>> >>>
>> >
>> >
>> >
>> > --
>> > Best Regards, Edward J. Yoon
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>>
>
>

Re: Groomserer BSPPeerChild limit

Posted by Behroz Sikander <be...@gmail.com>.
Ok, great. I was able to run ZooKeeper, the groom, and the BSPMaster on
server 1. But when I ran the groom on server 2, I got the following exception:

15/06/29 03:29:20 ERROR bsp.GroomServer: There is a problem in establishing
communication link with BSPMaster
15/06/29 03:29:20 ERROR bsp.GroomServer: Got fatal exception while
reinitializing GroomServer: java.io.IOException: There is a problem in
establishing communication link with BSPMaster.
at org.apache.hama.bsp.GroomServer.initialize(GroomServer.java:426)
at org.apache.hama.bsp.GroomServer.run(GroomServer.java:860)
at java.lang.Thread.run(Thread.java:745)
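
A likely cause of this error is that the groom on server 2 cannot reach the BSPMaster over RPC. Below is a minimal sketch of the hama-site.xml entries the second machine would need, assuming the master host is named cluster-0 (as in the configuration quoted later in this thread); the hostname and port are illustrative, not a confirmed fix:

```xml
<!-- hama-site.xml on the second machine (sketch; hostname/port assumed).
     bsp.master.address must name the host actually running the BSPMaster,
     and the ZooKeeper quorum must match the master's configuration. -->
<property>
  <name>bsp.master.address</name>
  <value>cluster-0:40000</value>
</property>
<property>
  <name>hama.zookeeper.quorum</name>
  <value>cluster-0</value>
</property>
```

If these are already set, check that port 40000 on the master is reachable from the second machine; a firewall or an unresolvable hostname produces this same exception.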

On Mon, Jun 29, 2015 at 5:21 AM, Edward J. Yoon <ed...@apache.org>
wrote:

> Here are my configurations:
>
> hama-site.xml:
>
>   <property>
>     <name>bsp.master.address</name>
>     <value>cluster-0:40000</value>
>   </property>
>
>   <property>
>     <name>fs.default.name</name>
>     <value>hdfs://cluster-0:9000/</value>
>   </property>
>
>   <property>
>     <name>hama.zookeeper.quorum</name>
>     <value>cluster-0</value>
>   </property>
>
>
> % bin/hama zookeeper
> 15/06/29 12:17:17 ERROR quorum.QuorumPeerConfig: Invalid
> configuration, only one server specified (ignoring)
>
> Then, open a new terminal and run the master with the following command:
>
> % bin/hama bspmaster
> ...
> 15/06/29 12:17:40 INFO sync.ZKSyncBSPMasterClient: Initialized ZK false
> 15/06/29 12:17:40 INFO sync.ZKSyncClient: Initializing ZK Sync Client
> 15/06/29 12:17:40 INFO ipc.Server: IPC Server Responder: starting
> 15/06/29 12:17:40 INFO ipc.Server: IPC Server listener on 40000: starting
> 15/06/29 12:17:40 INFO ipc.Server: IPC Server handler 0 on 40000: starting
> 15/06/29 12:17:40 INFO bsp.BSPMaster: Starting RUNNING
>
>
>
> On Mon, Jun 29, 2015 at 12:17 PM, Edward J. Yoon <ed...@apache.org>
> wrote:
> > Hi,
> >
> > If you run the zk server too, BSPMaster will connect to it and won't
> > throw exceptions.
> >
> > On Mon, Jun 29, 2015 at 12:13 PM, Behroz Sikander <be...@gmail.com>
> wrote:
> >> Hi,
> >> Thank you for the information. I moved to Hama 0.7.0 and I still have the
> same
> >> problem.
> >> When I run % bin/hama bspmaster, I am getting the following exception
> >>
> >> INFO http.HttpServer: Port returned by
> >> webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening
> >> the listener on 40013
> >>  INFO http.HttpServer: listener.getLocalPort() returned 40013
> >> webServer.getConnectors()[0].getLocalPort() returned 40013
> >>  INFO http.HttpServer: Jetty bound to port 40013
> >>  INFO mortbay.log: jetty-6.1.14
> >>  INFO mortbay.log: Extract
> >>
> jar:file:/home/behroz/Documents/Packages/hama-0.7.0/hama-core-0.7.0.jar!/webapp/bspmaster/
> >> to /tmp/Jetty_b178b33b16cc_40013_bspmaster____.cof30w/webapp
> >>  INFO mortbay.log: Started SelectChannelConnector@b178b33b16cc:40013
> >>  INFO bsp.BSPMaster: Cleaning up the system directory
> >>  INFO bsp.BSPMaster: hdfs://172.17.0.3:54310/tmp/hama-behroz/bsp/system
> >>  INFO sync.ZKSyncBSPMasterClient: Initialized ZK false
> >>  INFO sync.ZKSyncClient: Initializing ZK Sync Client
> >>  ERROR sync.ZKSyncBSPMasterClient:
> >> org.apache.zookeeper.KeeperException$ConnectionLossException:
> >> KeeperErrorCode = ConnectionLoss for /bsp
> >> at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> >> at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> >> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
> >> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
> >> at
> >>
> org.apache.hama.bsp.sync.ZKSyncBSPMasterClient.init(ZKSyncBSPMasterClient.java:62)
> >> at org.apache.hama.bsp.BSPMaster.initZK(BSPMaster.java:534)
> >> at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:517)
> >> at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:500)
> >> at org.apache.hama.BSPMasterRunner.run(BSPMasterRunner.java:46)
> >> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> >> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> >> at org.apache.hama.BSPMasterRunner.main(BSPMasterRunner.java:56)
> >>  ERROR sync.ZKSyncBSPMasterClient:
> >> org.apache.zookeeper.KeeperException$ConnectionLossException:
> >> KeeperErrorCode = ConnectionLoss for /bsp
> >>
> >> *My ZooKeeper settings in hama-site.xml are as follows (right now I am
> >> using just two servers, 172.17.0.3 and 172.17.0.7):*
> >> <property>
> >>                  <name>hama.zookeeper.quorum</name>
> >>                  <value>172.17.0.3,172.17.0.7</value>
> >>                  <description>Comma separated list of servers in the
> >> ZooKeeper quorum.
> >>                  For example, "host1.mydomain.com,host2.mydomain.com,
> >> host3.mydomain.com".
> >>                  By default this is set to localhost for local and
> >> pseudo-distributed modes
> >>                  of operation. For a fully-distributed setup, this
> should
> >> be set to a full
> >>                  list of ZooKeeper quorum servers. If HAMA_MANAGES_ZK is
> >> set in hama-env.sh
> >>                  this is the list of servers which we will start/stop
> >> ZooKeeper on.
> >>                  </description>
> >>         </property>
> >>        ......
> >>        <property>
> >>                  <name>hama.zookeeper.property.clientPort</name>
> >>                  <value>2181</value>
> >>          </property>
> >>
> >> Is something wrong with my settings ?
> >>
> >> Regards,
> >> Behroz Sikander
> >>
> >> On Mon, Jun 29, 2015 at 1:44 AM, Edward J. Yoon <
> edward.yoon@samsung.com>
> >> wrote:
> >>
> >>> > (0.7.0) because I do not understand YARN yet. It adds extra
> >>> configurations
> >>>
> >>> Hama classic mode works on both Hadoop 1.x and Hadoop 2.x HDFS. Yarn
> >>> configuration is only needed when you want to submit a BSP job to Yarn
> >>> cluster
> >>> without Hama cluster. So you don't need to worry about it. :-)
> >>>
> >>> > distributed mode ? and is there any way to manage the server ? I mean
> >>> right
> >>> > now, I have 3 machines with a lot of configuration files and log
> files.
> >>> It
> >>>
> >>> You can use web UI at http://masterserver_address:40013/bspmaster.jsp
> >>>
> >>> To debug your program, please try like below:
> >>>
> >>> 1) Run a BSPMaster and Zookeeper at server1.
> >>> % bin/hama bspmaster
> >>> % bin/hama zookeeper
> >>>
> >>> 2) Run a Groom at server1 and server2.
> >>>
> >>> % bin/hama groom
> >>>
> >>> 3) Check whether daemons are running well. Then, run your program
> using jar
> >>> command at server1.
> >>>
> >>> % bin/hama jar .....
> >>>
> >>> > In the hama_[user]_bspmaster_.....log file I get the following
> exception. But
> >>> > this occurs in both cases when I run my job with 3 tasks or with 4
> tasks
> >>>
> >>> In fact, you should not see above initZK error log.
> >>>
> >>> --
> >>> Best Regards, Edward J. Yoon
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Behroz Sikander [mailto:behroz89@gmail.com]
> >>> Sent: Monday, June 29, 2015 8:18 AM
> >>> To: user@hama.apache.org
> >>> Subject: Re: Groomserer BSPPeerChild limit
> >>>
> >>> I will try the things that you mentioned. I am not using the latest
> version
> >>> (0.7.0) because I do not understand YARN yet. It adds extra
> configurations
> >>> which makes it harder for me to understand when things go wrong.
> Any
> >>> suggestions ?
> >>>
> >>> Further, are there any tools that you use for debugging while in
> >>> distributed mode ? and is there any way to manage the server ? I mean
> right
> >>> now, I have 3 machines with a lot of configuration files and log
> files. It
> >>> takes a lot of time. This makes me wonder how people who have 100s of
> >>> machines debug and manage the cluster.
> >>>
> >>> Regards,
> >>> Behroz
> >>>
> >>> On Mon, Jun 29, 2015 at 12:53 AM, Edward J. Yoon <
> edward.yoon@samsung.com>
> >>> wrote:
> >>>
> >>> > Hi,
> >>> >
> >>> > It looks like a zookeeper connection problem. Please check whether
> >>> > zookeeper
> >>> > is running and every tasks can connect to zookeeper.
> >>> >
> >>> > I would recommend you to stop the firewall during debugging, and
> please
> >>> use
> >>> > the 0.7.0 latest release.
> >>> >
> >>> >
> >>> > --
> >>> > Best Regards, Edward J. Yoon
> >>> >
> >>> > -----Original Message-----
> >>> > From: Behroz Sikander [mailto:behroz89@gmail.com]
> >>> > Sent: Monday, June 29, 2015 7:34 AM
> >>> > To: user@hama.apache.org
> >>> > Subject: Re: Groomserer BSPPeerChild limit
> >>> >
> >>> > To figure out the issue, I was trying something else and found out
> >>> another
> >>> > weird issue. It might be a bug in Hama, but I am not sure. Both
> following
> >>> > lines give an exception.
> >>> >
> >>> > System.out.println( peer.getPeerName(0)); //Exception
> >>> >
> >>> > System.out.println( peer.getNumPeers()); //Exception
> >>> >
> >>> >
> >>> > [time] ERROR bsp.BSPTask: *Error running bsp setup and bsp function.*
> >>> >
> >>> > [time]java.lang.*RuntimeException: All peer names could not be
> >>> retrieved!*
> >>> >
> >>> > at
> >>> >
> >>> >
> >>>
> org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.getAllPeerNames(ZooKeeperSyncClientImpl.java:305)
> >>> >
> >>> > at
> org.apache.hama.bsp.BSPPeerImpl.initPeerNames(BSPPeerImpl.java:544)
> >>> >
> >>> > at org.apache.hama.bsp.BSPPeerImpl.getNumPeers(BSPPeerImpl.java:538)
> >>> >
> >>> > at testHDFS.EVADMMBsp.setup*(EVADMMBsp.java:58)*
> >>> >
> >>> > at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
> >>> >
> >>> > at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
> >>> >
> >>> > at
> >>>
> org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
> >>> >
> >>> > On Sun, Jun 28, 2015 at 6:45 PM, Behroz Sikander <behroz89@gmail.com
> >
> >>> > wrote:
> >>> >
> >>> > > I think I have more information on the issue. I did some debugging
> and
> >>> > > found something quite strange.
> >>> > >
> >>> > > If I open my job with 6 tasks ( 3 tasks will run on MACHINE1 and 3
> task
> >>> > > will be opened on other MACHINE2),
> >>> > >
> >>> > >  -  3 tasks on Machine1 are frozen and the strange thing is that
> the
> >>> > > processes do not even enter the SETUP function of BSP class. I have
> >>> print
> >>> > > statements in the setup function of BSP class and it doesn't print
> >>> > > anything. I get empty files with zero size.
> >>> > >
> >>> > > drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:29 .
> >>> > > drwxrwxr-x 99 behroz behroz 4096 Jun 28 16:28 ..
> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> >>> > > attempt_201506281624_0001_000000_0.err
> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> >>> > > attempt_201506281624_0001_000000_0.log
> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> >>> > > attempt_201506281624_0001_000001_0.err
> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> >>> > > attempt_201506281624_0001_000001_0.log
> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> >>> > > attempt_201506281624_0001_000002_0.err
> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> >>> > > attempt_201506281624_0001_000002_0.log
> >>> > >
> >>> > > - On MACHINE2, the code enters the SETUP function of BSP class and
> >>> prints
> >>> > > stuff. See the size of files generated on output. How is it
> possible
> >>> that
> >>> > > in 3 tasks the code can enter BSP and in others it cannot ?
> >>> > >
> >>> > > drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:39 .
> >>> > > drwxrwxr-x 82 behroz behroz 4096 Jun 28 16:39 ..
> >>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
> >>> > > attempt_201506281639_0001_000003_0.err
> >>> > > -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
> >>> > > attempt_201506281639_0001_000003_0.log
> >>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
> >>> > > attempt_201506281639_0001_000004_0.err
> >>> > > -rw-rw-r--  1 behroz behroz 1368 Jun 28 16:39
> >>> > > attempt_201506281639_0001_000004_0.log
> >>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
> >>> > > attempt_201506281639_0001_000005_0.err
> >>> > > -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
> >>> > > attempt_201506281639_0001_000005_0.log
> >>> > >
> >>> > > - Hama Groom log file on MACHINE1 (which is frozen) shows:
> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >>> > > 'attempt_201506281639_0001_000001_0' has started.
> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >>> > > 'attempt_201506281639_0001_000002_0' has started.
> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >>> > > 'attempt_201506281639_0001_000000_0' has started.
> >>> > >
> >>> > > - Hama Groom log file on MACHINE2 shows
> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >>> > > 'attempt_201506281639_0001_000003_0' has started.
> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >>> > > 'attempt_201506281639_0001_000004_0' has started.
> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >>> > > 'attempt_201506281639_0001_000005_0' has started.
> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >>> > > attempt_201506281639_0001_000004_0 is *done*.
> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >>> > > attempt_201506281639_0001_000003_0 is *done*.
> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >>> > > attempt_201506281639_0001_000005_0 is *done*.
> >>> > >
> >>> > > Any clue what might be going wrong ?
> >>> > >
> >>> > > Regards,
> >>> > > Behroz
> >>> > >
> >>> > >
> >>> > >
> >>> > > On Sat, Jun 27, 2015 at 1:13 PM, Behroz Sikander <
> behroz89@gmail.com>
> >>> > > wrote:
> >>> > >
> >>> > >> Here is the log file from that folder
> >>> > >>
> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: Starting Socket Reader #1 for
> port
> >>> > >> 61001
> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server Responder: starting
> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server listener on 61001:
> >>> > starting
> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 0 on 61001:
> >>> > starting
> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 1 on 61001:
> >>> > starting
> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 2 on 61001:
> >>> > starting
> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 3 on 61001:
> >>> > starting
> >>> > >> 15/06/27 11:10:34 INFO message.HamaMessageManagerImpl: BSPPeer
> >>> > >> address:b178b33b16cc port:61001
> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 4 on 61001:
> >>> > starting
> >>> > >> 15/06/27 11:10:34 INFO sync.ZKSyncClient: Initializing ZK Sync
> Client
> >>> > >> 15/06/27 11:10:34 INFO sync.ZooKeeperSyncClientImpl: Start
> connecting
> >>> to
> >>> > >> Zookeeper! At b178b33b16cc/172.17.0.7:61001
> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping server on 61001
> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 0 on 61001:
> >>> > exiting
> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server listener on
> >>> 61001
> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 1 on 61001:
> >>> > exiting
> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 2 on 61001:
> >>> > exiting
> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server Responder
> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 3 on 61001:
> >>> > exiting
> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 4 on 61001:
> >>> > exiting
> >>> > >>
> >>> > >>
> >>> > >> And my console shows the following output. Hama is frozen right
> now.
> >>> > >> 15/06/27 11:10:32 INFO bsp.BSPJobClient: Running job:
> >>> > >> job_201506262331_0003
> >>> > >> 15/06/27 11:10:35 INFO bsp.BSPJobClient: Current supersteps
> number: 0
> >>> > >> 15/06/27 11:10:38 INFO bsp.BSPJobClient: Current supersteps
> number: 2
> >>> > >>
> >>> > >> On Sat, Jun 27, 2015 at 1:07 PM, Edward J. Yoon <
> >>> edwardyoon@apache.org>
> >>> > >> wrote:
> >>> > >>
> >>> > >>> Please check the task logs in $HAMA_HOME/logs/tasklogs folder.
> >>> > >>>
> >>> > >>> On Sat, Jun 27, 2015 at 8:03 PM, Behroz Sikander <
> behroz89@gmail.com
> >>> >
> >>> > >>> wrote:
> >>> > >>> > Yea. I also thought that. I ran the program through eclipse
> with 20
> >>> > >>> tasks
> >>> > >>> > and it works fine.
> >>> > >>> >
> >>> > >>> > On Sat, Jun 27, 2015 at 1:00 PM, Edward J. Yoon <
> >>> > edwardyoon@apache.org
> >>> > >>> >
> >>> > >>> > wrote:
> >>> > >>> >
> >>> > >>> >> > When I run the PI example, it uses 9 tasks and runs fine.
> When I
> >>> > >>> run my
> >>> > >>> >> > program with 3 tasks, everything runs fine. But when I
> increase
> >>> > the
> >>> > >>> tasks
> >>> > >>> >> > (to 4) by using "setNumBspTask". Hama freezes. I do not
> >>> understand
> >>> > >>> what
> >>> > >>> >> can
> >>> > >>> >> > go wrong.
> >>> > >>> >>
> >>> > >>> >> It looks like a program bug. Have you ran your program in
> local
> >>> > mode?
> >>> > >>> >>
> >>> > >>> >> On Sat, Jun 27, 2015 at 8:03 AM, Behroz Sikander <
> >>> > behroz89@gmail.com>
> >>> > >>> >> wrote:
> >>> > >>> >> > Hi,
> >>> >> > In the current thread, I mentioned 3 issues. Issues 1 and 3
> are
> >>> > >>> resolved
> >>> > >>> >> but
> >>> > >>> >> > issue number 2 is still giving me headaches.
> >>> > >>> >> >
> >>> > >>> >> > My problem:
> >>> > >>> >> > My cluster now consists of 3 machines. Each one of them
> properly
> >>> > >>> >> configured
> >>> > >>> >> > (Apparently). From my master machine when I start Hadoop and
> >>> Hama,
> >>> > >>> I can
> >>> > >>> >> > see the processes started on other 2 machines. If I check
> the
> >>> > >>> maximum
> >>> > >>> >> tasks
> >>> > >>> >> > that my cluster can support then I get 9 (3 tasks on each
> >>> > machine).
> >>> > >>> >> >
> >>> > >>> >> > When I run the PI example, it uses 9 tasks and runs fine.
> When I
> >>> > >>> run my
> >>> > >>> >> > program with 3 tasks, everything runs fine. But when I
> increase
> >>> > the
> >>> > >>> tasks
> >>> > >>> >> > (to 4) by using "setNumBspTask". Hama freezes. I do not
> >>> understand
> >>> > >>> what
> >>> > >>> >> can
> >>> > >>> >> > go wrong.
> >>> > >>> >> >
> >>> >> > I checked the log files and things look fine. I just
> sometimes
> >>> > get
> >>> > >>> an
> >>> >> > exception that Hama was not able to delete the system
> directory
> >>> > >>> >> > (bsp.system.dir) defined in the hama-site.xml.
> >>> > >>> >> >
> >>> > >>> >> > Any help or clue would be great.
> >>> > >>> >> >
> >>> > >>> >> > Regards,
> >>> > >>> >> > Behroz Sikander
> >>> > >>> >> >
> >>> > >>> >> > On Thu, Jun 25, 2015 at 1:13 PM, Behroz Sikander <
> >>> > >>> behroz89@gmail.com>
> >>> > >>> >> wrote:
> >>> > >>> >> >
> >>> > >>> >> >> Thank you :)
> >>> > >>> >> >>
> >>> > >>> >> >> On Thu, Jun 25, 2015 at 12:14 AM, Edward J. Yoon <
> >>> > >>> edwardyoon@apache.org
> >>> > >>> >> >
> >>> > >>> >> >> wrote:
> >>> > >>> >> >>
> >>> > >>> >> >>> Hi,
> >>> > >>> >> >>>
> >>> > >>> >> >>> You can get the maximum number of available tasks like
> >>> following
> >>> > >>> code:
> >>> > >>> >> >>>
> >>> > >>> >> >>>     BSPJobClient jobClient = new BSPJobClient(conf);
> >>> > >>> >> >>>     ClusterStatus cluster =
> jobClient.getClusterStatus(true);
> >>> > >>> >> >>>
> >>> > >>> >> >>>     // Set to maximum
> >>> > >>> >> >>>     bsp.setNumBspTask(cluster.getMaxTasks());
> >>> > >>> >> >>>
> >>> > >>> >> >>>
> >>> > >>> >> >>> On Wed, Jun 24, 2015 at 11:20 PM, Behroz Sikander <
> >>> > >>> behroz89@gmail.com>
> >>> > >>> >> >>> wrote:
> >>> > >>> >> >>> > Hi,
> >>> > >>> >> >>> > 1) Thank you for this.
> >>> > >>> >> >>> > 2) Here are the images. I will look into the log files
> of PI
> >>> > >>> example
> >>> > >>> >> >>> >
> >>> > >>> >> >>> > *Result of JPS command on slave*
> >>> > >>> >> >>> >
> >>> > >>> >> >>>
> >>> > >>> >>
> >>> > >>>
> >>> >
> >>>
> http://s17.postimg.org/gpwe2bbfj/Screen_Shot_2015_06_22_at_7_23_31_PM.png
> >>> > >>> >> >>> >
> >>> > >>> >> >>> > *Result of JPS command on Master*
> >>> > >>> >> >>> >
> >>> > >>> >> >>>
> >>> > >>> >>
> >>> > >>>
> >>> >
> >>>
> http://s14.postimg.org/s9922em5p/Screen_Shot_2015_06_22_at_7_23_42_PM.png
> >>> > >>> >> >>> >
> >>> > >>> >> >>> > 3) In my current case, I do not have any input
> submitted to
> >>> > the
> >>> > >>> job.
> >>> > >>> >> >>> During
> >>> > >>> >> >>> > run time, I directly fetch data from HDFS. So, I am
> looking
> >>> > for
> >>> > >>> >> >>> something
> >>> > >>> >> >>> > like BSPJob.set*Max*NumBspTask().
> >>> > >>> >> >>> >
> >>> > >>> >> >>> > Regards,
> >>> > >>> >> >>> > Behroz
> >>> > >>> >> >>> >
> >>> > >>> >> >>> >
> >>> > >>> >> >>> >
> >>> > >>> >> >>> > On Tue, Jun 23, 2015 at 12:57 AM, Edward J. Yoon <
> >>> > >>> >> edwardyoon@apache.org
> >>> > >>> >> >>> >
> >>> > >>> >> >>> > wrote:
> >>> > >>> >> >>> >
> >>> > >>> >> >>> >> Hello,
> >>> > >>> >> >>> >>
> >>> > >>> >> >>> >> 1) You can get the filesystem URI from a configuration
> >>> using
> >>> > >>> >> >>> >> "FileSystem fs = FileSystem.get(conf);". Of course, the
> >>> > >>> fs.defaultFS
> >>> > >>> >> >>> >> property should be in hama-site.xml
> >>> > >>> >> >>> >>
> >>> > >>> >> >>> >>   <property>
> >>> > >>> >> >>> >>     <name>fs.defaultFS</name>
> >>> > >>> >> >>> >>     <value>hdfs://host1.mydomain.com:9000/</value>
> >>> > >>> >> >>> >>     <description>
> >>> > >>> >> >>> >>       The name of the default file system. Either the
> >>> literal
> >>> > >>> string
> >>> > >>> >> >>> >>       "local" or a host:port for HDFS.
> >>> > >>> >> >>> >>     </description>
> >>> > >>> >> >>> >>   </property>
> >>> > >>> >> >>> >>
> >>> > >>> >> >>> >> 2) The 'bsp.tasks.maximum' is the number of tasks per
> node.
> >>> > It
> >>> > >>> looks
> >>> > >>> >> >>> >> cluster configuration issue. Please run Pi example and
> look
> >>> > at
> >>> > >>> the
> >>> > >>> >> >>> >> logs for more details. NOTE: you can not attach the
> images
> >>> to
> >>> > >>> >> mailing
> >>> > >>> >> >>> >> list so I can't see it.
> >>> > >>> >> >>> >>
> >>> > >>> >> >>> >> 3) You can use the BSPJob.setNumBspTask(int) method. If
> >>> input
> >>> > >>> is
> >>> > >>> >> >>> >> provided, the number of BSP tasks is basically driven
> by
> >>> the
> >>> > >>> number
> >>> > >>> >> of
> >>> > >>> >> >>> >> DFS blocks. I'll fix it to be more flexible on
> HAMA-956.
> >>> > >>> >> >>> >>
> >>> > >>> >> >>> >> Thanks!
> >>> > >>> >> >>> >>
> >>> > >>> >> >>> >>
> >>> > >>> >> >>> >> On Tue, Jun 23, 2015 at 2:33 AM, Behroz Sikander <
> >>> > >>> >> behroz89@gmail.com>
> >>> > >>> >> >>> >> wrote:
> >>> > >>> >> >>> >> > Hi,
> >>> > >>> >> >>> >> > Recently, I moved from a single machine setup to a 2
> >>> > machine
> >>> > >>> >> setup.
> >>> > >>> >> >>> I was
> >>> > >>> >> >>> >> > successfully able to run my job that uses the HDFS
> to get
> >>> > >>> data. I
> >>> > >>> >> >>> have 3
> >>> > >>> >> >>> >> > trivial questions
> >>> > >>> >> >>> >> >
> >>> > >>> >> >>> >> > 1- To access HDFS, I have to manually give the IP
> address
> >>> > of
> >>> > >>> >> server
> >>> > >>> >> >>> >> running
> >>> > >>> >> >>> >> > HDFS. I thought that Hama will automatically pick
> from
> >>> the
> >>> > >>> >> >>> configurations
> >>> > >>> >> >>> >> > but it does not. I am probably doing something wrong.
> >>> Right
> >>> > >>> now my
> >>> > >>> >> >>> code
> >>> > >>> >> >>> >> work
> >>> > >>> >> >>> >> > by using the following.
> >>> > >>> >> >>> >> >
> >>> > >>> >> >>> >> > FileSystem fs = FileSystem.get(new
> >>> > >>> URI("hdfs://server_ip:port/"),
> >>> > >>> >> >>> conf);
> >>> > >>> >> >>> >> >
> >>> > >>> >> >>> >> > 2- On my master server, when I start hama it
> >>> automatically
> >>> > >>> starts
> >>> > >>> >> >>> hama in
> >>> > >>> >> >>> >> > the slave machine (all good). Both master and slave
> are
> >>> set
> >>> > >>> as
> >>> > >>> >> >>> >> groomservers.
> >>> > >>> >> >>> >> > This means that I have 2 servers to run my job which
> >>> means
> >>> > >>> that I
> >>> > >>> >> can
> >>> > >>> >> >>> >> open
> >>> > >>> >> >>> >> > more BSPPeerChild processes. And if I submit my jar
> with
> >>> 3
> >>> > >>> bsp
> >>> > >>> >> tasks
> >>> > >>> >> >>> then
> >>> > >>> >> >>> >> > everything works fine. But when I move to 4 tasks,
> Hama
> >>> > >>> freezes.
> >>> > >>> >> >>> Here is
> >>> > >>> >> >>> >> the
> >>> > >>> >> >>> >> > result of JPS command on slave.
> >>> > >>> >> >>> >> >
> >>> > >>> >> >>> >> >
> >>> > >>> >> >>> >> > Result of JPS command on Master
> >>> > >>> >> >>> >> >
> >>> > >>> >> >>> >> >
> >>> > >>> >> >>> >> >
> >>> > >>> >> >>> >> > You can see that it is only opening tasks on slaves
> but
> >>> not
> >>> > >>> on
> >>> > >>> >> >>> master.
> >>> > >>> >> >>> >> >
> >>> > >>> >> >>> >> > Note: I tried to change the bsp.tasks.maximum
> property in
> >>> > >>> >> >>> >> hama-default.xml
> >>> > >>> >> >>> >> > to 4 but still same result.
> >>> > >>> >> >>> >> >
> >>> > >>> >> >>> >> > 3- I want my cluster to open as many BSPPeerChild
> >>> processes
> >>> > >>> as
> >>> > >>> >> >>> possible.
> >>> > >>> >> >>> >> Is
> >>> > >>> >> >>> >> > there any setting that can I do to achieve that ? Or
> hama
> >>> > >>> picks up
> >>> > >>> >> >>> the
> >>> > >>> >> >>> >> > values from hama-default.xml to open tasks ?
> >>> > >>> >> >>> >> >
> >>> > >>> >> >>> >> >
> >>> > >>> >> >>> >> > Regards,
> >>> > >>> >> >>> >> >
> >>> > >>> >> >>> >> > Behroz Sikander
> >>> > >>> >> >>> >>
> >>> > >>> >> >>> >>
> >>> > >>> >> >>> >>
> >>> > >>> >> >>> >> --
> >>> > >>> >> >>> >> Best Regards, Edward J. Yoon
> >>> > >>> >> >>> >>
> >>> > >>> >> >>>
> >>> > >>> >> >>>
> >>> > >>> >> >>>
> >>> > >>> >> >>> --
> >>> > >>> >> >>> Best Regards, Edward J. Yoon
> >>> > >>> >> >>>
> >>> > >>> >> >>
> >>> > >>> >> >>
> >>> > >>> >>
> >>> > >>> >>
> >>> > >>> >>
> >>> > >>> >> --
> >>> > >>> >> Best Regards, Edward J. Yoon
> >>> > >>> >>
> >>> > >>>
> >>> > >>>
> >>> > >>>
> >>> > >>> --
> >>> > >>> Best Regards, Edward J. Yoon
> >>> > >>>
> >>> > >>
> >>> > >>
> >>> > >
> >>> >
> >>> >
> >>> >
> >>>
> >>>
> >>>
> >
> >
> >
> > --
> > Best Regards, Edward J. Yoon
>
>
>
> --
> Best Regards, Edward J. Yoon
>

Re: Groomserer BSPPeerChild limit

Posted by "Edward J. Yoon" <ed...@apache.org>.
Here are my configurations:

hama-site.xml:

  <property>
    <name>bsp.master.address</name>
    <value>cluster-0:40000</value>
  </property>

  <property>
    <name>fs.default.name</name>
    <value>hdfs://cluster-0:9000/</value>
  </property>

  <property>
    <name>hama.zookeeper.quorum</name>
    <value>cluster-0</value>
  </property>


% bin/hama zookeeper
15/06/29 12:17:17 ERROR quorum.QuorumPeerConfig: Invalid
configuration, only one server specified (ignoring)

Then, open a new terminal and run the master with the following command:

% bin/hama bspmaster
...
15/06/29 12:17:40 INFO sync.ZKSyncBSPMasterClient: Initialized ZK false
15/06/29 12:17:40 INFO sync.ZKSyncClient: Initializing ZK Sync Client
15/06/29 12:17:40 INFO ipc.Server: IPC Server Responder: starting
15/06/29 12:17:40 INFO ipc.Server: IPC Server listener on 40000: starting
15/06/29 12:17:40 INFO ipc.Server: IPC Server handler 0 on 40000: starting
15/06/29 12:17:40 INFO bsp.BSPMaster: Starting RUNNING
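
With fs.default.name set as above, job code no longer needs a hard-coded HDFS URI, and the task count can be sized from the cluster instead of being fixed. The following sketch combines the snippets quoted earlier in this thread (FileSystem.get(conf) and ClusterStatus.getMaxTasks()); class and method names follow the Hama 0.7 / Hadoop 1.x APIs, and this is illustrative rather than a complete job:

```java
import org.apache.hadoop.fs.FileSystem;
import org.apache.hama.HamaConfiguration;
import org.apache.hama.bsp.BSPJob;
import org.apache.hama.bsp.BSPJobClient;
import org.apache.hama.bsp.ClusterStatus;

public class ConfigSketch {
  public static void main(String[] args) throws Exception {
    // Loads hama-default.xml and hama-site.xml from the classpath.
    HamaConfiguration conf = new HamaConfiguration();

    // Resolves hdfs://cluster-0:9000/ from fs.default.name -- no explicit URI.
    FileSystem fs = FileSystem.get(conf);
    System.out.println("Default FS: " + fs.getUri());

    BSPJob job = new BSPJob(conf);
    // Size the job to the cluster's capacity rather than a fixed task count.
    ClusterStatus cluster = new BSPJobClient(conf).getClusterStatus(true);
    job.setNumBspTask(cluster.getMaxTasks());
  }
}
```

Running this requires the Hama jars on the classpath and a reachable cluster; with the configuration above it should print the HDFS URI without it ever appearing in the source.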



On Mon, Jun 29, 2015 at 12:17 PM, Edward J. Yoon <ed...@apache.org> wrote:
> Hi,
>
> If you run the zk server too, BSPMaster will connect to it and won't
> throw exceptions.
>
> On Mon, Jun 29, 2015 at 12:13 PM, Behroz Sikander <be...@gmail.com> wrote:
>> Hi,
>> Thank you for the information. I moved to Hama 0.7.0 and I still have the same
>> problem.
>> When I run % bin/hama bspmaster, I am getting the following exception
>>
>> INFO http.HttpServer: Port returned by
>> webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening
>> the listener on 40013
>>  INFO http.HttpServer: listener.getLocalPort() returned 40013
>> webServer.getConnectors()[0].getLocalPort() returned 40013
>>  INFO http.HttpServer: Jetty bound to port 40013
>>  INFO mortbay.log: jetty-6.1.14
>>  INFO mortbay.log: Extract
>> jar:file:/home/behroz/Documents/Packages/hama-0.7.0/hama-core-0.7.0.jar!/webapp/bspmaster/
>> to /tmp/Jetty_b178b33b16cc_40013_bspmaster____.cof30w/webapp
>>  INFO mortbay.log: Started SelectChannelConnector@b178b33b16cc:40013
>>  INFO bsp.BSPMaster: Cleaning up the system directory
>>  INFO bsp.BSPMaster: hdfs://172.17.0.3:54310/tmp/hama-behroz/bsp/system
>>  INFO sync.ZKSyncBSPMasterClient: Initialized ZK false
>>  INFO sync.ZKSyncClient: Initializing ZK Sync Client
>>  ERROR sync.ZKSyncBSPMasterClient:
>> org.apache.zookeeper.KeeperException$ConnectionLossException:
>> KeeperErrorCode = ConnectionLoss for /bsp
>> at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>> at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
>> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
>> at
>> org.apache.hama.bsp.sync.ZKSyncBSPMasterClient.init(ZKSyncBSPMasterClient.java:62)
>> at org.apache.hama.bsp.BSPMaster.initZK(BSPMaster.java:534)
>> at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:517)
>> at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:500)
>> at org.apache.hama.BSPMasterRunner.run(BSPMasterRunner.java:46)
>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>> at org.apache.hama.BSPMasterRunner.main(BSPMasterRunner.java:56)
>>  ERROR sync.ZKSyncBSPMasterClient:
>> org.apache.zookeeper.KeeperException$ConnectionLossException:
>> KeeperErrorCode = ConnectionLoss for /bsp
>>
>> *My ZooKeeper settings in hama-site.xml are as follows (right now I am using
>> just two servers, 172.17.0.3 and 172.17.0.7):*
>> <property>
>>                  <name>hama.zookeeper.quorum</name>
>>                  <value>172.17.0.3,172.17.0.7</value>
>>                  <description>Comma separated list of servers in the
>> ZooKeeper quorum.
>>                  For example, "host1.mydomain.com,host2.mydomain.com,
>> host3.mydomain.com".
>>                  By default this is set to localhost for local and
>> pseudo-distributed modes
>>                  of operation. For a fully-distributed setup, this should
>> be set to a full
>>                  list of ZooKeeper quorum servers. If HAMA_MANAGES_ZK is
>> set in hama-env.sh
>>                  this is the list of servers which we will start/stop
>> ZooKeeper on.
>>                  </description>
>>         </property>
>>        ......
>>        <property>
>>                  <name>hama.zookeeper.property.clientPort</name>
>>                  <value>2181</value>
>>          </property>
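The two settings above are what a ZooKeeper client joins into its connect string ("host1:port,host2:port"). A minimal self-contained sketch of that combination — the class and method names here are invented for illustration and are not Hama code:

```java
// Illustrative only: join hama.zookeeper.quorum with
// hama.zookeeper.property.clientPort the way a ZooKeeper
// connect string is normally formed (host1:port,host2:port).
public class QuorumString {
    static String connectString(String quorum, int clientPort) {
        StringBuilder sb = new StringBuilder();
        for (String host : quorum.split(",")) {
            if (sb.length() > 0) sb.append(',');
            sb.append(host.trim()).append(':').append(clientPort);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The example values match the quorum and clientPort quoted above.
        System.out.println(connectString("172.17.0.3,172.17.0.7", 2181));
        // -> 172.17.0.3:2181,172.17.0.7:2181
    }
}
```

If a host in this list is unreachable on the client port, clients see exactly the ConnectionLoss errors quoted above.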
>>
>> Is something wrong with my settings ?
>>
>> Regards,
>> Behroz Sikander
>>
>> On Mon, Jun 29, 2015 at 1:44 AM, Edward J. Yoon <ed...@samsung.com>
>> wrote:
>>
>>> > (0.7.0) because I do not understand YARN yet. It adds extra
>>> configurations
>>>
>>> Hama classic mode works on both Hadoop 1.x and Hadoop 2.x HDFS. Yarn
>>> configuration is only needed when you want to submit a BSP job to Yarn
>>> cluster
>>> without Hama cluster. So you don't need to worry about it. :-)
>>>
>>> > distributed mode ? and is there any way to manage the server ? I mean
>>> right
>>> > now, I have 3 machines with alot of configurations files and log files.
>>> It
>>>
>>> You can use web UI at http://masterserver_address:40013/bspmaster.jsp
>>>
>>> To debug your program, please try like below:
>>>
>>> 1) Run a BSPMaster and Zookeeper at server1.
>>> % bin/hama bspmaster
>>> % bin/hama zookeeper
>>>
>>> 2) Run a Groom at server1 and server2.
>>>
>>> % bin/hama groom
>>>
>>> 3) Check whether the daemons are running well. Then, run your program using jar
>>> command at server1.
>>>
>>> % bin/hama jar .....
>>>
>>> > In hama_[user]_bspmaster_.....log file I get the following exception. But
>>> > this occurs in both cases when I run my job with 3 tasks or with 4 tasks
>>>
>>> In fact, you should not see above initZK error log.
>>>
>>> --
>>> Best Regards, Edward J. Yoon
>>>
>>>
>>> -----Original Message-----
>>> From: Behroz Sikander [mailto:behroz89@gmail.com]
>>> Sent: Monday, June 29, 2015 8:18 AM
>>> To: user@hama.apache.org
>>> Subject: Re: Groomserer BSPPeerChild limit
>>>
>>> I will try the things that you mentioned. I am not using the latest version
>>> (0.7.0) because I do not understand YARN yet. It adds extra configurations
>>> which makes it harder for me to understand when things go wrong. Any
>>> suggestions ?
>>>
>>> Further, are there any tools that you use for debugging while in
>>> distributed mode? And is there any way to manage the servers? I mean right
>>> now, I have 3 machines with a lot of configuration files and log files. It
>>> takes a lot of time. This makes me wonder how people who have 100s of
>>> machines debug and manage the cluster.
>>>
>>> Regards,
>>> Behroz
>>>
>>> On Mon, Jun 29, 2015 at 12:53 AM, Edward J. Yoon <ed...@samsung.com>
>>> wrote:
>>>
>>> > Hi,
>>> >
>>> > It looks like a zookeeper connection problem. Please check whether
>>> > zookeeper is running and every task can connect to zookeeper.
>>> >
>>> > I would recommend you to stop the firewall during debugging, and please
>>> use
>>> > the 0.7.0 latest release.
>>> >
>>> >
>>> > --
>>> > Best Regards, Edward J. Yoon
>>> >
>>> > -----Original Message-----
>>> > From: Behroz Sikander [mailto:behroz89@gmail.com]
>>> > Sent: Monday, June 29, 2015 7:34 AM
>>> > To: user@hama.apache.org
>>> > Subject: Re: Groomserer BSPPeerChild limit
>>> >
>>> > To figure out the issue, I was trying something else and found
>>> > another weird issue. It might be a bug in Hama, but I am not sure. Both
>>> > of the following lines throw an exception.
>>> >
>>> > System.out.println( peer.getPeerName(0)); //Exception
>>> >
>>> > System.out.println( peer.getNumPeers()); //Exception
>>> >
>>> >
>>> > [time] ERROR bsp.BSPTask: *Error running bsp setup and bsp function.*
>>> >
>>> > [time]java.lang.*RuntimeException: All peer names could not be
>>> retrieved!*
>>> >
>>> > at
>>> >
>>> >
>>> org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.getAllPeerNames(ZooKeeperSyncClientImpl.java:305)
>>> >
>>> > at org.apache.hama.bsp.BSPPeerImpl.initPeerNames(BSPPeerImpl.java:544)
>>> >
>>> > at org.apache.hama.bsp.BSPPeerImpl.getNumPeers(BSPPeerImpl.java:538)
>>> >
>>> > at testHDFS.EVADMMBsp.setup*(EVADMMBsp.java:58)*
>>> >
>>> > at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
>>> >
>>> > at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
>>> >
>>> > at
>>> org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
>>> >
>>> > On Sun, Jun 28, 2015 at 6:45 PM, Behroz Sikander <be...@gmail.com>
>>> > wrote:
>>> >
>>> > > I think I have more information on the issue. I did some debugging and
>>> > > found something quite strange.
>>> > >
>>> > > If I open my job with 6 tasks (3 tasks will run on MACHINE1 and 3 tasks
>>> > > will be opened on the other, MACHINE2),
>>> > >
>>> > >  -  3 tasks on Machine1 are frozen and the strange thing is that the
>>> > > processes do not even enter the SETUP function of BSP class. I have
>>> print
>>> > > statements in the setup function of BSP class and it doesn't print
>>> > > anything. I get empty files with zero size.
>>> > >
>>> > > drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:29 .
>>> > > drwxrwxr-x 99 behroz behroz 4096 Jun 28 16:28 ..
>>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>>> > > attempt_201506281624_0001_000000_0.err
>>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>>> > > attempt_201506281624_0001_000000_0.log
>>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>>> > > attempt_201506281624_0001_000001_0.err
>>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>>> > > attempt_201506281624_0001_000001_0.log
>>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>>> > > attempt_201506281624_0001_000002_0.err
>>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>>> > > attempt_201506281624_0001_000002_0.log
>>> > >
>>> > > - On MACHINE2, the code enters the SETUP function of the BSP class and
>>> > > prints output. See the sizes of the files generated. How is it possible
>>> > > that in 3 tasks the code can enter BSP while in the others it cannot?
>>> > >
>>> > > drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:39 .
>>> > > drwxrwxr-x 82 behroz behroz 4096 Jun 28 16:39 ..
>>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
>>> > > attempt_201506281639_0001_000003_0.err
>>> > > -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
>>> > > attempt_201506281639_0001_000003_0.log
>>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
>>> > > attempt_201506281639_0001_000004_0.err
>>> > > -rw-rw-r--  1 behroz behroz 1368 Jun 28 16:39
>>> > > attempt_201506281639_0001_000004_0.log
>>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
>>> > > attempt_201506281639_0001_000005_0.err
>>> > > -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
>>> > > attempt_201506281639_0001_000005_0.log
>>> > >
>>> > > - Hama Groom log file on MACHINE1 (which is frozen) shows:
>>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>>> > > 'attempt_201506281639_0001_000001_0' has started.
>>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>>> > > 'attempt_201506281639_0001_000002_0' has started.
>>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>>> > > 'attempt_201506281639_0001_000000_0' has started.
>>> > >
>>> > > - Hama Groom log file on MACHINE2 shows
>>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>>> > > 'attempt_201506281639_0001_000003_0' has started.
>>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>>> > > 'attempt_201506281639_0001_000004_0' has started.
>>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>>> > > 'attempt_201506281639_0001_000005_0' has started.
>>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>>> > > attempt_201506281639_0001_000004_0 is *done*.
>>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>>> > > attempt_201506281639_0001_000003_0 is *done*.
>>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>>> > > attempt_201506281639_0001_000005_0 is *done*.
>>> > >
>>> > > Any clue what might be going wrong ?
>>> > >
>>> > > Regards,
>>> > > Behroz
>>> > >
>>> > >
>>> > >
>>> > > On Sat, Jun 27, 2015 at 1:13 PM, Behroz Sikander <be...@gmail.com>
>>> > > wrote:
>>> > >
>>> > >> Here is the log file from that folder
>>> > >>
>>> > >> 15/06/27 11:10:34 INFO ipc.Server: Starting Socket Reader #1 for port
>>> > >> 61001
>>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server Responder: starting
>>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server listener on 61001:
>>> > starting
>>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 0 on 61001:
>>> > starting
>>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 1 on 61001:
>>> > starting
>>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 2 on 61001:
>>> > starting
>>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 3 on 61001:
>>> > starting
>>> > >> 15/06/27 11:10:34 INFO message.HamaMessageManagerImpl: BSPPeer
>>> > >> address:b178b33b16cc port:61001
>>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 4 on 61001:
>>> > starting
>>> > >> 15/06/27 11:10:34 INFO sync.ZKSyncClient: Initializing ZK Sync Client
>>> > >> 15/06/27 11:10:34 INFO sync.ZooKeeperSyncClientImpl: Start connecting
>>> to
>>> > >> Zookeeper! At b178b33b16cc/172.17.0.7:61001
>>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping server on 61001
>>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 0 on 61001:
>>> > exiting
>>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server listener on
>>> 61001
>>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 1 on 61001:
>>> > exiting
>>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 2 on 61001:
>>> > exiting
>>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server Responder
>>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 3 on 61001:
>>> > exiting
>>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 4 on 61001:
>>> > exiting
>>> > >>
>>> > >>
>>> > >> And my console shows the following output. Hama is frozen right now.
>>> > >> 15/06/27 11:10:32 INFO bsp.BSPJobClient: Running job:
>>> > >> job_201506262331_0003
>>> > >> 15/06/27 11:10:35 INFO bsp.BSPJobClient: Current supersteps number: 0
>>> > >> 15/06/27 11:10:38 INFO bsp.BSPJobClient: Current supersteps number: 2
>>> > >>
>>> > >> On Sat, Jun 27, 2015 at 1:07 PM, Edward J. Yoon <
>>> edwardyoon@apache.org>
>>> > >> wrote:
>>> > >>
>>> > >>> Please check the task logs in $HAMA_HOME/logs/tasklogs folder.
>>> > >>>
>>> > >>> On Sat, Jun 27, 2015 at 8:03 PM, Behroz Sikander <behroz89@gmail.com
>>> >
>>> > >>> wrote:
>>> > >>> > Yea. I also thought that. I ran the program through eclipse with 20
>>> > >>> tasks
>>> > >>> > and it works fine.
>>> > >>> >
>>> > >>> > On Sat, Jun 27, 2015 at 1:00 PM, Edward J. Yoon <
>>> > edwardyoon@apache.org
>>> > >>> >
>>> > >>> > wrote:
>>> > >>> >
>>> > >>> >> > When I run the PI example, it uses 9 tasks and runs fine. When I
>>> > >>> run my
>>> > >>> >> > program with 3 tasks, everything runs fine. But when I increase
>>> > the
>>> > >>> tasks
>>> > >>> >> > (to 4) by using "setNumBspTask". Hama freezes. I do not
>>> understand
>>> > >>> what
>>> > >>> >> can
>>> > >>> >> > go wrong.
>>> > >>> >>
>>> > >>> >> It looks like a program bug. Have you run your program in local
>>> > mode?
>>> > >>> >>
>>> > >>> >> On Sat, Jun 27, 2015 at 8:03 AM, Behroz Sikander <
>>> > behroz89@gmail.com>
>>> > >>> >> wrote:
>>> > >>> >> > Hi,
>>> > >>> >> > In the current thread, I mentioned 3 issues. Issue 1 and 3 are
>>> > >>> resolved
>>> > >>> >> but
>>> > >>> >> > issue number 2 is still giving me headaches.
>>> > >>> >> >
>>> > >>> >> > My problem:
>>> > >>> >> > My cluster now consists of 3 machines, each of them (apparently)
>>> > >>> >> > properly configured. From my master machine, when I start Hadoop
>>> > >>> >> > and Hama, I can see the processes started on the other 2 machines.
>>> > >>> >> > If I check the maximum tasks that my cluster can support, I get 9
>>> > >>> >> > (3 tasks on each machine).
>>> > >>> >> >
>>> > >>> >> > When I run the PI example, it uses 9 tasks and runs fine. When I
>>> > >>> >> > run my program with 3 tasks, everything runs fine. But when I
>>> > >>> >> > increase the tasks (to 4) by using "setNumBspTask", Hama freezes.
>>> > >>> >> > I do not understand what can go wrong.
>>> > >>> >> >
>>> > >>> >> > I checked the log files and things look fine. I just sometimes get
>>> > >>> >> > an exception that Hama was not able to delete the system directory
>>> > >>> >> > (bsp.system.dir) defined in hama-site.xml.
>>> > >>> >> >
>>> > >>> >> > Any help or clue would be great.
>>> > >>> >> >
>>> > >>> >> > Regards,
>>> > >>> >> > Behroz Sikander
>>> > >>> >> >
>>> > >>> >> > On Thu, Jun 25, 2015 at 1:13 PM, Behroz Sikander <
>>> > >>> behroz89@gmail.com>
>>> > >>> >> wrote:
>>> > >>> >> >
>>> > >>> >> >> Thank you :)
>>> > >>> >> >>
>>> > >>> >> >> On Thu, Jun 25, 2015 at 12:14 AM, Edward J. Yoon <
>>> > >>> edwardyoon@apache.org
>>> > >>> >> >
>>> > >>> >> >> wrote:
>>> > >>> >> >>
>>> > >>> >> >>> Hi,
>>> > >>> >> >>>
>>> > >>> >> >>> You can get the maximum number of available tasks with the
>>> > >>> >> >>> following code:
>>> > >>> >> >>>
>>> > >>> >> >>>     BSPJobClient jobClient = new BSPJobClient(conf);
>>> > >>> >> >>>     ClusterStatus cluster = jobClient.getClusterStatus(true);
>>> > >>> >> >>>
>>> > >>> >> >>>     // Set to maximum
>>> > >>> >> >>>     bsp.setNumBspTask(cluster.getMaxTasks());
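The snippet above sizes the job to the cluster's reported capacity, which is exactly what the freezing jobs in this thread fail to do. A hedged, self-contained sketch of the same idea — the `clampTasks` helper and the slot numbers are invented for illustration, not Hama API:

```java
// Illustrative only: pick a task count that never exceeds the slots the
// grooms can actually launch, mirroring cluster.getMaxTasks() above.
public class TaskClamp {
    // requested: tasks asked for by the job; clusterMax: total task slots
    static int clampTasks(int requested, int clusterMax) {
        return Math.min(Math.max(1, requested), clusterMax);
    }

    public static void main(String[] args) {
        // e.g. a 3-node cluster with bsp.tasks.maximum=3 per node -> 9 slots
        System.out.println(clampTasks(4, 9));   // prints 4: within capacity
        System.out.println(clampTasks(20, 9));  // prints 9: capped at the max
    }
}
```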
>>> > >>> >> >>>
>>> > >>> >> >>>
>>> > >>> >> >>> On Wed, Jun 24, 2015 at 11:20 PM, Behroz Sikander <
>>> > >>> behroz89@gmail.com>
>>> > >>> >> >>> wrote:
>>> > >>> >> >>> > Hi,
>>> > >>> >> >>> > 1) Thank you for this.
>>> > >>> >> >>> > 2) Here are the images. I will look into the log files of PI
>>> > >>> example
>>> > >>> >> >>> >
>>> > >>> >> >>> > *Result of JPS command on slave*
>>> > >>> >> >>> >
>>> > >>> >> >>>
>>> > >>> >>
>>> > >>>
>>> >
>>> http://s17.postimg.org/gpwe2bbfj/Screen_Shot_2015_06_22_at_7_23_31_PM.png
>>> > >>> >> >>> >
>>> > >>> >> >>> > *Result of JPS command on Master*
>>> > >>> >> >>> >
>>> > >>> >> >>>
>>> > >>> >>
>>> > >>>
>>> >
>>> http://s14.postimg.org/s9922em5p/Screen_Shot_2015_06_22_at_7_23_42_PM.png
>>> > >>> >> >>> >
>>> > >>> >> >>> > 3) In my current case, I do not have any input submitted to the
>>> > >>> >> >>> > job. During run time, I directly fetch data from HDFS. So, I am
>>> > >>> >> >>> > looking for something like BSPJob.set*Max*NumBspTask().
>>> > >>> >> >>> >
>>> > >>> >> >>> > Regards,
>>> > >>> >> >>> > Behroz
>>> > >>> >> >>> >
>>> > >>> >> >>> >
>>> > >>> >> >>> >
>>> > >>> >> >>> > On Tue, Jun 23, 2015 at 12:57 AM, Edward J. Yoon <
>>> > >>> >> edwardyoon@apache.org
>>> > >>> >> >>> >
>>> > >>> >> >>> > wrote:
>>> > >>> >> >>> >
>>> > >>> >> >>> >> Hello,
>>> > >>> >> >>> >>
>>> > >>> >> >>> >> 1) You can get the filesystem URI from a configuration using
>>> > >>> >> >>> >> "FileSystem fs = FileSystem.get(conf);". Of course, the
>>> > >>> >> >>> >> fs.defaultFS property should be in hama-site.xml:
>>> > >>> >> >>> >>
>>> > >>> >> >>> >>   <property>
>>> > >>> >> >>> >>     <name>fs.defaultFS</name>
>>> > >>> >> >>> >>     <value>hdfs://host1.mydomain.com:9000/</value>
>>> > >>> >> >>> >>     <description>
>>> > >>> >> >>> >>       The name of the default file system. Either the
>>> literal
>>> > >>> string
>>> > >>> >> >>> >>       "local" or a host:port for HDFS.
>>> > >>> >> >>> >>     </description>
>>> > >>> >> >>> >>   </property>
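The fs.defaultFS value already carries the namenode host and port, which is what FileSystem.get(conf) resolves instead of a hand-coded "hdfs://server_ip:port/" URI. A small self-contained sketch using only java.net.URI, with the example hostname taken from the property above:

```java
import java.net.URI;

// Illustrative only: shows that the fs.defaultFS URI already encodes the
// namenode address, so it does not need to be repeated in application code.
public class DefaultFsUri {
    public static void main(String[] args) {
        URI fs = URI.create("hdfs://host1.mydomain.com:9000/");
        System.out.println(fs.getScheme()); // hdfs
        System.out.println(fs.getHost());   // host1.mydomain.com
        System.out.println(fs.getPort());   // 9000
    }
}
```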
>>> > >>> >> >>> >>
>>> > >>> >> >>> >> 2) The 'bsp.tasks.maximum' is the number of tasks per node. It
>>> > >>> >> >>> >> looks like a cluster configuration issue. Please run the Pi
>>> > >>> >> >>> >> example and look at the logs for more details. NOTE: you cannot
>>> > >>> >> >>> >> attach images to the mailing list, so I can't see them.
>>> > >>> >> >>> >>
>>> > >>> >> >>> >> 3) You can use the BSPJob.setNumBspTask(int) method. If input
>>> > >>> >> >>> >> is provided, the number of BSP tasks is basically driven by the
>>> > >>> >> >>> >> number of DFS blocks. I'll fix it to be more flexible on
>>> > >>> >> >>> >> HAMA-956.
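As a rough, self-contained illustration of that block-driven default — the helper and the 64 MB block size are assumptions for the example, not Hama internals:

```java
// Illustrative only: with input supplied, the default task count roughly
// tracks how many DFS blocks the input spans (one task per block).
public class BlockDrivenTasks {
    static long tasksForInput(long inputBytes, long blockBytes) {
        if (inputBytes <= 0) return 1;                      // no input: one task
        return (inputBytes + blockBytes - 1) / blockBytes;  // ceiling division
    }

    public static void main(String[] args) {
        long block = 64L * 1024 * 1024;                     // assumed 64 MB block size
        // A 200 MB input spans 4 such blocks.
        System.out.println(tasksForInput(200L * 1024 * 1024, block)); // prints 4
    }
}
```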
>>> > >>> >> >>> >>
>>> > >>> >> >>> >> Thanks!
>>> > >>> >> >>> >>
>>> > >>> >> >>> >>
>>> > >>> >> >>> >> On Tue, Jun 23, 2015 at 2:33 AM, Behroz Sikander <
>>> > >>> >> behroz89@gmail.com>
>>> > >>> >> >>> >> wrote:
>>> > >>> >> >>> >> > Hi,
>>> > >>> >> >>> >> > Recently, I moved from a single machine setup to a 2
>>> > machine
>>> > >>> >> setup.
>>> > >>> >> >>> I was
>>> > >>> >> >>> >> > successfully able to run my job that uses the HDFS to get
>>> > >>> data. I
>>> > >>> >> >>> have 3
>>> > >>> >> >>> >> > trivial questions
>>> > >>> >> >>> >> >
>>> > >>> >> >>> >> > 1- To access HDFS, I have to manually give the IP address of the
>>> > >>> >> >>> >> > server running HDFS. I thought that Hama would automatically pick
>>> > >>> >> >>> >> > it from the configuration, but it does not. I am probably doing
>>> > >>> >> >>> >> > something wrong. Right now my code works by using the following.
>>> > >>> >> >>> >> >
>>> > >>> >> >>> >> > FileSystem fs = FileSystem.get(new
>>> > >>> URI("hdfs://server_ip:port/"),
>>> > >>> >> >>> conf);
>>> > >>> >> >>> >> >
>>> > >>> >> >>> >> > 2- On my master server, when I start Hama it automatically
>>> > >>> >> >>> >> > starts Hama on the slave machine (all good). Both master and
>>> > >>> >> >>> >> > slave are set as groomservers. This means that I have 2 servers
>>> > >>> >> >>> >> > to run my job, which means that I can open more BSPPeerChild
>>> > >>> >> >>> >> > processes. If I submit my jar with 3 bsp tasks then everything
>>> > >>> >> >>> >> > works fine. But when I move to 4 tasks, Hama freezes. Here is
>>> > >>> >> >>> >> > the result of the JPS command on the slave.
>>> > >>> >> >>> >> >
>>> > >>> >> >>> >> >
>>> > >>> >> >>> >> > Result of JPS command on Master
>>> > >>> >> >>> >> >
>>> > >>> >> >>> >> >
>>> > >>> >> >>> >> >
>>> > >>> >> >>> >> > You can see that it is only opening tasks on slaves but not on
>>> > >>> >> >>> >> > master.
>>> > >>> >> >>> >> >
>>> > >>> >> >>> >> > Note: I tried to change the bsp.tasks.maximum property in
>>> > >>> >> >>> >> > hama-default.xml to 4, but I still get the same result.
>>> > >>> >> >>> >> >
>>> > >>> >> >>> >> > 3- I want my cluster to open as many BSPPeerChild processes as
>>> > >>> >> >>> >> > possible. Is there any setting I can use to achieve that? Or does
>>> > >>> >> >>> >> > Hama pick up the values from hama-default.xml to open tasks?
>>> > >>> >> >>> >> >
>>> > >>> >> >>> >> >
>>> > >>> >> >>> >> > Regards,
>>> > >>> >> >>> >> >
>>> > >>> >> >>> >> > Behroz Sikander
>>> > >>> >> >>> >>
>>> > >>> >> >>> >>
>>> > >>> >> >>> >>
>>> > >>> >> >>> >> --
>>> > >>> >> >>> >> Best Regards, Edward J. Yoon
>>> > >>> >> >>> >>
>>> > >>> >> >>>
>>> > >>> >> >>>
>>> > >>> >> >>>
>>> > >>> >> >>> --
>>> > >>> >> >>> Best Regards, Edward J. Yoon
>>> > >>> >> >>>
>>> > >>> >> >>
>>> > >>> >> >>
>>> > >>> >>
>>> > >>> >>
>>> > >>> >>
>>> > >>> >> --
>>> > >>> >> Best Regards, Edward J. Yoon
>>> > >>> >>
>>> > >>>
>>> > >>>
>>> > >>>
>>> > >>> --
>>> > >>> Best Regards, Edward J. Yoon
>>> > >>>
>>> > >>
>>> > >>
>>> > >
>>> >
>>> >
>>> >
>>>
>>>
>>>
>
>
>
> --
> Best Regards, Edward J. Yoon



-- 
Best Regards, Edward J. Yoon

Re: Groomserer BSPPeerChild limit

Posted by "Edward J. Yoon" <ed...@apache.org>.
Hi,

If you run a zk server too, the BSPMaster will connect to zk and won't
throw those exceptions.

On Mon, Jun 29, 2015 at 12:13 PM, Behroz Sikander <be...@gmail.com> wrote:
> Hi,
> Thank you for the information. I moved to Hama 0.7.0 and I still have the same
> problem.
> When I run % bin/hama bspmaster, I am getting the following exception
>
> INFO http.HttpServer: Port returned by
> webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening
> the listener on 40013
>  INFO http.HttpServer: listener.getLocalPort() returned 40013
> webServer.getConnectors()[0].getLocalPort() returned 40013
>  INFO http.HttpServer: Jetty bound to port 40013
>  INFO mortbay.log: jetty-6.1.14
>  INFO mortbay.log: Extract
> jar:file:/home/behroz/Documents/Packages/hama-0.7.0/hama-core-0.7.0.jar!/webapp/bspmaster/
> to /tmp/Jetty_b178b33b16cc_40013_bspmaster____.cof30w/webapp
>  INFO mortbay.log: Started SelectChannelConnector@b178b33b16cc:40013
>  INFO bsp.BSPMaster: Cleaning up the system directory
>  INFO bsp.BSPMaster: hdfs://172.17.0.3:54310/tmp/hama-behroz/bsp/system
>  INFO sync.ZKSyncBSPMasterClient: Initialized ZK false
>  INFO sync.ZKSyncClient: Initializing ZK Sync Client
>  ERROR sync.ZKSyncBSPMasterClient:
> org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for /bsp
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
> at
> org.apache.hama.bsp.sync.ZKSyncBSPMasterClient.init(ZKSyncBSPMasterClient.java:62)
> at org.apache.hama.bsp.BSPMaster.initZK(BSPMaster.java:534)
> at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:517)
> at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:500)
> at org.apache.hama.BSPMasterRunner.run(BSPMasterRunner.java:46)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.hama.BSPMasterRunner.main(BSPMasterRunner.java:56)
>  ERROR sync.ZKSyncBSPMasterClient:
> org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for /bsp
>
>> > >>> tasks
>> > >>> > and it works fine.
>> > >>> >
>> > >>> > On Sat, Jun 27, 2015 at 1:00 PM, Edward J. Yoon <
>> > edwardyoon@apache.org
>> > >>> >
>> > >>> > wrote:
>> > >>> >
>> > >>> >> > When I run the PI example, it uses 9 tasks and runs fine. When I
>> > >>> run my
>> > >>> >> > program with 3 tasks, everything runs fine. But when I increase
>> > the
>> > >>> tasks
>> > >>> >> > (to 4) by using "setNumBspTask". Hama freezes. I do not
>> understand
>> > >>> what
>> > >>> >> can
>> > >>> >> > go wrong.
>> > >>> >>
>> > >>> >> It looks like a program bug. Have you ran your program in local
>> > mode?
>> > >>> >>
>> > >>> >> On Sat, Jun 27, 2015 at 8:03 AM, Behroz Sikander <
>> > behroz89@gmail.com>
>> > >>> >> wrote:
>> > >>> >> > Hi,
>> > >>> >> > In the current thread, I mentioned 3 issues. Issue 1 and 3 are
>> > >>> resolved
>> > >>> >> but
>> > >>> >> > issue number 2 is still giving me headaches.
>> > >>> >> >
>> > >>> >> > My problem:
>> > >>> >> > My cluster now consists of 3 machines. Each one of them properly
>> > >>> >> configured
>> > >>> >> > (Apparently). From my master machine when I start Hadoop and
>> Hama,
>> > >>> I can
>> > >>> >> > see the processes started on other 2 machines. If I check the
>> > >>> maximum
>> > >>> >> tasks
>> > >>> >> > that my cluster can support then I get 9 (3 tasks on each
>> > machine).
>> > >>> >> >
>> > >>> >> > When I run the PI example, it uses 9 tasks and runs fine. When I
>> > >>> run my
>> > >>> >> > program with 3 tasks, everything runs fine. But when I increase
>> > the
>> > >>> tasks
>> > >>> >> > (to 4) by using "setNumBspTask". Hama freezes. I do not
>> understand
>> > >>> what
>> > >>> >> can
>> > >>> >> > go wrong.
>> > >>> >> >
>> > >>> >> > I checked the logs files and things look fine. I just sometimes
>> > get
>> > >>> an
>> > >>> >> > exception that hama was not able to delete the sytem directory
>> > >>> >> > (bsp.system.dir) defined in the hama-site.xml.
>> > >>> >> >
>> > >>> >> > Any help or clue would be great.
>> > >>> >> >
>> > >>> >> > Regards,
>> > >>> >> > Behroz Sikander
>> > >>> >> >
>> > >>> >> > On Thu, Jun 25, 2015 at 1:13 PM, Behroz Sikander <
>> > >>> behroz89@gmail.com>
>> > >>> >> wrote:
>> > >>> >> >
>> > >>> >> >> Thank you :)
>> > >>> >> >>
>> > >>> >> >> On Thu, Jun 25, 2015 at 12:14 AM, Edward J. Yoon <
>> > >>> edwardyoon@apache.org
>> > >>> >> >
>> > >>> >> >> wrote:
>> > >>> >> >>
>> > >>> >> >>> Hi,
>> > >>> >> >>>
>> > >>> >> >>> You can get the maximum number of available tasks like
>> following
>> > >>> code:
>> > >>> >> >>>
>> > >>> >> >>>     BSPJobClient jobClient = new BSPJobClient(conf);
>> > >>> >> >>>     ClusterStatus cluster = jobClient.getClusterStatus(true);
>> > >>> >> >>>
>> > >>> >> >>>     // Set to maximum
>> > >>> >> >>>     bsp.setNumBspTask(cluster.getMaxTasks());
>> > >>> >> >>>
>> > >>> >> >>>
>> > >>> >> >>> On Wed, Jun 24, 2015 at 11:20 PM, Behroz Sikander <
>> > >>> behroz89@gmail.com>
>> > >>> >> >>> wrote:
>> > >>> >> >>> > Hi,
>> > >>> >> >>> > 1) Thank you for this.
>> > >>> >> >>> > 2) Here are the images. I will look into the log files of PI
>> > >>> example
>> > >>> >> >>> >
>> > >>> >> >>> > *Result of JPS command on slave*
>> > >>> >> >>> >
>> > >>> >> >>>
>> > >>> >>
>> > >>>
>> >
>> http://s17.postimg.org/gpwe2bbfj/Screen_Shot_2015_06_22_at_7_23_31_PM.png
>> > >>> >> >>> >
>> > >>> >> >>> > *Result of JPS command on Master*
>> > >>> >> >>> >
>> > >>> >> >>>
>> > >>> >>
>> > >>>
>> >
>> http://s14.postimg.org/s9922em5p/Screen_Shot_2015_06_22_at_7_23_42_PM.png
>> > >>> >> >>> >
>> > >>> >> >>> > 3) In my current case, I do not have any input submitted to
>> > the
>> > >>> job.
>> > >>> >> >>> During
>> > >>> >> >>> > run time, I directly fetch data from HDFS. So, I am looking
>> > for
>> > >>> >> >>> something
>> > >>> >> >>> > like BSPJob.set*Max*NumBspTask().
>> > >>> >> >>> >
>> > >>> >> >>> > Regards,
>> > >>> >> >>> > Behroz
>> > >>> >> >>> >
>> > >>> >> >>> >
>> > >>> >> >>> >
>> > >>> >> >>> > On Tue, Jun 23, 2015 at 12:57 AM, Edward J. Yoon <
>> > >>> >> edwardyoon@apache.org
>> > >>> >> >>> >
>> > >>> >> >>> > wrote:
>> > >>> >> >>> >
>> > >>> >> >>> >> Hello,
>> > >>> >> >>> >>
>> > >>> >> >>> >> 1) You can get the filesystem URI from a configuration
>> using
>> > >>> >> >>> >> "FileSystem fs = FileSystem.get(conf);". Of course, the
>> > >>> fs.defaultFS
>> > >>> >> >>> >> property should be in hama-site.xml
>> > >>> >> >>> >>
>> > >>> >> >>> >>   <property>
>> > >>> >> >>> >>     <name>fs.defaultFS</name>
>> > >>> >> >>> >>     <value>hdfs://host1.mydomain.com:9000/</value>
>> > >>> >> >>> >>     <description>
>> > >>> >> >>> >>       The name of the default file system. Either the
>> literal
>> > >>> string
>> > >>> >> >>> >>       "local" or a host:port for HDFS.
>> > >>> >> >>> >>     </description>
>> > >>> >> >>> >>   </property>
>> > >>> >> >>> >>
>> > >>> >> >>> >> 2) The 'bsp.tasks.maximum' is the number of tasks per node.
>> > It
>> > >>> looks
>> > >>> >> >>> >> cluster configuration issue. Please run Pi example and look
>> > at
>> > >>> the
>> > >>> >> >>> >> logs for more details. NOTE: you can not attach the images
>> to
>> > >>> >> mailing
>> > >>> >> >>> >> list so I can't see it.
>> > >>> >> >>> >>
>> > >>> >> >>> >> 3) You can use the BSPJob.setNumBspTask(int) method. If
>> input
>> > >>> is
>> > >>> >> >>> >> provided, the number of BSP tasks is basically driven by
>> the
>> > >>> number
>> > >>> >> of
>> > >>> >> >>> >> DFS blocks. I'll fix it to be more flexible on HAMA-956.
>> > >>> >> >>> >>
>> > >>> >> >>> >> Thanks!
>> > >>> >> >>> >>
>> > >>> >> >>> >>
>> > >>> >> >>> >> On Tue, Jun 23, 2015 at 2:33 AM, Behroz Sikander <
>> > >>> >> behroz89@gmail.com>
>> > >>> >> >>> >> wrote:
>> > >>> >> >>> >> > Hi,
>> > >>> >> >>> >> > Recently, I moved from a single machine setup to a 2
>> > machine
>> > >>> >> setup.
>> > >>> >> >>> I was
>> > >>> >> >>> >> > successfully able to run my job that uses the HDFS to get
>> > >>> data. I
>> > >>> >> >>> have 3
>> > >>> >> >>> >> > trivial questions
>> > >>> >> >>> >> >
>> > >>> >> >>> >> > 1- To access HDFS, I have to manually give the IP address
>> > of
>> > >>> >> server
>> > >>> >> >>> >> running
>> > >>> >> >>> >> > HDFS. I thought that Hama will automatically pick from
>> the
>> > >>> >> >>> configurations
>> > >>> >> >>> >> > but it does not. I am probably doing something wrong.
>> Right
>> > >>> now my
>> > >>> >> >>> code
>> > >>> >> >>> >> work
>> > >>> >> >>> >> > by using the following.
>> > >>> >> >>> >> >
>> > >>> >> >>> >> > FileSystem fs = FileSystem.get(new
>> > >>> URI("hdfs://server_ip:port/"),
>> > >>> >> >>> conf);
>> > >>> >> >>> >> >
>> > >>> >> >>> >> > 2- On my master server, when I start hama it
>> automatically
>> > >>> starts
>> > >>> >> >>> hama in
>> > >>> >> >>> >> > the slave machine (all good). Both master and slave are
>> set
>> > >>> as
>> > >>> >> >>> >> groomservers.
>> > >>> >> >>> >> > This means that I have 2 servers to run my job which
>> means
>> > >>> that I
>> > >>> >> can
>> > >>> >> >>> >> open
>> > >>> >> >>> >> > more BSPPeerChild processes. And if I submit my jar with
>> 3
>> > >>> bsp
>> > >>> >> tasks
>> > >>> >> >>> then
>> > >>> >> >>> >> > everything works fine. But when I move to 4 tasks, Hama
>> > >>> freezes.
>> > >>> >> >>> Here is
>> > >>> >> >>> >> the
>> > >>> >> >>> >> > result of JPS command on slave.
>> > >>> >> >>> >> >
>> > >>> >> >>> >> >
>> > >>> >> >>> >> > Result of JPS command on Master
>> > >>> >> >>> >> >
>> > >>> >> >>> >> >
>> > >>> >> >>> >> >
>> > >>> >> >>> >> > You can see that it is only opening tasks on slaves but
>> not
>> > >>> on
>> > >>> >> >>> master.
>> > >>> >> >>> >> >
>> > >>> >> >>> >> > Note: I tried to change the bsp.tasks.maximum property in
>> > >>> >> >>> >> hama-default.xml
>> > >>> >> >>> >> > to 4 but still same result.
>> > >>> >> >>> >> >
>> > >>> >> >>> >> > 3- I want my cluster to open as many BSPPeerChild
>> processes
>> > >>> as
>> > >>> >> >>> possible.
>> > >>> >> >>> >> Is
>> > >>> >> >>> >> > there any setting that can I do to achieve that ? Or hama
>> > >>> picks up
>> > >>> >> >>> the
>> > >>> >> >>> >> > values from hama-default.xml to open tasks ?
>> > >>> >> >>> >> >
>> > >>> >> >>> >> >
>> > >>> >> >>> >> > Regards,
>> > >>> >> >>> >> >
>> > >>> >> >>> >> > Behroz Sikander
>> > >>> >> >>> >>
>> > >>> >> >>> >>
>> > >>> >> >>> >>
>> > >>> >> >>> >> --
>> > >>> >> >>> >> Best Regards, Edward J. Yoon
>> > >>> >> >>> >>
>> > >>> >> >>>
>> > >>> >> >>>
>> > >>> >> >>>
>> > >>> >> >>> --
>> > >>> >> >>> Best Regards, Edward J. Yoon
>> > >>> >> >>>
>> > >>> >> >>
>> > >>> >> >>
>> > >>> >>
>> > >>> >>
>> > >>> >>
>> > >>> >> --
>> > >>> >> Best Regards, Edward J. Yoon
>> > >>> >>
>> > >>>
>> > >>>
>> > >>>
>> > >>> --
>> > >>> Best Regards, Edward J. Yoon
>> > >>>
>> > >>
>> > >>
>> > >
>> >
>> >
>> >
>>
>>
>>



-- 
Best Regards, Edward J. Yoon

Re: Groomserer BSPPeerChild limit

Posted by Behroz Sikander <be...@gmail.com>.
Hi,
Thank you for the information. I moved to Hama 0.7.0 and I still have the
same problem.
When I run % bin/hama bspmaster, I get the following exception:

INFO http.HttpServer: Port returned by
webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening
the listener on 40013
 INFO http.HttpServer: listener.getLocalPort() returned 40013
webServer.getConnectors()[0].getLocalPort() returned 40013
 INFO http.HttpServer: Jetty bound to port 40013
 INFO mortbay.log: jetty-6.1.14
 INFO mortbay.log: Extract
jar:file:/home/behroz/Documents/Packages/hama-0.7.0/hama-core-0.7.0.jar!/webapp/bspmaster/
to /tmp/Jetty_b178b33b16cc_40013_bspmaster____.cof30w/webapp
 INFO mortbay.log: Started SelectChannelConnector@b178b33b16cc:40013
 INFO bsp.BSPMaster: Cleaning up the system directory
 INFO bsp.BSPMaster: hdfs://172.17.0.3:54310/tmp/hama-behroz/bsp/system
 INFO sync.ZKSyncBSPMasterClient: Initialized ZK false
 INFO sync.ZKSyncClient: Initializing ZK Sync Client
 ERROR sync.ZKSyncBSPMasterClient:
org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for /bsp
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
at
org.apache.hama.bsp.sync.ZKSyncBSPMasterClient.init(ZKSyncBSPMasterClient.java:62)
at org.apache.hama.bsp.BSPMaster.initZK(BSPMaster.java:534)
at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:517)
at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:500)
at org.apache.hama.BSPMasterRunner.run(BSPMasterRunner.java:46)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hama.BSPMasterRunner.main(BSPMasterRunner.java:56)
 ERROR sync.ZKSyncBSPMasterClient:
org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for /bsp

*My ZooKeeper settings in hama-site.xml are as follows (right now, I am
using just two servers, 172.17.0.3 and 172.17.0.7):*
<property>
                 <name>hama.zookeeper.quorum</name>
                 <value>172.17.0.3,172.17.0.7</value>
                 <description>Comma separated list of servers in the
ZooKeeper quorum.
                 For example, "host1.mydomain.com,host2.mydomain.com,
host3.mydomain.com".
                 By default this is set to localhost for local and
pseudo-distributed modes
                 of operation. For a fully-distributed setup, this should
be set to a full
                 list of ZooKeeper quorum servers. If HAMA_MANAGES_ZK is
set in hama-env.sh
                 this is the list of servers which we will start/stop
ZooKeeper on.
                 </description>
        </property>
       ......
       <property>
                 <name>hama.zookeeper.property.clientPort</name>
                 <value>2181</value>
         </property>

Is something wrong with my settings?
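Since the ConnectionLoss error means the master never reached ZooKeeper, it is worth ruling out basic network reachability before touching the settings. The sketch below is a hypothetical plain-JDK check (the class name and the hard-coded hosts/port are assumptions taken from the config above, not part of Hama): it sends ZooKeeper's four-letter "ruok" command to each quorum member on the client port, and a healthy, reachable server replies "imok".

```java
// Hypothetical connectivity probe for the ZooKeeper quorum listed in
// hama-site.xml. A running ZooKeeper replies "imok" to the four-letter
// command "ruok" on its client port; anything else (or an IOException)
// points at a networking/firewall problem rather than a Hama one.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ZkCheck {

    // Sends a ZooKeeper four-letter word and returns the raw reply.
    static String fourLetterWord(String host, int port, String cmd)
            throws IOException {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), 2000);
            s.getOutputStream().write(cmd.getBytes("UTF-8"));
            s.shutdownOutput();
            BufferedReader r = new BufferedReader(
                    new InputStreamReader(s.getInputStream(), "UTF-8"));
            StringBuilder reply = new StringBuilder();
            String line;
            while ((line = r.readLine()) != null) {
                reply.append(line);
            }
            return reply.toString();
        }
    }

    public static void main(String[] args) {
        // Hosts and port copied from the hama.zookeeper.* properties above.
        for (String host : new String[] {"172.17.0.3", "172.17.0.7"}) {
            try {
                System.out.println(host + ": " + fourLetterWord(host, 2181, "ruok"));
            } catch (IOException e) {
                System.out.println(host + ": unreachable (" + e.getMessage() + ")");
            }
        }
    }
}
```

Running this on every machine in the cluster quickly shows whether each node can actually reach the quorum; if a member does not answer "imok", fix DNS/firewall issues first and only then revisit hama-site.xml.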

Regards,
Behroz Sikander

On Mon, Jun 29, 2015 at 1:44 AM, Edward J. Yoon <ed...@samsung.com>
wrote:

> > (0.7.0) because I do not understand YARN yet. It adds extra
> configurations
>
> Hama classic mode works on both Hadoop 1.x and Hadoop 2.x HDFS. Yarn
> configuration is only needed when you want to submit a BSP job to Yarn
> cluster
> without Hama cluster. So you don't need to worry about it. :-)
>
> > distributed mode ? and is there any way to manage the server ? I mean
> right
> > now, I have 3 machines with alot of configurations files and log files.
> It
>
> You can use web UI at http://masterserver_address:40013/bspmaster.jsp
>
> To debug your program, please try like below:
>
> 1) Run a BSPMaster and Zookeeper at server1.
> % bin/hama bspmaster
> % bin/hama zookeeper
>
> 2) Run a Groom at server1 and server2.
>
> % bin/hama groom
>
> 3) Check whether deamons are running well. Then, run your program using jar
> command at server1.
>
> % bin/hama jar .....
>
> > In hama_[user]_bspmaster_.....log file I get the following exception. But
> > this occurs in both cases when I run my job with 3 tasks or with 4 tasks
>
> In fact, you should not see above initZK error log.
>
> --
> Best Regards, Edward J. Yoon
>
>
> -----Original Message-----
> From: Behroz Sikander [mailto:behroz89@gmail.com]
> Sent: Monday, June 29, 2015 8:18 AM
> To: user@hama.apache.org
> Subject: Re: Groomserer BSPPeerChild limit
>
> I will try the things that you mentioned. I am not using the latest version
> (0.7.0) because I do not understand YARN yet. It adds extra configurations
> which makes it more harder for me to understand when things go wrong. Any
> suggestions ?
>
> Further, are there any tools that you use for debugging while in
> distributed mode ? and is there any way to manage the server ? I mean right
> now, I have 3 machines with alot of configurations files and log files. It
> takes alot of time. This makes me wonder how people who have 100s of
> machines debug and manage the cluster.
>
> Regards,
> Behroz
>
> On Mon, Jun 29, 2015 at 12:53 AM, Edward J. Yoon <ed...@samsung.com>
> wrote:
>
> > Hi,
> >
> > It looks like a zookeeper connection problem. Please check whether
> > zookeeper
> > is running and every tasks can connect to zookeeper.
> >
> > I would recommend you to stop the firewall during debugging, and please
> use
> > the 0.7.0 latest release.
> >
> >
> > --
> > Best Regards, Edward J. Yoon
> >
> > -----Original Message-----
> > From: Behroz Sikander [mailto:behroz89@gmail.com]
> > Sent: Monday, June 29, 2015 7:34 AM
> > To: user@hama.apache.org
> > Subject: Re: Groomserer BSPPeerChild limit
> >
> > To figure out the issue, I was trying something else and found out
> another
> > wiered issue. Might be a bug of Hama but I am not sure. Both following
> > lines give an exception.
> >
> > System.out.println( peer.getPeerName(0)); //Exception
> >
> > System.out.println( peer.getNumPeers()); //Exception
> >
> >
> > [time] ERROR bsp.BSPTask: *Error running bsp setup and bsp function.*
> >
> > [time]java.lang.*RuntimeException: All peer names could not be
> retrieved!*
> >
> > at
> >
> >
> org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.getAllPeerNames(ZooKeeperSyncClientImpl.java:305)
> >
> > at org.apache.hama.bsp.BSPPeerImpl.initPeerNames(BSPPeerImpl.java:544)
> >
> > at org.apache.hama.bsp.BSPPeerImpl.getNumPeers(BSPPeerImpl.java:538)
> >
> > at testHDFS.EVADMMBsp.setup*(EVADMMBsp.java:58)*
> >
> > at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
> >
> > at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
> >
> > at
> org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
> >
> > On Sun, Jun 28, 2015 at 6:45 PM, Behroz Sikander <be...@gmail.com>
> > wrote:
> >
> > > I think I have more information on the issue. I did some debugging and
> > > found something quite strange.
> > >
> > > If I open my job with 6 tasks ( 3 tasks will run on MACHINE1 and 3 task
> > > will be opened on other MACHINE2),
> > >
> > >  -  3 tasks on Machine1 are frozen and the strange thing is that the
> > > processes do not even enter the SETUP function of BSP class. I have
> print
> > > statements in the setup function of BSP class and it doesn't print
> > > anything. I get empty files with zero size.
> > >
> > > drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:29 .
> > > drwxrwxr-x 99 behroz behroz 4096 Jun 28 16:28 ..
> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> > > attempt_201506281624_0001_000000_0.err
> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> > > attempt_201506281624_0001_000000_0.log
> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> > > attempt_201506281624_0001_000001_0.err
> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> > > attempt_201506281624_0001_000001_0.log
> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> > > attempt_201506281624_0001_000002_0.err
> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> > > attempt_201506281624_0001_000002_0.log
> > >
> > > - On MACHINE2, the code enters the SETUP function of BSP class and
> prints
> > > stuff. See the size of files generated on output. How is it possible
> that
> > > in 3 tasks the code can enter BSP and in others it cannot ?
> > >
> > > drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:39 .
> > > drwxrwxr-x 82 behroz behroz 4096 Jun 28 16:39 ..
> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
> > > attempt_201506281639_0001_000003_0.err
> > > -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
> > > attempt_201506281639_0001_000003_0.log
> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
> > > attempt_201506281639_0001_000004_0.err
> > > -rw-rw-r--  1 behroz behroz 1368 Jun 28 16:39
> > > attempt_201506281639_0001_000004_0.log
> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
> > > attempt_201506281639_0001_000005_0.err
> > > -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
> > > attempt_201506281639_0001_000005_0.log
> > >
> > > - Hama Groom log file on MACHINE2 (which is frozen) shows.
> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > > 'attempt_201506281639_0001_000001_0' has started.
> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > > 'attempt_201506281639_0001_000002_0' has started.
> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > > 'attempt_201506281639_0001_000000_0' has started.
> > >
> > > - Hama Groom log file on MACHINE2 shows
> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > > 'attempt_201506281639_0001_000003_0' has started.
> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > > 'attempt_201506281639_0001_000004_0' has started.
> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > > 'attempt_201506281639_0001_000005_0' has started.
> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > > attempt_201506281639_0001_000004_0 is *done*.
> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > > attempt_201506281639_0001_000003_0 is *done*.
> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > > attempt_201506281639_0001_000005_0 is *done*.
> > >
> > > Any clue what might be going wrong ?
> > >
> > > Regards,
> > > Behroz
> > >
> > >
> > >
> > > On Sat, Jun 27, 2015 at 1:13 PM, Behroz Sikander <be...@gmail.com>
> > > wrote:
> > >
> > >> Here is the log file from that folder
> > >>
> > >> 15/06/27 11:10:34 INFO ipc.Server: Starting Socket Reader #1 for port
> > >> 61001
> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server Responder: starting
> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server listener on 61001:
> > starting
> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 0 on 61001:
> > starting
> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 1 on 61001:
> > starting
> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 2 on 61001:
> > starting
> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 3 on 61001:
> > starting
> > >> 15/06/27 11:10:34 INFO message.HamaMessageManagerImpl: BSPPeer
> > >> address:b178b33b16cc port:61001
> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 4 on 61001:
> > starting
> > >> 15/06/27 11:10:34 INFO sync.ZKSyncClient: Initializing ZK Sync Client
> > >> 15/06/27 11:10:34 INFO sync.ZooKeeperSyncClientImpl: Start connecting
> to
> > >> Zookeeper! At b178b33b16cc/172.17.0.7:61001
> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping server on 61001
> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 0 on 61001:
> > exiting
> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server listener on
> 61001
> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 1 on 61001:
> > exiting
> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 2 on 61001:
> > exiting
> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server Responder
> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 3 on 61001:
> > exiting
> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 4 on 61001:
> > exiting
> > >>
> > >>
> > >> And my console shows the following ouptut. Hama is frozen right now.
> > >> 15/06/27 11:10:32 INFO bsp.BSPJobClient: Running job:
> > >> job_201506262331_0003
> > >> 15/06/27 11:10:35 INFO bsp.BSPJobClient: Current supersteps number: 0
> > >> 15/06/27 11:10:38 INFO bsp.BSPJobClient: Current supersteps number: 2
> > >>
> > >> On Sat, Jun 27, 2015 at 1:07 PM, Edward J. Yoon <
> edwardyoon@apache.org>
> > >> wrote:
> > >>
> > >>> Please check the task logs in $HAMA_HOME/logs/tasklogs folder.
> > >>>
> > >>> On Sat, Jun 27, 2015 at 8:03 PM, Behroz Sikander <behroz89@gmail.com
> >
> > >>> wrote:
> > >>> > Yea. I also thought that. I ran the program through eclipse with 20
> > >>> tasks
> > >>> > and it works fine.
> > >>> >
> > >>> > On Sat, Jun 27, 2015 at 1:00 PM, Edward J. Yoon <
> > edwardyoon@apache.org
> > >>> >
> > >>> > wrote:
> > >>> >
> > >>> >> > When I run the PI example, it uses 9 tasks and runs fine. When I
> > >>> run my
> > >>> >> > program with 3 tasks, everything runs fine. But when I increase
> > the
> > >>> tasks
> > >>> >> > (to 4) by using "setNumBspTask". Hama freezes. I do not
> understand
> > >>> what
> > >>> >> can
> > >>> >> > go wrong.
> > >>> >>
> > >>> >> It looks like a program bug. Have you ran your program in local
> > mode?
> > >>> >>
> > >>> >> On Sat, Jun 27, 2015 at 8:03 AM, Behroz Sikander <
> > behroz89@gmail.com>
> > >>> >> wrote:
> > >>> >> > Hi,
> > >>> >> > In the current thread, I mentioned 3 issues. Issue 1 and 3 are
> > >>> resolved
> > >>> >> but
> > >>> >> > issue number 2 is still giving me headaches.
> > >>> >> >
> > >>> >> > My problem:
> > >>> >> > My cluster now consists of 3 machines. Each one of them properly
> > >>> >> configured
> > >>> >> > (Apparently). From my master machine when I start Hadoop and
> Hama,
> > >>> I can
> > >>> >> > see the processes started on other 2 machines. If I check the
> > >>> maximum
> > >>> >> tasks
> > >>> >> > that my cluster can support then I get 9 (3 tasks on each
> > machine).
> > >>> >> >
> > >>> >> > When I run the PI example, it uses 9 tasks and runs fine. When I
> > >>> run my
> > >>> >> > program with 3 tasks, everything runs fine. But when I increase
> > the
> > >>> tasks
> > >>> >> > (to 4) by using "setNumBspTask". Hama freezes. I do not
> understand
> > >>> what
> > >>> >> can
> > >>> >> > go wrong.
> > >>> >> >
> > >>> >> > I checked the logs files and things look fine. I just sometimes
> > get
> > >>> an
> > >>> >> > exception that hama was not able to delete the sytem directory
> > >>> >> > (bsp.system.dir) defined in the hama-site.xml.
> > >>> >> >
> > >>> >> > Any help or clue would be great.
> > >>> >> >
> > >>> >> > Regards,
> > >>> >> > Behroz Sikander
> > >>> >> >
> > >>> >> > On Thu, Jun 25, 2015 at 1:13 PM, Behroz Sikander <
> > >>> behroz89@gmail.com>
> > >>> >> wrote:
> > >>> >> >
> > >>> >> >> Thank you :)
> > >>> >> >>
> > >>> >> >> On Thu, Jun 25, 2015 at 12:14 AM, Edward J. Yoon <
> > >>> edwardyoon@apache.org
> > >>> >> >
> > >>> >> >> wrote:
> > >>> >> >>
> > >>> >> >>> Hi,
> > >>> >> >>>
> > >>> >> >>> You can get the maximum number of available tasks like
> following
> > >>> code:
> > >>> >> >>>
> > >>> >> >>>     BSPJobClient jobClient = new BSPJobClient(conf);
> > >>> >> >>>     ClusterStatus cluster = jobClient.getClusterStatus(true);
> > >>> >> >>>
> > >>> >> >>>     // Set to maximum
> > >>> >> >>>     bsp.setNumBspTask(cluster.getMaxTasks());
> > >>> >> >>>
> > >>> >> >>>
> > >>> >> >>> On Wed, Jun 24, 2015 at 11:20 PM, Behroz Sikander <
> > >>> behroz89@gmail.com>
> > >>> >> >>> wrote:
> > >>> >> >>> > Hi,
> > >>> >> >>> > 1) Thank you for this.
> > >>> >> >>> > 2) Here are the images. I will look into the log files of PI
> > >>> example
> > >>> >> >>> >
> > >>> >> >>> > *Result of JPS command on slave*
> > >>> >> >>> >
> > >>> >> >>>
> > >>> >>
> > >>>
> >
> http://s17.postimg.org/gpwe2bbfj/Screen_Shot_2015_06_22_at_7_23_31_PM.png
> > >>> >> >>> >
> > >>> >> >>> > *Result of JPS command on Master*
> > >>> >> >>> >
> > >>> >> >>>
> > >>> >>
> > >>>
> >
> http://s14.postimg.org/s9922em5p/Screen_Shot_2015_06_22_at_7_23_42_PM.png
> > >>> >> >>> >
> > >>> >> >>> > 3) In my current case, I do not have any input submitted to
> > the
> > >>> job.
> > >>> >> >>> During
> > >>> >> >>> > run time, I directly fetch data from HDFS. So, I am looking
> > for
> > >>> >> >>> something
> > >>> >> >>> > like BSPJob.set*Max*NumBspTask().
> > >>> >> >>> >
> > >>> >> >>> > Regards,
> > >>> >> >>> > Behroz
> > >>> >> >>> >
> > >>> >> >>> >
> > >>> >> >>> >
> > >>> >> >>> > On Tue, Jun 23, 2015 at 12:57 AM, Edward J. Yoon <
> > >>> >> edwardyoon@apache.org
> > >>> >> >>> >
> > >>> >> >>> > wrote:
> > >>> >> >>> >
> > >>> >> >>> >> Hello,
> > >>> >> >>> >>
> > >>> >> >>> >> 1) You can get the filesystem URI from a configuration
> using
> > >>> >> >>> >> "FileSystem fs = FileSystem.get(conf);". Of course, the
> > >>> fs.defaultFS
> > >>> >> >>> >> property should be in hama-site.xml
> > >>> >> >>> >>
> > >>> >> >>> >>   <property>
> > >>> >> >>> >>     <name>fs.defaultFS</name>
> > >>> >> >>> >>     <value>hdfs://host1.mydomain.com:9000/</value>
> > >>> >> >>> >>     <description>
> > >>> >> >>> >>       The name of the default file system. Either the
> literal
> > >>> string
> > >>> >> >>> >>       "local" or a host:port for HDFS.
> > >>> >> >>> >>     </description>
> > >>> >> >>> >>   </property>
> > >>> >> >>> >>
> > >>> >> >>> >> 2) The 'bsp.tasks.maximum' is the number of tasks per node.
> > It
> > >>> looks
> > >>> >> >>> >> cluster configuration issue. Please run Pi example and look
> > at
> > >>> the
> > >>> >> >>> >> logs for more details. NOTE: you can not attach the images
> to
> > >>> >> mailing
> > >>> >> >>> >> list so I can't see it.
> > >>> >> >>> >>
> > >>> >> >>> >> 3) You can use the BSPJob.setNumBspTask(int) method. If
> input
> > >>> is
> > >>> >> >>> >> provided, the number of BSP tasks is basically driven by
> the
> > >>> number
> > >>> >> of
> > >>> >> >>> >> DFS blocks. I'll fix it to be more flexible on HAMA-956.
> > >>> >> >>> >>
> > >>> >> >>> >> Thanks!
> > >>> >> >>> >>
> > >>> >> >>> >>
> > >>> >> >>> >> On Tue, Jun 23, 2015 at 2:33 AM, Behroz Sikander <
> > >>> >> behroz89@gmail.com>
> > >>> >> >>> >> wrote:
> > >>> >> >>> >> > Hi,
> > >>> >> >>> >> > Recently, I moved from a single machine setup to a 2
> > machine
> > >>> >> setup.
> > >>> >> >>> I was
> > >>> >> >>> >> > successfully able to run my job that uses the HDFS to get
> > >>> data. I
> > >>> >> >>> have 3
> > >>> >> >>> >> > trivial questions
> > >>> >> >>> >> >
> > >>> >> >>> >> > 1- To access HDFS, I have to manually give the IP address
> > of
> > >>> >> server
> > >>> >> >>> >> running
> > >>> >> >>> >> > HDFS. I thought that Hama will automatically pick from
> the
> > >>> >> >>> configurations
> > >>> >> >>> >> > but it does not. I am probably doing something wrong.
> Right
> > >>> now my
> > >>> >> >>> code
> > >>> >> >>> >> work
> > >>> >> >>> >> > by using the following.
> > >>> >> >>> >> >
> > >>> >> >>> >> > FileSystem fs = FileSystem.get(new
> > >>> URI("hdfs://server_ip:port/"),
> > >>> >> >>> conf);
> > >>> >> >>> >> >
> > >>> >> >>> >> > 2- On my master server, when I start hama it
> automatically
> > >>> starts
> > >>> >> >>> hama in
> > >>> >> >>> >> > the slave machine (all good). Both master and slave are
> set
> > >>> as
> > >>> >> >>> >> groomservers.
> > >>> >> >>> >> > This means that I have 2 servers to run my job which
> means
> > >>> that I
> > >>> >> can
> > >>> >> >>> >> open
> > >>> >> >>> >> > more BSPPeerChild processes. And if I submit my jar with
> 3
> > >>> bsp
> > >>> >> tasks
> > >>> >> >>> then
> > >>> >> >>> >> > everything works fine. But when I move to 4 tasks, Hama
> > >>> freezes.
> > >>> >> >>> Here is
> > >>> >> >>> >> the
> > >>> >> >>> >> > result of JPS command on slave.
> > >>> >> >>> >> >
> > >>> >> >>> >> >
> > >>> >> >>> >> > Result of JPS command on Master
> > >>> >> >>> >> >
> > >>> >> >>> >> >
> > >>> >> >>> >> >
> > >>> >> >>> >> > You can see that it is only opening tasks on slaves but
> not
> > >>> on
> > >>> >> >>> master.
> > >>> >> >>> >> >
> > >>> >> >>> >> > Note: I tried to change the bsp.tasks.maximum property in
> > >>> >> >>> >> hama-default.xml
> > >>> >> >>> >> > to 4 but still same result.
> > >>> >> >>> >> >
> > >>> >> >>> >> > 3- I want my cluster to open as many BSPPeerChild
> processes
> > >>> as
> > >>> >> >>> possible.
> > >>> >> >>> >> Is
> > >>> >> >>> >> > there any setting that can I do to achieve that ? Or hama
> > >>> picks up
> > >>> >> >>> the
> > >>> >> >>> >> > values from hama-default.xml to open tasks ?
> > >>> >> >>> >> >
> > >>> >> >>> >> >
> > >>> >> >>> >> > Regards,
> > >>> >> >>> >> >
> > >>> >> >>> >> > Behroz Sikander
> > >>> >> >>> >>
> > >>> >> >>> >>
> > >>> >> >>> >>
> > >>> >> >>> >> --
> > >>> >> >>> >> Best Regards, Edward J. Yoon
> > >>> >> >>> >>
> > >>> >> >>>
> > >>> >> >>>
> > >>> >> >>>
> > >>> >> >>> --
> > >>> >> >>> Best Regards, Edward J. Yoon
> > >>> >> >>>
> > >>> >> >>
> > >>> >> >>
> > >>> >>
> > >>> >>
> > >>> >>
> > >>> >> --
> > >>> >> Best Regards, Edward J. Yoon
> > >>> >>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Best Regards, Edward J. Yoon
> > >>>
> > >>
> > >>
> > >
> >
> >
> >
>
>
>

RE: Groomserer BSPPeerChild limit

Posted by "Edward J. Yoon" <ed...@samsung.com>.
> (0.7.0) because I do not understand YARN yet. It adds extra configurations

Hama classic mode works on both Hadoop 1.x and Hadoop 2.x HDFS. YARN 
configuration is only needed when you want to submit a BSP job to a YARN cluster 
without a Hama cluster. So you don't need to worry about it. :-)
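As a side note on question 1: FileSystem.get(conf) resolves the default
filesystem from the fs.defaultFS property, so hard-coding the HDFS URI should
be unnecessary once hama-site.xml carries it. Below is a minimal sketch, with
no Hadoop dependency and a hypothetical property value, of how such a
name/value pair is read out of a Hadoop-style configuration file (the real
Configuration class does the same resolution for you):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DefaultFsLookup {
    // Parse a Hadoop/Hama-style configuration XML document and return the
    // value of the requested property name, or null if it is not present.
    static String lookup(String xml, String wanted) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            String name = p.getElementsByTagName("name")
                           .item(0).getTextContent().trim();
            if (wanted.equals(name)) {
                return p.getElementsByTagName("value")
                        .item(0).getTextContent().trim();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical hama-site.xml content; the host is a placeholder.
        String hamaSite =
              "<configuration>"
            + "  <property>"
            + "    <name>fs.defaultFS</name>"
            + "    <value>hdfs://host1.mydomain.com:9000/</value>"
            + "  </property>"
            + "</configuration>";
        System.out.println(lookup(hamaSite, "fs.defaultFS"));
        // prints hdfs://host1.mydomain.com:9000/
    }
}
```

With the property present in hama-site.xml on every node, the application code
can simply call FileSystem.get(conf) and drop the explicit URI.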

> distributed mode ? and is there any way to manage the server ? I mean right
> now, I have 3 machines with alot of configurations files and log files. It

You can use the web UI at http://masterserver_address:40013/bspmaster.jsp

To debug your program, please try the following:

1) Run a BSPMaster and Zookeeper at server1.
% bin/hama bspmaster
% bin/hama zookeeper

2) Run a Groom at server1 and server2.

% bin/hama groom

3) Check whether the daemons are running well. Then, run your program using the 
jar command at server1.

% bin/hama jar .....

> In hama_[user]_bspmaster_.....log file I get the following exception. But
> this occurs in both cases when I run my job with 3 tasks or with 4 tasks

In fact, you should not see the initZK error log above.
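
When that initZK error does appear, it usually means a task could not reach
ZooKeeper at all. A quick way to verify reachability from any machine is
ZooKeeper's four-letter "ruok" command, which a healthy server answers with
"imok". The probe below is only a sketch: the host and port in main() are
placeholders, so check hama.zookeeper.property.clientPort in your
hama-site.xml/hama-default.xml for the real port before using it.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class ZkProbe {
    // Send ZooKeeper's "ruok" four-letter command over a plain socket; a
    // healthy server replies "imok". Returns false on any connection failure.
    static boolean isZkHealthy(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            OutputStream out = s.getOutputStream();
            out.write("ruok".getBytes(StandardCharsets.US_ASCII));
            out.flush();
            s.shutdownOutput();
            InputStream in = s.getInputStream();
            byte[] buf = new byte[4];
            int n = 0, r;
            while (n < 4 && (r = in.read(buf, n, 4 - n)) != -1) {
                n += r;
            }
            return n == 4
                && "imok".equals(new String(buf, StandardCharsets.US_ASCII));
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder address: replace with your ZooKeeper host and the
        // client port from your Hama configuration.
        System.out.println(isZkHealthy("server1", 21810, 2000));
    }
}
```

Running this from each groom machine quickly shows whether a firewall or a
stopped ZooKeeper is the reason tasks hang before their setup() is called.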

--
Best Regards, Edward J. Yoon


-----Original Message-----
From: Behroz Sikander [mailto:behroz89@gmail.com]
Sent: Monday, June 29, 2015 8:18 AM
To: user@hama.apache.org
Subject: Re: Groomserer BSPPeerChild limit

I will try the things that you mentioned. I am not using the latest version
(0.7.0) because I do not understand YARN yet. It adds extra configuration,
which makes it harder for me to understand when things go wrong. Any
suggestions?

Further, are there any tools that you use for debugging while in
distributed mode? And is there any way to manage the servers? I mean right
now, I have 3 machines with a lot of configuration files and log files. It
takes a lot of time. This makes me wonder how people who have 100s of
machines debug and manage the cluster.

Regards,
Behroz

On Mon, Jun 29, 2015 at 12:53 AM, Edward J. Yoon <ed...@samsung.com>
wrote:

> Hi,
>
> It looks like a zookeeper connection problem. Please check whether
> zookeeper
> is running and every tasks can connect to zookeeper.
>
> I would recommend you to stop the firewall during debugging, and please use
> the 0.7.0 latest release.
>
>
> --
> Best Regards, Edward J. Yoon
>
> -----Original Message-----
> From: Behroz Sikander [mailto:behroz89@gmail.com]
> Sent: Monday, June 29, 2015 7:34 AM
> To: user@hama.apache.org
> Subject: Re: Groomserer BSPPeerChild limit
>
> To figure out the issue, I was trying something else and found out another
> weird issue. It might be a bug in Hama, but I am not sure. Both of the following
> lines give an exception.
>
> System.out.println( peer.getPeerName(0)); //Exception
>
> System.out.println( peer.getNumPeers()); //Exception
>
>
> [time] ERROR bsp.BSPTask: *Error running bsp setup and bsp function.*
>
> [time]java.lang.*RuntimeException: All peer names could not be retrieved!*
>
> at
>
> org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.getAllPeerNames(ZooKeeperSyncClientImpl.java:305)
>
> at org.apache.hama.bsp.BSPPeerImpl.initPeerNames(BSPPeerImpl.java:544)
>
> at org.apache.hama.bsp.BSPPeerImpl.getNumPeers(BSPPeerImpl.java:538)
>
> at testHDFS.EVADMMBsp.setup*(EVADMMBsp.java:58)*
>
> at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
>
> at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
>
> at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
>
> On Sun, Jun 28, 2015 at 6:45 PM, Behroz Sikander <be...@gmail.com>
> wrote:
>
> > I think I have more information on the issue. I did some debugging and
> > found something quite strange.
> >
> > If I open my job with 6 tasks ( 3 tasks will run on MACHINE1 and 3 task
> > will be opened on other MACHINE2),
> >
> >  -  3 tasks on Machine1 are frozen and the strange thing is that the
> > processes do not even enter the SETUP function of BSP class. I have print
> > statements in the setup function of BSP class and it doesn't print
> > anything. I get empty files with zero size.
> >
> > drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:29 .
> > drwxrwxr-x 99 behroz behroz 4096 Jun 28 16:28 ..
> > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> > attempt_201506281624_0001_000000_0.err
> > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> > attempt_201506281624_0001_000000_0.log
> > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> > attempt_201506281624_0001_000001_0.err
> > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> > attempt_201506281624_0001_000001_0.log
> > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> > attempt_201506281624_0001_000002_0.err
> > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> > attempt_201506281624_0001_000002_0.log
> >
> > - On MACHINE2, the code enters the SETUP function of BSP class and prints
> > stuff. See the size of files generated on output. How is it possible that
> > in 3 tasks the code can enter BSP and in others it cannot ?
> >
> > drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:39 .
> > drwxrwxr-x 82 behroz behroz 4096 Jun 28 16:39 ..
> > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
> > attempt_201506281639_0001_000003_0.err
> > -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
> > attempt_201506281639_0001_000003_0.log
> > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
> > attempt_201506281639_0001_000004_0.err
> > -rw-rw-r--  1 behroz behroz 1368 Jun 28 16:39
> > attempt_201506281639_0001_000004_0.log
> > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
> > attempt_201506281639_0001_000005_0.err
> > -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
> > attempt_201506281639_0001_000005_0.log
> >
> > - Hama Groom log file on MACHINE1 (which is frozen) shows.
> > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > 'attempt_201506281639_0001_000001_0' has started.
> > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > 'attempt_201506281639_0001_000002_0' has started.
> > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > 'attempt_201506281639_0001_000000_0' has started.
> >
> > - Hama Groom log file on MACHINE2 shows
> > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > 'attempt_201506281639_0001_000003_0' has started.
> > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > 'attempt_201506281639_0001_000004_0' has started.
> > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > 'attempt_201506281639_0001_000005_0' has started.
> > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > attempt_201506281639_0001_000004_0 is *done*.
> > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > attempt_201506281639_0001_000003_0 is *done*.
> > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > attempt_201506281639_0001_000005_0 is *done*.
> >
> > Any clue what might be going wrong ?
> >
> > Regards,
> > Behroz
> >
> >
> >
> > On Sat, Jun 27, 2015 at 1:13 PM, Behroz Sikander <be...@gmail.com>
> > wrote:
> >
> >> Here is the log file from that folder
> >>
> >> 15/06/27 11:10:34 INFO ipc.Server: Starting Socket Reader #1 for port
> >> 61001
> >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server Responder: starting
> >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server listener on 61001:
> starting
> >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 0 on 61001:
> starting
> >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 1 on 61001:
> starting
> >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 2 on 61001:
> starting
> >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 3 on 61001:
> starting
> >> 15/06/27 11:10:34 INFO message.HamaMessageManagerImpl: BSPPeer
> >> address:b178b33b16cc port:61001
> >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 4 on 61001:
> starting
> >> 15/06/27 11:10:34 INFO sync.ZKSyncClient: Initializing ZK Sync Client
> >> 15/06/27 11:10:34 INFO sync.ZooKeeperSyncClientImpl: Start connecting to
> >> Zookeeper! At b178b33b16cc/172.17.0.7:61001
> >> 15/06/27 11:10:37 INFO ipc.Server: Stopping server on 61001
> >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 0 on 61001:
> exiting
> >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server listener on 61001
> >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 1 on 61001:
> exiting
> >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 2 on 61001:
> exiting
> >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server Responder
> >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 3 on 61001:
> exiting
> >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 4 on 61001:
> exiting
> >>
> >>
> >> And my console shows the following output. Hama is frozen right now.
> >> 15/06/27 11:10:32 INFO bsp.BSPJobClient: Running job:
> >> job_201506262331_0003
> >> 15/06/27 11:10:35 INFO bsp.BSPJobClient: Current supersteps number: 0
> >> 15/06/27 11:10:38 INFO bsp.BSPJobClient: Current supersteps number: 2
> >>
> >> On Sat, Jun 27, 2015 at 1:07 PM, Edward J. Yoon <ed...@apache.org>
> >> wrote:
> >>
> >>> Please check the task logs in $HAMA_HOME/logs/tasklogs folder.
> >>>
> >>> On Sat, Jun 27, 2015 at 8:03 PM, Behroz Sikander <be...@gmail.com>
> >>> wrote:
> >>> > Yea. I also thought that. I ran the program through eclipse with 20
> >>> tasks
> >>> > and it works fine.
> >>> >
> >>> > On Sat, Jun 27, 2015 at 1:00 PM, Edward J. Yoon <
> edwardyoon@apache.org
> >>> >
> >>> > wrote:
> >>> >
> >>> >> > When I run the PI example, it uses 9 tasks and runs fine. When I
> >>> run my
> >>> >> > program with 3 tasks, everything runs fine. But when I increase
> the
> >>> tasks
> >>> >> > (to 4) by using "setNumBspTask". Hama freezes. I do not understand
> >>> what
> >>> >> can
> >>> >> > go wrong.
> >>> >>
> >>> >> It looks like a program bug. Have you ran your program in local
> mode?
> >>> >>
> >>> >> On Sat, Jun 27, 2015 at 8:03 AM, Behroz Sikander <
> behroz89@gmail.com>
> >>> >> wrote:
> >>> >> > Hi,
> >>> >> > In the current thread, I mentioned 3 issues. Issue 1 and 3 are
> >>> resolved
> >>> >> but
> >>> >> > issue number 2 is still giving me headaches.
> >>> >> >
> >>> >> > My problem:
> >>> >> > My cluster now consists of 3 machines. Each one of them properly
> >>> >> configured
> >>> >> > (Apparently). From my master machine when I start Hadoop and Hama,
> >>> I can
> >>> >> > see the processes started on other 2 machines. If I check the
> >>> maximum
> >>> >> tasks
> >>> >> > that my cluster can support then I get 9 (3 tasks on each
> machine).
> >>> >> >
> >>> >> > When I run the PI example, it uses 9 tasks and runs fine. When I
> >>> run my
> >>> >> > program with 3 tasks, everything runs fine. But when I increase
> the
> >>> tasks
> >>> >> > (to 4) by using "setNumBspTask". Hama freezes. I do not understand
> >>> what
> >>> >> can
> >>> >> > go wrong.
> >>> >> >
> >>> >> > I checked the logs files and things look fine. I just sometimes
> get
> >>> an
> >>> >> > exception that hama was not able to delete the sytem directory
> >>> >> > (bsp.system.dir) defined in the hama-site.xml.
> >>> >> >
> >>> >> > Any help or clue would be great.
> >>> >> >
> >>> >> > Regards,
> >>> >> > Behroz Sikander
> >>> >> >
> >>> >> > On Thu, Jun 25, 2015 at 1:13 PM, Behroz Sikander <
> >>> behroz89@gmail.com>
> >>> >> wrote:
> >>> >> >
> >>> >> >> Thank you :)
> >>> >> >>
> >>> >> >> On Thu, Jun 25, 2015 at 12:14 AM, Edward J. Yoon <
> >>> edwardyoon@apache.org
> >>> >> >
> >>> >> >> wrote:
> >>> >> >>
> >>> >> >>> Hi,
> >>> >> >>>
> >>> >> >>> You can get the maximum number of available tasks like following
> >>> code:
> >>> >> >>>
> >>> >> >>>     BSPJobClient jobClient = new BSPJobClient(conf);
> >>> >> >>>     ClusterStatus cluster = jobClient.getClusterStatus(true);
> >>> >> >>>
> >>> >> >>>     // Set to maximum
> >>> >> >>>     bsp.setNumBspTask(cluster.getMaxTasks());
> >>> >> >>>
> >>> >> >>>
> >>> >> >>> On Wed, Jun 24, 2015 at 11:20 PM, Behroz Sikander <
> >>> behroz89@gmail.com>
> >>> >> >>> wrote:
> >>> >> >>> > Hi,
> >>> >> >>> > 1) Thank you for this.
> >>> >> >>> > 2) Here are the images. I will look into the log files of PI
> >>> example
> >>> >> >>> >
> >>> >> >>> > *Result of JPS command on slave*
> >>> >> >>> >
> >>> >> >>>
> >>> >>
> >>>
> http://s17.postimg.org/gpwe2bbfj/Screen_Shot_2015_06_22_at_7_23_31_PM.png
> >>> >> >>> >
> >>> >> >>> > *Result of JPS command on Master*
> >>> >> >>> >
> >>> >> >>>
> >>> >>
> >>>
> http://s14.postimg.org/s9922em5p/Screen_Shot_2015_06_22_at_7_23_42_PM.png
> >>> >> >>> >
> >>> >> >>> > 3) In my current case, I do not have any input submitted to
> the
> >>> job.
> >>> >> >>> During
> >>> >> >>> > run time, I directly fetch data from HDFS. So, I am looking
> for
> >>> >> >>> something
> >>> >> >>> > like BSPJob.set*Max*NumBspTask().
> >>> >> >>> >
> >>> >> >>> > Regards,
> >>> >> >>> > Behroz
> >>> >> >>> >
> >>> >> >>> >
> >>> >> >>> >
> >>> >> >>> > On Tue, Jun 23, 2015 at 12:57 AM, Edward J. Yoon <
> >>> >> edwardyoon@apache.org
> >>> >> >>> >
> >>> >> >>> > wrote:
> >>> >> >>> >
> >>> >> >>> >> Hello,
> >>> >> >>> >>
> >>> >> >>> >> 1) You can get the filesystem URI from a configuration using
> >>> >> >>> >> "FileSystem fs = FileSystem.get(conf);". Of course, the
> >>> fs.defaultFS
> >>> >> >>> >> property should be in hama-site.xml
> >>> >> >>> >>
> >>> >> >>> >>   <property>
> >>> >> >>> >>     <name>fs.defaultFS</name>
> >>> >> >>> >>     <value>hdfs://host1.mydomain.com:9000/</value>
> >>> >> >>> >>     <description>
> >>> >> >>> >>       The name of the default file system. Either the literal
> >>> string
> >>> >> >>> >>       "local" or a host:port for HDFS.
> >>> >> >>> >>     </description>
> >>> >> >>> >>   </property>
> >>> >> >>> >>
> >>> >> >>> >> 2) The 'bsp.tasks.maximum' is the number of tasks per node.
> It
> >>> looks
> >>> >> >>> >> cluster configuration issue. Please run Pi example and look
> at
> >>> the
> >>> >> >>> >> logs for more details. NOTE: you can not attach the images to
> >>> >> mailing
> >>> >> >>> >> list so I can't see it.
> >>> >> >>> >>
> >>> >> >>> >> 3) You can use the BSPJob.setNumBspTask(int) method. If input
> >>> is
> >>> >> >>> >> provided, the number of BSP tasks is basically driven by the
> >>> number
> >>> >> of
> >>> >> >>> >> DFS blocks. I'll fix it to be more flexible on HAMA-956.
> >>> >> >>> >>
> >>> >> >>> >> Thanks!
> >>> >> >>> >>
> >>> >> >>> >>
> >>> >> >>> >> On Tue, Jun 23, 2015 at 2:33 AM, Behroz Sikander <
> >>> >> behroz89@gmail.com>
> >>> >> >>> >> wrote:
> >>> >> >>> >> > Hi,
> >>> >> >>> >> > Recently, I moved from a single machine setup to a 2
> machine
> >>> >> setup.
> >>> >> >>> I was
> >>> >> >>> >> > successfully able to run my job that uses the HDFS to get
> >>> data. I
> >>> >> >>> have 3
> >>> >> >>> >> > trivial questions
> >>> >> >>> >> >
> >>> >> >>> >> > 1- To access HDFS, I have to manually give the IP address
> of
> >>> >> server
> >>> >> >>> >> running
> >>> >> >>> >> > HDFS. I thought that Hama will automatically pick from the
> >>> >> >>> configurations
> >>> >> >>> >> > but it does not. I am probably doing something wrong. Right
> >>> now my
> >>> >> >>> code
> >>> >> >>> >> work
> >>> >> >>> >> > by using the following.
> >>> >> >>> >> >
> >>> >> >>> >> > FileSystem fs = FileSystem.get(new
> >>> URI("hdfs://server_ip:port/"),
> >>> >> >>> conf);
> >>> >> >>> >> >
> >>> >> >>> >> > 2- On my master server, when I start hama it automatically
> >>> starts
> >>> >> >>> hama in
> >>> >> >>> >> > the slave machine (all good). Both master and slave are set
> >>> as
> >>> >> >>> >> groomservers.
> >>> >> >>> >> > This means that I have 2 servers to run my job which means
> >>> that I
> >>> >> can
> >>> >> >>> >> open
> >>> >> >>> >> > more BSPPeerChild processes. And if I submit my jar with 3
> >>> bsp
> >>> >> tasks
> >>> >> >>> then
> >>> >> >>> >> > everything works fine. But when I move to 4 tasks, Hama
> >>> freezes.
> >>> >> >>> Here is
> >>> >> >>> >> the
> >>> >> >>> >> > result of JPS command on slave.
> >>> >> >>> >> >
> >>> >> >>> >> >
> >>> >> >>> >> > Result of JPS command on Master
> >>> >> >>> >> >
> >>> >> >>> >> >
> >>> >> >>> >> >
> >>> >> >>> >> > You can see that it is only opening tasks on slaves but not
> >>> on
> >>> >> >>> master.
> >>> >> >>> >> >
> >>> >> >>> >> > Note: I tried to change the bsp.tasks.maximum property in
> >>> >> >>> >> hama-default.xml
> >>> >> >>> >> > to 4 but still same result.
> >>> >> >>> >> >
> >>> >> >>> >> > 3- I want my cluster to open as many BSPPeerChild processes
> >>> as
> >>> >> >>> possible.
> >>> >> >>> >> Is
> >>> >> >>> >> > there any setting that can I do to achieve that ? Or hama
> >>> picks up
> >>> >> >>> the
> >>> >> >>> >> > values from hama-default.xml to open tasks ?
> >>> >> >>> >> >
> >>> >> >>> >> >
> >>> >> >>> >> > Regards,
> >>> >> >>> >> >
> >>> >> >>> >> > Behroz Sikander
> >>> >> >>> >>
> >>> >> >>> >>
> >>> >> >>> >>
> >>> >> >>> >> --
> >>> >> >>> >> Best Regards, Edward J. Yoon
> >>> >> >>> >>
> >>> >> >>>
> >>> >> >>>
> >>> >> >>>
> >>> >> >>> --
> >>> >> >>> Best Regards, Edward J. Yoon
> >>> >> >>>
> >>> >> >>
> >>> >> >>
> >>> >>
> >>> >>
> >>> >>
> >>> >> --
> >>> >> Best Regards, Edward J. Yoon
> >>> >>
> >>>
> >>>
> >>>
> >>> --
> >>> Best Regards, Edward J. Yoon
> >>>
> >>
> >>
> >
>
>
>



Re: Groomserer BSPPeerChild limit

Posted by Behroz Sikander <be...@gmail.com>.
I will try the things that you mentioned. I am not using the latest version
(0.7.0) because I do not understand YARN yet. It adds extra configuration,
which makes it harder for me to understand when things go wrong. Any
suggestions?

Further, are there any tools that you use for debugging while in
distributed mode? And is there any way to manage the servers? I mean right
now, I have 3 machines with a lot of configuration files and log files. It
takes a lot of time. This makes me wonder how people who have 100s of
machines debug and manage the cluster.

Regards,
Behroz

On Mon, Jun 29, 2015 at 12:53 AM, Edward J. Yoon <ed...@samsung.com>
wrote:

> Hi,
>
> It looks like a zookeeper connection problem. Please check whether
> zookeeper
> is running and every tasks can connect to zookeeper.
>
> I would recommend you to stop the firewall during debugging, and please use
> the 0.7.0 latest release.
>
>
> --
> Best Regards, Edward J. Yoon
>
> -----Original Message-----
> From: Behroz Sikander [mailto:behroz89@gmail.com]
> Sent: Monday, June 29, 2015 7:34 AM
> To: user@hama.apache.org
> Subject: Re: Groomserer BSPPeerChild limit
>
> To figure out the issue, I was trying something else and found out another
> weird issue. It might be a bug in Hama, but I am not sure. Both of the following
> lines give an exception.
>
> System.out.println( peer.getPeerName(0)); //Exception
>
> System.out.println( peer.getNumPeers()); //Exception
>
>
> [time] ERROR bsp.BSPTask: *Error running bsp setup and bsp function.*
>
> [time]java.lang.*RuntimeException: All peer names could not be retrieved!*
>
> at
>
> org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.getAllPeerNames(ZooKeeperSyncClientImpl.java:305)
>
> at org.apache.hama.bsp.BSPPeerImpl.initPeerNames(BSPPeerImpl.java:544)
>
> at org.apache.hama.bsp.BSPPeerImpl.getNumPeers(BSPPeerImpl.java:538)
>
> at testHDFS.EVADMMBsp.setup*(EVADMMBsp.java:58)*
>
> at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
>
> at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
>
> at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
>
> On Sun, Jun 28, 2015 at 6:45 PM, Behroz Sikander <be...@gmail.com>
> wrote:
>
> > I think I have more information on the issue. I did some debugging and
> > found something quite strange.
> >
> > If I open my job with 6 tasks ( 3 tasks will run on MACHINE1 and 3 task
> > will be opened on other MACHINE2),
> >
> >  -  3 tasks on Machine1 are frozen and the strange thing is that the
> > processes do not even enter the SETUP function of BSP class. I have print
> > statements in the setup function of BSP class and it doesn't print
> > anything. I get empty files with zero size.
> >
> > drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:29 .
> > drwxrwxr-x 99 behroz behroz 4096 Jun 28 16:28 ..
> > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> > attempt_201506281624_0001_000000_0.err
> > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> > attempt_201506281624_0001_000000_0.log
> > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> > attempt_201506281624_0001_000001_0.err
> > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> > attempt_201506281624_0001_000001_0.log
> > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> > attempt_201506281624_0001_000002_0.err
> > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> > attempt_201506281624_0001_000002_0.log
> >
> > - On MACHINE2, the code enters the SETUP function of BSP class and prints
> > stuff. See the size of files generated on output. How is it possible that
> > in 3 tasks the code can enter BSP and in others it cannot ?
> >
> > drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:39 .
> > drwxrwxr-x 82 behroz behroz 4096 Jun 28 16:39 ..
> > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
> > attempt_201506281639_0001_000003_0.err
> > -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
> > attempt_201506281639_0001_000003_0.log
> > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
> > attempt_201506281639_0001_000004_0.err
> > -rw-rw-r--  1 behroz behroz 1368 Jun 28 16:39
> > attempt_201506281639_0001_000004_0.log
> > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
> > attempt_201506281639_0001_000005_0.err
> > -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
> > attempt_201506281639_0001_000005_0.log
> >
> > - Hama Groom log file on MACHINE1 (which is frozen) shows.
> > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > 'attempt_201506281639_0001_000001_0' has started.
> > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > 'attempt_201506281639_0001_000002_0' has started.
> > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > 'attempt_201506281639_0001_000000_0' has started.
> >
> > - Hama Groom log file on MACHINE2 shows
> > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > 'attempt_201506281639_0001_000003_0' has started.
> > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > 'attempt_201506281639_0001_000004_0' has started.
> > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > 'attempt_201506281639_0001_000005_0' has started.
> > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > attempt_201506281639_0001_000004_0 is *done*.
> > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > attempt_201506281639_0001_000003_0 is *done*.
> > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > attempt_201506281639_0001_000005_0 is *done*.
> >
> > Any clue what might be going wrong ?
> >
> > Regards,
> > Behroz
> >
> >
> >
> > On Sat, Jun 27, 2015 at 1:13 PM, Behroz Sikander <be...@gmail.com>
> > wrote:
> >
> >> Here is the log file from that folder
> >>
> >> 15/06/27 11:10:34 INFO ipc.Server: Starting Socket Reader #1 for port
> >> 61001
> >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server Responder: starting
> >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server listener on 61001:
> starting
> >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 0 on 61001:
> starting
> >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 1 on 61001:
> starting
> >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 2 on 61001:
> starting
> >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 3 on 61001:
> starting
> >> 15/06/27 11:10:34 INFO message.HamaMessageManagerImpl: BSPPeer
> >> address:b178b33b16cc port:61001
> >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 4 on 61001:
> starting
> >> 15/06/27 11:10:34 INFO sync.ZKSyncClient: Initializing ZK Sync Client
> >> 15/06/27 11:10:34 INFO sync.ZooKeeperSyncClientImpl: Start connecting to
> >> Zookeeper! At b178b33b16cc/172.17.0.7:61001
> >> 15/06/27 11:10:37 INFO ipc.Server: Stopping server on 61001
> >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 0 on 61001:
> exiting
> >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server listener on 61001
> >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 1 on 61001:
> exiting
> >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 2 on 61001:
> exiting
> >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server Responder
> >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 3 on 61001:
> exiting
> >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 4 on 61001:
> exiting
> >>
> >>
> >> And my console shows the following output. Hama is frozen right now.
> >> 15/06/27 11:10:32 INFO bsp.BSPJobClient: Running job:
> >> job_201506262331_0003
> >> 15/06/27 11:10:35 INFO bsp.BSPJobClient: Current supersteps number: 0
> >> 15/06/27 11:10:38 INFO bsp.BSPJobClient: Current supersteps number: 2
> >>
> >> On Sat, Jun 27, 2015 at 1:07 PM, Edward J. Yoon <ed...@apache.org>
> >> wrote:
> >>
> >>> Please check the task logs in $HAMA_HOME/logs/tasklogs folder.
> >>>
> >>> On Sat, Jun 27, 2015 at 8:03 PM, Behroz Sikander <be...@gmail.com>
> >>> wrote:
> >>> > Yea. I also thought that. I ran the program through eclipse with 20
> >>> tasks
> >>> > and it works fine.
> >>> >
> >>> > On Sat, Jun 27, 2015 at 1:00 PM, Edward J. Yoon <
> edwardyoon@apache.org
> >>> >
> >>> > wrote:
> >>> >
> >>> >> > When I run the PI example, it uses 9 tasks and runs fine. When I
> >>> run my
> >>> >> > program with 3 tasks, everything runs fine. But when I increase
> the
> >>> tasks
> >>> >> > (to 4) by using "setNumBspTask". Hama freezes. I do not understand
> >>> what
> >>> >> can
> >>> >> > go wrong.
> >>> >>
> >>> >> It looks like a program bug. Have you ran your program in local
> mode?
> >>> >>
> >>> >> On Sat, Jun 27, 2015 at 8:03 AM, Behroz Sikander <
> behroz89@gmail.com>
> >>> >> wrote:
> >>> >> > Hi,
> >>> >> > In the current thread, I mentioned 3 issues. Issue 1 and 3 are
> >>> resolved
> >>> >> but
> >>> >> > issue number 2 is still giving me headaches.
> >>> >> >
> >>> >> > My problem:
> >>> >> > My cluster now consists of 3 machines. Each one of them properly
> >>> >> configured
> >>> >> > (Apparently). From my master machine when I start Hadoop and Hama,
> >>> I can
> >>> >> > see the processes started on other 2 machines. If I check the
> >>> maximum
> >>> >> tasks
> >>> >> > that my cluster can support then I get 9 (3 tasks on each
> machine).
> >>> >> >
> >>> >> > When I run the PI example, it uses 9 tasks and runs fine. When I
> >>> run my
> >>> >> > program with 3 tasks, everything runs fine. But when I increase
> the
> >>> tasks
> >>> >> > (to 4) by using "setNumBspTask". Hama freezes. I do not understand
> >>> what
> >>> >> can
> >>> >> > go wrong.
> >>> >> >
> >>> >> > I checked the logs files and things look fine. I just sometimes
> get
> >>> an
> >>> >> > exception that hama was not able to delete the sytem directory
> >>> >> > (bsp.system.dir) defined in the hama-site.xml.
> >>> >> >
> >>> >> > Any help or clue would be great.
> >>> >> >
> >>> >> > Regards,
> >>> >> > Behroz Sikander
> >>> >> >
> >>> >> > On Thu, Jun 25, 2015 at 1:13 PM, Behroz Sikander <
> >>> behroz89@gmail.com>
> >>> >> wrote:
> >>> >> >
> >>> >> >> Thank you :)
> >>> >> >>
> >>> >> >> On Thu, Jun 25, 2015 at 12:14 AM, Edward J. Yoon <
> >>> edwardyoon@apache.org
> >>> >> >
> >>> >> >> wrote:
> >>> >> >>
> >>> >> >>> Hi,
> >>> >> >>>
> >>> >> >>> You can get the maximum number of available tasks like following
> >>> code:
> >>> >> >>>
> >>> >> >>>     BSPJobClient jobClient = new BSPJobClient(conf);
> >>> >> >>>     ClusterStatus cluster = jobClient.getClusterStatus(true);
> >>> >> >>>
> >>> >> >>>     // Set to maximum
> >>> >> >>>     bsp.setNumBspTask(cluster.getMaxTasks());
> >>> >> >>>
> >>> >> >>>
> >>> >> >>> On Wed, Jun 24, 2015 at 11:20 PM, Behroz Sikander <
> >>> behroz89@gmail.com>
> >>> >> >>> wrote:
> >>> >> >>> > Hi,
> >>> >> >>> > 1) Thank you for this.
> >>> >> >>> > 2) Here are the images. I will look into the log files of PI
> >>> example
> >>> >> >>> >
> >>> >> >>> > *Result of JPS command on slave*
> >>> >> >>> >
> >>> >> >>>
> >>> >>
> >>>
> http://s17.postimg.org/gpwe2bbfj/Screen_Shot_2015_06_22_at_7_23_31_PM.png
> >>> >> >>> >
> >>> >> >>> > *Result of JPS command on Master*
> >>> >> >>> >
> >>> >> >>>
> >>> >>
> >>>
> http://s14.postimg.org/s9922em5p/Screen_Shot_2015_06_22_at_7_23_42_PM.png
> >>> >> >>> >
> >>> >> >>> > 3) In my current case, I do not have any input submitted to
> the
> >>> job.
> >>> >> >>> During
> >>> >> >>> > run time, I directly fetch data from HDFS. So, I am looking
> for
> >>> >> >>> something
> >>> >> >>> > like BSPJob.set*Max*NumBspTask().
> >>> >> >>> >
> >>> >> >>> > Regards,
> >>> >> >>> > Behroz
> >>> >> >>> >
> >>> >> >>> >
> >>> >> >>> >
> >>> >> >>> > On Tue, Jun 23, 2015 at 12:57 AM, Edward J. Yoon <
> >>> >> edwardyoon@apache.org
> >>> >> >>> >
> >>> >> >>> > wrote:
> >>> >> >>> >
> >>> >> >>> >> Hello,
> >>> >> >>> >>
> >>> >> >>> >> 1) You can get the filesystem URI from a configuration using

RE: Groomserer BSPPeerChild limit

Posted by "Edward J. Yoon" <ed...@samsung.com>.
Hi,

It looks like a ZooKeeper connection problem. Please check whether ZooKeeper
is running and that every task can connect to it.

I would recommend stopping the firewall while debugging, and please use the
latest 0.7.0 release.
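
For reference, these are the hama-site.xml properties that tell the tasks
which ZooKeeper to connect to. The host name and port below are only
placeholders; replace them with your actual quorum:

  <property>
    <name>hama.zookeeper.quorum</name>
    <value>host1.mydomain.com</value>
  </property>
  <property>
    <name>hama.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>

If these are missing or point at the wrong host, tasks typically hang while
trying to sync.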


--
Best Regards, Edward J. Yoon

-----Original Message-----
From: Behroz Sikander [mailto:behroz89@gmail.com]
Sent: Monday, June 29, 2015 7:34 AM
To: user@hama.apache.org
Subject: Re: Groomserer BSPPeerChild limit

While trying to figure out the issue, I stumbled on another weird one. It
might be a bug in Hama, but I am not sure. Both of the following lines
throw an exception.

System.out.println(peer.getPeerName(0)); // Exception

System.out.println(peer.getNumPeers()); // Exception


[time] ERROR bsp.BSPTask: *Error running bsp setup and bsp function.*

[time]java.lang.*RuntimeException: All peer names could not be retrieved!*

at
org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.getAllPeerNames(ZooKeeperSyncClientImpl.java:305)

at org.apache.hama.bsp.BSPPeerImpl.initPeerNames(BSPPeerImpl.java:544)

at org.apache.hama.bsp.BSPPeerImpl.getNumPeers(BSPPeerImpl.java:538)

at testHDFS.EVADMMBsp.setup*(EVADMMBsp.java:58)*

at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)

at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)

at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)

On Sun, Jun 28, 2015 at 6:45 PM, Behroz Sikander <be...@gmail.com> wrote:

> I think I have more information on the issue. I did some debugging and
> found something quite strange.
>
> If I open my job with 6 tasks ( 3 tasks will run on MACHINE1 and 3 task
> will be opened on other MACHINE2),
>
>  -  3 tasks on Machine1 are frozen and the strange thing is that the
> processes do not even enter the SETUP function of BSP class. I have print
> statements in the setup function of BSP class and it doesn't print
> anything. I get empty files with zero size.
>
> drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:29 .
> drwxrwxr-x 99 behroz behroz 4096 Jun 28 16:28 ..
> -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> attempt_201506281624_0001_000000_0.err
> -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> attempt_201506281624_0001_000000_0.log
> -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> attempt_201506281624_0001_000001_0.err
> -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> attempt_201506281624_0001_000001_0.log
> -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> attempt_201506281624_0001_000002_0.err
> -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> attempt_201506281624_0001_000002_0.log
>
> - On MACHINE2, the code enters the SETUP function of BSP class and prints
> stuff. See the size of files generated on output. How is it possible that
> in 3 tasks the code can enter BSP and in others it cannot ?
>
> drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:39 .
> drwxrwxr-x 82 behroz behroz 4096 Jun 28 16:39 ..
> -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
> attempt_201506281639_0001_000003_0.err
> -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
> attempt_201506281639_0001_000003_0.log
> -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
> attempt_201506281639_0001_000004_0.err
> -rw-rw-r--  1 behroz behroz 1368 Jun 28 16:39
> attempt_201506281639_0001_000004_0.log
> -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
> attempt_201506281639_0001_000005_0.err
> -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
> attempt_201506281639_0001_000005_0.log
>
> - Hama Groom log file on MACHINE1 (which is frozen) shows.
> [time] INFO org.apache.hama.bsp.GroomServer: Task
> 'attempt_201506281639_0001_000001_0' has started.
> [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> [time] INFO org.apache.hama.bsp.GroomServer: Task
> 'attempt_201506281639_0001_000002_0' has started.
> [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> [time] INFO org.apache.hama.bsp.GroomServer: Task
> 'attempt_201506281639_0001_000000_0' has started.
>
> - Hama Groom log file on MACHINE2 shows
> [time] INFO org.apache.hama.bsp.GroomServer: Task
> 'attempt_201506281639_0001_000003_0' has started.
> [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> [time] INFO org.apache.hama.bsp.GroomServer: Task
> 'attempt_201506281639_0001_000004_0' has started.
> [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> [time] INFO org.apache.hama.bsp.GroomServer: Task
> 'attempt_201506281639_0001_000005_0' has started.
> [time] INFO org.apache.hama.bsp.GroomServer: Task
> attempt_201506281639_0001_000004_0 is *done*.
> [time] INFO org.apache.hama.bsp.GroomServer: Task
> attempt_201506281639_0001_000003_0 is *done*.
> [time] INFO org.apache.hama.bsp.GroomServer: Task
> attempt_201506281639_0001_000005_0 is *done*.
>
> Any clue what might be going wrong ?
>
> Regards,
> Behroz
>
>
>
> On Sat, Jun 27, 2015 at 1:13 PM, Behroz Sikander <be...@gmail.com>
> wrote:
>
>> Here is the log file from that folder
>>
>> 15/06/27 11:10:34 INFO ipc.Server: Starting Socket Reader #1 for port
>> 61001
>> 15/06/27 11:10:34 INFO ipc.Server: IPC Server Responder: starting
>> 15/06/27 11:10:34 INFO ipc.Server: IPC Server listener on 61001: starting
>> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 0 on 61001: starting
>> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 1 on 61001: starting
>> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 2 on 61001: starting
>> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 3 on 61001: starting
>> 15/06/27 11:10:34 INFO message.HamaMessageManagerImpl: BSPPeer
>> address:b178b33b16cc port:61001
>> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 4 on 61001: starting
>> 15/06/27 11:10:34 INFO sync.ZKSyncClient: Initializing ZK Sync Client
>> 15/06/27 11:10:34 INFO sync.ZooKeeperSyncClientImpl: Start connecting to
>> Zookeeper! At b178b33b16cc/172.17.0.7:61001
>> 15/06/27 11:10:37 INFO ipc.Server: Stopping server on 61001
>> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 0 on 61001: exiting
>> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server listener on 61001
>> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 1 on 61001: exiting
>> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 2 on 61001: exiting
>> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server Responder
>> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 3 on 61001: exiting
>> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 4 on 61001: exiting
>>
>>
>> And my console shows the following output. Hama is frozen right now.
>> 15/06/27 11:10:32 INFO bsp.BSPJobClient: Running job:
>> job_201506262331_0003
>> 15/06/27 11:10:35 INFO bsp.BSPJobClient: Current supersteps number: 0
>> 15/06/27 11:10:38 INFO bsp.BSPJobClient: Current supersteps number: 2
>>
>> On Sat, Jun 27, 2015 at 1:07 PM, Edward J. Yoon <ed...@apache.org>
>> wrote:
>>
>>> Please check the task logs in $HAMA_HOME/logs/tasklogs folder.
>>>
>>> On Sat, Jun 27, 2015 at 8:03 PM, Behroz Sikander <be...@gmail.com>
>>> wrote:
>>> > Yea. I also thought that. I ran the program through eclipse with 20
>>> tasks
>>> > and it works fine.
>>> >
>>> > On Sat, Jun 27, 2015 at 1:00 PM, Edward J. Yoon <edwardyoon@apache.org
>>> >
>>> > wrote:
>>> >
>>> >> > When I run the PI example, it uses 9 tasks and runs fine. When I
>>> run my
>>> >> > program with 3 tasks, everything runs fine. But when I increase the
>>> tasks
>>> >> > (to 4) by using "setNumBspTask". Hama freezes. I do not understand
>>> what
>>> >> can
>>> >> > go wrong.
>>> >>
>>> >> It looks like a program bug. Have you ran your program in local mode?
>>> >>
>>> >> On Sat, Jun 27, 2015 at 8:03 AM, Behroz Sikander <be...@gmail.com>
>>> >> wrote:
>>> >> > Hi,
>>> >> > In the current thread, I mentioned 3 issues. Issue 1 and 3 are
>>> resolved
>>> >> but
>>> >> > issue number 2 is still giving me headaches.
>>> >> >
>>> >> > My problem:
>>> >> > My cluster now consists of 3 machines. Each one of them properly
>>> >> configured
>>> >> > (Apparently). From my master machine when I start Hadoop and Hama,
>>> I can
>>> >> > see the processes started on other 2 machines. If I check the
>>> maximum
>>> >> tasks
>>> >> > that my cluster can support then I get 9 (3 tasks on each machine).
>>> >> >
>>> >> > When I run the PI example, it uses 9 tasks and runs fine. When I
>>> run my
>>> >> > program with 3 tasks, everything runs fine. But when I increase the
>>> tasks
>>> >> > (to 4) by using "setNumBspTask". Hama freezes. I do not understand
>>> what
>>> >> can
>>> >> > go wrong.
>>> >> >
>>> >> > I checked the logs files and things look fine. I just sometimes get
>>> an
>>> >> > exception that hama was not able to delete the sytem directory
>>> >> > (bsp.system.dir) defined in the hama-site.xml.
>>> >> >
>>> >> > Any help or clue would be great.
>>> >> >
>>> >> > Regards,
>>> >> > Behroz Sikander
>>> >> >
>>> >> > On Thu, Jun 25, 2015 at 1:13 PM, Behroz Sikander <
>>> behroz89@gmail.com>
>>> >> wrote:
>>> >> >
>>> >> >> Thank you :)
>>> >> >>
>>> >> >> On Thu, Jun 25, 2015 at 12:14 AM, Edward J. Yoon <
>>> edwardyoon@apache.org
>>> >> >
>>> >> >> wrote:
>>> >> >>
>>> >> >>> Hi,
>>> >> >>>
>>> >> >>> You can get the maximum number of available tasks like following
>>> code:
>>> >> >>>
>>> >> >>>     BSPJobClient jobClient = new BSPJobClient(conf);
>>> >> >>>     ClusterStatus cluster = jobClient.getClusterStatus(true);
>>> >> >>>
>>> >> >>>     // Set to maximum
>>> >> >>>     bsp.setNumBspTask(cluster.getMaxTasks());
>>> >> >>>
>>> >> >>>
>>> >> >>> On Wed, Jun 24, 2015 at 11:20 PM, Behroz Sikander <
>>> behroz89@gmail.com>
>>> >> >>> wrote:
>>> >> >>> > Hi,
>>> >> >>> > 1) Thank you for this.
>>> >> >>> > 2) Here are the images. I will look into the log files of PI
>>> example
>>> >> >>> >
>>> >> >>> > *Result of JPS command on slave*
>>> >> >>> >
>>> >> >>>
>>> >>
>>> http://s17.postimg.org/gpwe2bbfj/Screen_Shot_2015_06_22_at_7_23_31_PM.png
>>> >> >>> >
>>> >> >>> > *Result of JPS command on Master*
>>> >> >>> >
>>> >> >>>
>>> >>
>>> http://s14.postimg.org/s9922em5p/Screen_Shot_2015_06_22_at_7_23_42_PM.png
>>> >> >>> >
>>> >> >>> > 3) In my current case, I do not have any input submitted to the
>>> job.
>>> >> >>> During
>>> >> >>> > run time, I directly fetch data from HDFS. So, I am looking for
>>> >> >>> something
>>> >> >>> > like BSPJob.set*Max*NumBspTask().
>>> >> >>> >
>>> >> >>> > Regards,
>>> >> >>> > Behroz
>>> >> >>> >
>>> >> >>> >
>>> >> >>> >
>>> >> >>> > On Tue, Jun 23, 2015 at 12:57 AM, Edward J. Yoon <
>>> >> edwardyoon@apache.org
>>> >> >>> >
>>> >> >>> > wrote:
>>> >> >>> >
>>> >> >>> >> Hello,
>>> >> >>> >>
>>> >> >>> >> 1) You can get the filesystem URI from a configuration using
>>> >> >>> >> "FileSystem fs = FileSystem.get(conf);". Of course, the
>>> fs.defaultFS
>>> >> >>> >> property should be in hama-site.xml
>>> >> >>> >>
>>> >> >>> >>   <property>
>>> >> >>> >>     <name>fs.defaultFS</name>
>>> >> >>> >>     <value>hdfs://host1.mydomain.com:9000/</value>
>>> >> >>> >>     <description>
>>> >> >>> >>       The name of the default file system. Either the literal
>>> string
>>> >> >>> >>       "local" or a host:port for HDFS.
>>> >> >>> >>     </description>
>>> >> >>> >>   </property>
>>> >> >>> >>
>>> >> >>> >> 2) The 'bsp.tasks.maximum' is the number of tasks per node. It
>>> looks
>>> >> >>> >> cluster configuration issue. Please run Pi example and look at
>>> the
>>> >> >>> >> logs for more details. NOTE: you can not attach the images to
>>> >> mailing
>>> >> >>> >> list so I can't see it.
>>> >> >>> >>
>>> >> >>> >> 3) You can use the BSPJob.setNumBspTask(int) method. If input
>>> is
>>> >> >>> >> provided, the number of BSP tasks is basically driven by the
>>> number
>>> >> of
>>> >> >>> >> DFS blocks. I'll fix it to be more flexible on HAMA-956.
>>> >> >>> >>
>>> >> >>> >> Thanks!
>>> >> >>> >>
>>> >> >>> >>
>>> >> >>> >> On Tue, Jun 23, 2015 at 2:33 AM, Behroz Sikander <
>>> >> behroz89@gmail.com>
>>> >> >>> >> wrote:
>>> >> >>> >> > Hi,
>>> >> >>> >> > Recently, I moved from a single machine setup to a 2 machine
>>> >> setup.
>>> >> >>> I was
>>> >> >>> >> > successfully able to run my job that uses the HDFS to get
>>> data. I
>>> >> >>> have 3
>>> >> >>> >> > trivial questions
>>> >> >>> >> >
>>> >> >>> >> > 1- To access HDFS, I have to manually give the IP address of
>>> >> server
>>> >> >>> >> running
>>> >> >>> >> > HDFS. I thought that Hama will automatically pick from the
>>> >> >>> configurations
>>> >> >>> >> > but it does not. I am probably doing something wrong. Right
>>> now my
>>> >> >>> code
>>> >> >>> >> work
>>> >> >>> >> > by using the following.
>>> >> >>> >> >
>>> >> >>> >> > FileSystem fs = FileSystem.get(new
>>> URI("hdfs://server_ip:port/"),
>>> >> >>> conf);
>>> >> >>> >> >
>>> >> >>> >> > 2- On my master server, when I start hama it automatically
>>> starts
>>> >> >>> hama in
>>> >> >>> >> > the slave machine (all good). Both master and slave are set
>>> as
>>> >> >>> >> groomservers.
>>> >> >>> >> > This means that I have 2 servers to run my job which means
>>> that I
>>> >> can
>>> >> >>> >> open
>>> >> >>> >> > more BSPPeerChild processes. And if I submit my jar with 3
>>> bsp
>>> >> tasks
>>> >> >>> then
>>> >> >>> >> > everything works fine. But when I move to 4 tasks, Hama
>>> freezes.
>>> >> >>> Here is
>>> >> >>> >> the
>>> >> >>> >> > result of JPS command on slave.
>>> >> >>> >> >
>>> >> >>> >> >
>>> >> >>> >> > Result of JPS command on Master
>>> >> >>> >> >
>>> >> >>> >> >
>>> >> >>> >> >
>>> >> >>> >> > You can see that it is only opening tasks on slaves but not
>>> on
>>> >> >>> master.
>>> >> >>> >> >
>>> >> >>> >> > Note: I tried to change the bsp.tasks.maximum property in
>>> >> >>> >> hama-default.xml
>>> >> >>> >> > to 4 but still same result.
>>> >> >>> >> >
>>> >> >>> >> > 3- I want my cluster to open as many BSPPeerChild processes
>>> as
>>> >> >>> possible.
>>> >> >>> >> Is
>>> >> >>> >> > there any setting that can I do to achieve that ? Or hama
>>> picks up
>>> >> >>> the
>>> >> >>> >> > values from hama-default.xml to open tasks ?
>>> >> >>> >> >
>>> >> >>> >> >
>>> >> >>> >> > Regards,
>>> >> >>> >> >
>>> >> >>> >> > Behroz Sikander
>>> >> >>> >>
>>> >> >>> >>
>>> >> >>> >>
>>> >> >>> >> --
>>> >> >>> >> Best Regards, Edward J. Yoon
>>> >> >>> >>
>>> >> >>>
>>> >> >>>
>>> >> >>>
>>> >> >>> --
>>> >> >>> Best Regards, Edward J. Yoon
>>> >> >>>
>>> >> >>
>>> >> >>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Best Regards, Edward J. Yoon
>>> >>
>>>
>>>
>>>
>>> --
>>> Best Regards, Edward J. Yoon
>>>
>>
>>
>



Re: Groomserer BSPPeerChild limit

Posted by Behroz Sikander <be...@gmail.com>.
While trying to figure out the issue, I stumbled on another weird one. It
might be a bug in Hama, but I am not sure. Both of the following lines
throw an exception.

System.out.println(peer.getPeerName(0)); // Exception

System.out.println(peer.getNumPeers()); // Exception


[time] ERROR bsp.BSPTask: *Error running bsp setup and bsp function.*

[time]java.lang.*RuntimeException: All peer names could not be retrieved!*

at
org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.getAllPeerNames(ZooKeeperSyncClientImpl.java:305)

at org.apache.hama.bsp.BSPPeerImpl.initPeerNames(BSPPeerImpl.java:544)

at org.apache.hama.bsp.BSPPeerImpl.getNumPeers(BSPPeerImpl.java:538)

at testHDFS.EVADMMBsp.setup*(EVADMMBsp.java:58)*

at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)

at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)

at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
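
For context, here is a rough, non-compiled sketch of the setup() that hits
this. The generic type parameters are placeholders, not my job's actual
types:

  @Override
  public void setup(
      BSPPeer<LongWritable, Text, NullWritable, Text, BytesWritable> peer)
      throws IOException, SyncException, InterruptedException {
    // Both calls resolve peer names through ZooKeeper, so they fail if
    // any task has not yet registered there (e.g. a frozen task).
    System.out.println(peer.getPeerName(0));
    System.out.println(peer.getNumPeers());
  }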

On Sun, Jun 28, 2015 at 6:45 PM, Behroz Sikander <be...@gmail.com> wrote:

> I think I have more information on the issue. I did some debugging and
> found something quite strange.
>
> If I open my job with 6 tasks ( 3 tasks will run on MACHINE1 and 3 task
> will be opened on other MACHINE2),
>
>  -  3 tasks on Machine1 are frozen and the strange thing is that the
> processes do not even enter the SETUP function of BSP class. I have print
> statements in the setup function of BSP class and it doesn't print
> anything. I get empty files with zero size.
>
> drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:29 .
> drwxrwxr-x 99 behroz behroz 4096 Jun 28 16:28 ..
> -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> attempt_201506281624_0001_000000_0.err
> -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> attempt_201506281624_0001_000000_0.log
> -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> attempt_201506281624_0001_000001_0.err
> -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> attempt_201506281624_0001_000001_0.log
> -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> attempt_201506281624_0001_000002_0.err
> -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> attempt_201506281624_0001_000002_0.log
>
> - On MACHINE2, the code enters the SETUP function of BSP class and prints
> stuff. See the size of files generated on output. How is it possible that
> in 3 tasks the code can enter BSP and in others it cannot ?
>
> drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:39 .
> drwxrwxr-x 82 behroz behroz 4096 Jun 28 16:39 ..
> -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
> attempt_201506281639_0001_000003_0.err
> -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
> attempt_201506281639_0001_000003_0.log
> -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
> attempt_201506281639_0001_000004_0.err
> -rw-rw-r--  1 behroz behroz 1368 Jun 28 16:39
> attempt_201506281639_0001_000004_0.log
> -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
> attempt_201506281639_0001_000005_0.err
> -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
> attempt_201506281639_0001_000005_0.log
>
> - Hama Groom log file on MACHINE1 (which is frozen) shows.
> [time] INFO org.apache.hama.bsp.GroomServer: Task
> 'attempt_201506281639_0001_000001_0' has started.
> [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> [time] INFO org.apache.hama.bsp.GroomServer: Task
> 'attempt_201506281639_0001_000002_0' has started.
> [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> [time] INFO org.apache.hama.bsp.GroomServer: Task
> 'attempt_201506281639_0001_000000_0' has started.
>
> - Hama Groom log file on MACHINE2 shows
> [time] INFO org.apache.hama.bsp.GroomServer: Task
> 'attempt_201506281639_0001_000003_0' has started.
> [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> [time] INFO org.apache.hama.bsp.GroomServer: Task
> 'attempt_201506281639_0001_000004_0' has started.
> [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> [time] INFO org.apache.hama.bsp.GroomServer: Task
> 'attempt_201506281639_0001_000005_0' has started.
> [time] INFO org.apache.hama.bsp.GroomServer: Task
> attempt_201506281639_0001_000004_0 is *done*.
> [time] INFO org.apache.hama.bsp.GroomServer: Task
> attempt_201506281639_0001_000003_0 is *done*.
> [time] INFO org.apache.hama.bsp.GroomServer: Task
> attempt_201506281639_0001_000005_0 is *done*.
>
> Any clue what might be going wrong ?
>
> Regards,
> Behroz
>
>
>
> On Sat, Jun 27, 2015 at 1:13 PM, Behroz Sikander <be...@gmail.com>
> wrote:
>
>> Here is the log file from that folder
>>
>> 15/06/27 11:10:34 INFO ipc.Server: Starting Socket Reader #1 for port
>> 61001
>> 15/06/27 11:10:34 INFO ipc.Server: IPC Server Responder: starting
>> 15/06/27 11:10:34 INFO ipc.Server: IPC Server listener on 61001: starting
>> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 0 on 61001: starting
>> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 1 on 61001: starting
>> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 2 on 61001: starting
>> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 3 on 61001: starting
>> 15/06/27 11:10:34 INFO message.HamaMessageManagerImpl: BSPPeer
>> address:b178b33b16cc port:61001
>> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 4 on 61001: starting
>> 15/06/27 11:10:34 INFO sync.ZKSyncClient: Initializing ZK Sync Client
>> 15/06/27 11:10:34 INFO sync.ZooKeeperSyncClientImpl: Start connecting to
>> Zookeeper! At b178b33b16cc/172.17.0.7:61001
>> 15/06/27 11:10:37 INFO ipc.Server: Stopping server on 61001
>> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 0 on 61001: exiting
>> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server listener on 61001
>> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 1 on 61001: exiting
>> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 2 on 61001: exiting
>> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server Responder
>> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 3 on 61001: exiting
>> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 4 on 61001: exiting
>>
>>
>> And my console shows the following output. Hama is frozen right now.
>> 15/06/27 11:10:32 INFO bsp.BSPJobClient: Running job:
>> job_201506262331_0003
>> 15/06/27 11:10:35 INFO bsp.BSPJobClient: Current supersteps number: 0
>> 15/06/27 11:10:38 INFO bsp.BSPJobClient: Current supersteps number: 2
>>
>> On Sat, Jun 27, 2015 at 1:07 PM, Edward J. Yoon <ed...@apache.org>
>> wrote:
>>
>>> Please check the task logs in $HAMA_HOME/logs/tasklogs folder.
>>>
>>> On Sat, Jun 27, 2015 at 8:03 PM, Behroz Sikander <be...@gmail.com>
>>> wrote:
>>> > Yea. I also thought that. I ran the program through eclipse with 20
>>> tasks
>>> > and it works fine.
>>> >
>>> > On Sat, Jun 27, 2015 at 1:00 PM, Edward J. Yoon <edwardyoon@apache.org
>>> >
>>> > wrote:
>>> >
>>> >> > When I run the PI example, it uses 9 tasks and runs fine. When I
>>> run my
>>> >> > program with 3 tasks, everything runs fine. But when I increase the
>>> tasks
>>> >> > (to 4) by using "setNumBspTask". Hama freezes. I do not understand
>>> what
>>> >> can
>>> >> > go wrong.
>>> >>
>>> >> It looks like a program bug. Have you ran your program in local mode?
>>> >>
>>> >> On Sat, Jun 27, 2015 at 8:03 AM, Behroz Sikander <be...@gmail.com>
>>> >> wrote:
>>> >> > Hi,
>>> >> > In the current thread, I mentioned 3 issues. Issue 1 and 3 are
>>> resolved
>>> >> but
>>> >> > issue number 2 is still giving me headaches.
>>> >> >
>>> >> > My problem:
>>> >> > My cluster now consists of 3 machines. Each one of them properly
>>> >> configured
>>> >> > (Apparently). From my master machine when I start Hadoop and Hama,
>>> I can
>>> >> > see the processes started on other 2 machines. If I check the
>>> maximum
>>> >> tasks
>>> >> > that my cluster can support then I get 9 (3 tasks on each machine).
>>> >> >
>>> >> > When I run the PI example, it uses 9 tasks and runs fine. When I
>>> run my
>>> >> > program with 3 tasks, everything runs fine. But when I increase the
>>> tasks
>>> >> > (to 4) by using "setNumBspTask". Hama freezes. I do not understand
>>> what
>>> >> can
>>> >> > go wrong.
>>> >> >
>>> >> > I checked the logs files and things look fine. I just sometimes get
>>> an
>>> >> > exception that hama was not able to delete the sytem directory
>>> >> > (bsp.system.dir) defined in the hama-site.xml.
>>> >> >
>>> >> > Any help or clue would be great.
>>> >> >
>>> >> > Regards,
>>> >> > Behroz Sikander
>>> >> >
>>> >> > On Thu, Jun 25, 2015 at 1:13 PM, Behroz Sikander <
>>> behroz89@gmail.com>
>>> >> wrote:
>>> >> >
>>> >> >> Thank you :)
>>> >> >>
>>> >> >> On Thu, Jun 25, 2015 at 12:14 AM, Edward J. Yoon <
>>> edwardyoon@apache.org
>>> >> >
>>> >> >> wrote:
>>> >> >>
>>> >> >>> Hi,
>>> >> >>>
>>> >> >>> You can get the maximum number of available tasks like following
>>> code:
>>> >> >>>
>>> >> >>>     BSPJobClient jobClient = new BSPJobClient(conf);
>>> >> >>>     ClusterStatus cluster = jobClient.getClusterStatus(true);
>>> >> >>>
>>> >> >>>     // Set to maximum
>>> >> >>>     bsp.setNumBspTask(cluster.getMaxTasks());
>>> >> >>>
>>> >> >>>
>>> >> >>> On Wed, Jun 24, 2015 at 11:20 PM, Behroz Sikander <
>>> behroz89@gmail.com>
>>> >> >>> wrote:
>>> >> >>> > Hi,
>>> >> >>> > 1) Thank you for this.
>>> >> >>> > 2) Here are the images. I will look into the log files of PI
>>> example
>>> >> >>> >
>>> >> >>> > *Result of JPS command on slave*
>>> >> >>> >
>>> >> >>>
>>> >>
>>> http://s17.postimg.org/gpwe2bbfj/Screen_Shot_2015_06_22_at_7_23_31_PM.png
>>> >> >>> >
>>> >> >>> > *Result of JPS command on Master*
>>> >> >>> >
>>> >> >>>
>>> >>
>>> http://s14.postimg.org/s9922em5p/Screen_Shot_2015_06_22_at_7_23_42_PM.png
>>> >> >>> >
>>> >> >>> > 3) In my current case, I do not have any input submitted to the
>>> job.
>>> >> >>> During
>>> >> >>> > run time, I directly fetch data from HDFS. So, I am looking for
>>> >> >>> something
>>> >> >>> > like BSPJob.set*Max*NumBspTask().
>>> >> >>> >
>>> >> >>> > Regards,
>>> >> >>> > Behroz
>>> >> >>> >
>>> >> >>> >
>>> >> >>> >
>>> >> >>> > On Tue, Jun 23, 2015 at 12:57 AM, Edward J. Yoon <
>>> >> edwardyoon@apache.org
>>> >> >>> >
>>> >> >>> > wrote:
>>> >> >>> >
>>> >> >>> >> Hello,
>>> >> >>> >>
>>> >> >>> >> 1) You can get the filesystem URI from a configuration using
>>> >> >>> >> "FileSystem fs = FileSystem.get(conf);". Of course, the
>>> fs.defaultFS
>>> >> >>> >> property should be in hama-site.xml
>>> >> >>> >>
>>> >> >>> >>   <property>
>>> >> >>> >>     <name>fs.defaultFS</name>
>>> >> >>> >>     <value>hdfs://host1.mydomain.com:9000/</value>
>>> >> >>> >>     <description>
>>> >> >>> >>       The name of the default file system. Either the literal
>>> string
>>> >> >>> >>       "local" or a host:port for HDFS.
>>> >> >>> >>     </description>
>>> >> >>> >>   </property>
>>> >> >>> >>
>>> >> >>> >> 2) The 'bsp.tasks.maximum' is the number of tasks per node. It
>>> looks
>>> >> >>> >> cluster configuration issue. Please run Pi example and look at
>>> the
>>> >> >>> >> logs for more details. NOTE: you can not attach the images to
>>> >> mailing
>>> >> >>> >> list so I can't see it.
>>> >> >>> >>
>>> >> >>> >> 3) You can use the BSPJob.setNumBspTask(int) method. If input
>>> is
>>> >> >>> >> provided, the number of BSP tasks is basically driven by the
>>> number
>>> >> of
>>> >> >>> >> DFS blocks. I'll fix it to be more flexible on HAMA-956.
>>> >> >>> >>
>>> >> >>> >> Thanks!
>>> >> >>> >>
>>> >> >>> >>
>>> >> >>> >> On Tue, Jun 23, 2015 at 2:33 AM, Behroz Sikander <
>>> >> behroz89@gmail.com>
>>> >> >>> >> wrote:
>>> >> >>> >> > Hi,
>>> >> >>> >> > Recently, I moved from a single machine setup to a 2 machine
>>> >> setup.
>>> >> >>> I was
>>> >> >>> >> > successfully able to run my job that uses the HDFS to get
>>> data. I
>>> >> >>> have 3
>>> >> >>> >> > trivial questions
>>> >> >>> >> >
>>> >> >>> >> > 1- To access HDFS, I have to manually give the IP address of
>>> >> server
>>> >> >>> >> running
>>> >> >>> >> > HDFS. I thought that Hama will automatically pick from the
>>> >> >>> configurations
>>> >> >>> >> > but it does not. I am probably doing something wrong. Right
>>> now my
>>> >> >>> code
>>> >> >>> >> work
>>> >> >>> >> > by using the following.
>>> >> >>> >> >
>>> >> >>> >> > FileSystem fs = FileSystem.get(new
>>> URI("hdfs://server_ip:port/"),
>>> >> >>> conf);
>>> >> >>> >> >
>>> >> >>> >> > 2- On my master server, when I start hama it automatically
>>> starts
>>> >> >>> hama in
>>> >> >>> >> > the slave machine (all good). Both master and slave are set
>>> as
>>> >> >>> >> groomservers.
>>> >> >>> >> > This means that I have 2 servers to run my job which means
>>> that I
>>> >> can
>>> >> >>> >> open
>>> >> >>> >> > more BSPPeerChild processes. And if I submit my jar with 3
>>> bsp
>>> >> tasks
>>> >> >>> then
>>> >> >>> >> > everything works fine. But when I move to 4 tasks, Hama
>>> freezes.
>>> >> >>> Here is
>>> >> >>> >> the
>>> >> >>> >> > result of JPS command on slave.
>>> >> >>> >> >
>>> >> >>> >> >
>>> >> >>> >> > Result of JPS command on Master
>>> >> >>> >> >
>>> >> >>> >> >
>>> >> >>> >> >
>>> >> >>> >> > You can see that it is only opening tasks on slaves but not
>>> on
>>> >> >>> master.
>>> >> >>> >> >
>>> >> >>> >> > Note: I tried to change the bsp.tasks.maximum property in
>>> >> >>> >> hama-default.xml
>>> >> >>> >> > to 4 but still same result.
>>> >> >>> >> >
>>> >> >>> >> > 3- I want my cluster to open as many BSPPeerChild processes
>>> as
>>> >> >>> possible.
>>> >> >>> >> Is
>>> >> >>> >> > there any setting that can I do to achieve that ? Or hama
>>> picks up
>>> >> >>> the
>>> >> >>> >> > values from hama-default.xml to open tasks ?
>>> >> >>> >> >
>>> >> >>> >> >
>>> >> >>> >> > Regards,
>>> >> >>> >> >
>>> >> >>> >> > Behroz Sikander
>>> >> >>> >>
>>> >> >>> >>
>>> >> >>> >>
>>> >> >>> >> --
>>> >> >>> >> Best Regards, Edward J. Yoon
>>> >> >>> >>
>>> >> >>>
>>> >> >>>
>>> >> >>>
>>> >> >>> --
>>> >> >>> Best Regards, Edward J. Yoon
>>> >> >>>
>>> >> >>
>>> >> >>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Best Regards, Edward J. Yoon
>>> >>
>>>
>>>
>>>
>>> --
>>> Best Regards, Edward J. Yoon
>>>
>>
>>
>

Re: Groomserer BSPPeerChild limit

Posted by Behroz Sikander <be...@gmail.com>.
I think I have more information on the issue. I did some debugging and
found something quite strange.

If I run my job with 6 tasks (3 tasks will run on MACHINE1 and 3 tasks
will run on MACHINE2):

 - The 3 tasks on MACHINE1 are frozen, and the strange thing is that the
processes do not even enter the setup() function of the BSP class. I have
print statements in setup(), but nothing is printed. I get empty log files
of zero size.

drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:29 .
drwxrwxr-x 99 behroz behroz 4096 Jun 28 16:28 ..
-rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
attempt_201506281624_0001_000000_0.err
-rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
attempt_201506281624_0001_000000_0.log
-rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
attempt_201506281624_0001_000001_0.err
-rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
attempt_201506281624_0001_000001_0.log
-rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
attempt_201506281624_0001_000002_0.err
-rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
attempt_201506281624_0001_000002_0.log

- On MACHINE2, the code enters the setup() function of the BSP class and
prints output. Note the sizes of the generated log files. How is it
possible that the code enters the BSP logic in 3 tasks but not in the
others?

drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:39 .
drwxrwxr-x 82 behroz behroz 4096 Jun 28 16:39 ..
-rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
attempt_201506281639_0001_000003_0.err
-rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
attempt_201506281639_0001_000003_0.log
-rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
attempt_201506281639_0001_000004_0.err
-rw-rw-r--  1 behroz behroz 1368 Jun 28 16:39
attempt_201506281639_0001_000004_0.log
-rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
attempt_201506281639_0001_000005_0.err
-rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
attempt_201506281639_0001_000005_0.log

- The Hama groom log file on MACHINE1 (the frozen machine) shows:
[time] INFO org.apache.hama.bsp.GroomServer: Task
'attempt_201506281639_0001_000001_0' has started.
[time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
[time] INFO org.apache.hama.bsp.GroomServer: Task
'attempt_201506281639_0001_000002_0' has started.
[time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
[time] INFO org.apache.hama.bsp.GroomServer: Task
'attempt_201506281639_0001_000000_0' has started.

- The Hama groom log file on MACHINE2 shows:
[time] INFO org.apache.hama.bsp.GroomServer: Task
'attempt_201506281639_0001_000003_0' has started.
[time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
[time] INFO org.apache.hama.bsp.GroomServer: Task
'attempt_201506281639_0001_000004_0' has started.
[time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
[time] INFO org.apache.hama.bsp.GroomServer: Task
'attempt_201506281639_0001_000005_0' has started.
[time] INFO org.apache.hama.bsp.GroomServer: Task
attempt_201506281639_0001_000004_0 is *done*.
[time] INFO org.apache.hama.bsp.GroomServer: Task
attempt_201506281639_0001_000003_0 is *done*.
[time] INFO org.apache.hama.bsp.GroomServer: Task
attempt_201506281639_0001_000005_0 is *done*.

Any clue what might be going wrong ?

Regards,
Behroz



On Sat, Jun 27, 2015 at 1:13 PM, Behroz Sikander <be...@gmail.com> wrote:

> Here is the log file from that folder
>
> 15/06/27 11:10:34 INFO ipc.Server: Starting Socket Reader #1 for port 61001
> 15/06/27 11:10:34 INFO ipc.Server: IPC Server Responder: starting
> 15/06/27 11:10:34 INFO ipc.Server: IPC Server listener on 61001: starting
> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 0 on 61001: starting
> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 1 on 61001: starting
> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 2 on 61001: starting
> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 3 on 61001: starting
> 15/06/27 11:10:34 INFO message.HamaMessageManagerImpl: BSPPeer
> address:b178b33b16cc port:61001
> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 4 on 61001: starting
> 15/06/27 11:10:34 INFO sync.ZKSyncClient: Initializing ZK Sync Client
> 15/06/27 11:10:34 INFO sync.ZooKeeperSyncClientImpl: Start connecting to
> Zookeeper! At b178b33b16cc/172.17.0.7:61001
> 15/06/27 11:10:37 INFO ipc.Server: Stopping server on 61001
> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 0 on 61001: exiting
> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server listener on 61001
> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 1 on 61001: exiting
> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 2 on 61001: exiting
> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server Responder
> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 3 on 61001: exiting
> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 4 on 61001: exiting
>
>
> And my console shows the following ouptut. Hama is frozen right now.
> 15/06/27 11:10:32 INFO bsp.BSPJobClient: Running job: job_201506262331_0003
> 15/06/27 11:10:35 INFO bsp.BSPJobClient: Current supersteps number: 0
> 15/06/27 11:10:38 INFO bsp.BSPJobClient: Current supersteps number: 2
>
> On Sat, Jun 27, 2015 at 1:07 PM, Edward J. Yoon <ed...@apache.org>
> wrote:
>
>> Please check the task logs in $HAMA_HOME/logs/tasklogs folder.
>>
>> On Sat, Jun 27, 2015 at 8:03 PM, Behroz Sikander <be...@gmail.com>
>> wrote:
>> > Yea. I also thought that. I ran the program through eclipse with 20
>> tasks
>> > and it works fine.
>> >
>> > On Sat, Jun 27, 2015 at 1:00 PM, Edward J. Yoon <ed...@apache.org>
>> > wrote:
>> >
>> >> > When I run the PI example, it uses 9 tasks and runs fine. When I run
>> my
>> >> > program with 3 tasks, everything runs fine. But when I increase the
>> tasks
>> >> > (to 4) by using "setNumBspTask". Hama freezes. I do not understand
>> what
>> >> can
>> >> > go wrong.
>> >>
>> >> It looks like a program bug. Have you ran your program in local mode?
>> >>
>> >> On Sat, Jun 27, 2015 at 8:03 AM, Behroz Sikander <be...@gmail.com>
>> >> wrote:
>> >> > Hi,
>> >> > In the current thread, I mentioned 3 issues. Issue 1 and 3 are
>> resolved
>> >> but
>> >> > issue number 2 is still giving me headaches.
>> >> >
>> >> > My problem:
>> >> > My cluster now consists of 3 machines. Each one of them properly
>> >> configured
>> >> > (Apparently). From my master machine when I start Hadoop and Hama, I
>> can
>> >> > see the processes started on other 2 machines. If I check the maximum
>> >> tasks
>> >> > that my cluster can support then I get 9 (3 tasks on each machine).
>> >> >
>> >> > When I run the PI example, it uses 9 tasks and runs fine. When I run
>> my
>> >> > program with 3 tasks, everything runs fine. But when I increase the
>> tasks
>> >> > (to 4) by using "setNumBspTask". Hama freezes. I do not understand
>> what
>> >> can
>> >> > go wrong.
>> >> >
>> >> > I checked the logs files and things look fine. I just sometimes get
>> an
>> >> > exception that hama was not able to delete the sytem directory
>> >> > (bsp.system.dir) defined in the hama-site.xml.
>> >> >
>> >> > Any help or clue would be great.
>> >> >
>> >> > Regards,
>> >> > Behroz Sikander
>> >> >
>> >> > On Thu, Jun 25, 2015 at 1:13 PM, Behroz Sikander <behroz89@gmail.com
>> >
>> >> wrote:
>> >> >
>> >> >> Thank you :)
>> >> >>
>> >> >> On Thu, Jun 25, 2015 at 12:14 AM, Edward J. Yoon <
>> edwardyoon@apache.org
>> >> >
>> >> >> wrote:
>> >> >>
>> >> >>> Hi,
>> >> >>>
>> >> >>> You can get the maximum number of available tasks like following
>> code:
>> >> >>>
>> >> >>>     BSPJobClient jobClient = new BSPJobClient(conf);
>> >> >>>     ClusterStatus cluster = jobClient.getClusterStatus(true);
>> >> >>>
>> >> >>>     // Set to maximum
>> >> >>>     bsp.setNumBspTask(cluster.getMaxTasks());
>> >> >>>
>> >> >>>
>> >> >>> On Wed, Jun 24, 2015 at 11:20 PM, Behroz Sikander <
>> behroz89@gmail.com>
>> >> >>> wrote:
>> >> >>> > Hi,
>> >> >>> > 1) Thank you for this.
>> >> >>> > 2) Here are the images. I will look into the log files of PI
>> example
>> >> >>> >
>> >> >>> > *Result of JPS command on slave*
>> >> >>> >
>> >> >>>
>> >>
>> http://s17.postimg.org/gpwe2bbfj/Screen_Shot_2015_06_22_at_7_23_31_PM.png
>> >> >>> >
>> >> >>> > *Result of JPS command on Master*
>> >> >>> >
>> >> >>>
>> >>
>> http://s14.postimg.org/s9922em5p/Screen_Shot_2015_06_22_at_7_23_42_PM.png
>> >> >>> >
>> >> >>> > 3) In my current case, I do not have any input submitted to the
>> job.
>> >> >>> During
>> >> >>> > run time, I directly fetch data from HDFS. So, I am looking for
>> >> >>> something
>> >> >>> > like BSPJob.set*Max*NumBspTask().
>> >> >>> >
>> >> >>> > Regards,
>> >> >>> > Behroz
>> >> >>> >
>> >> >>> >
>> >> >>> >
>> >> >>> > On Tue, Jun 23, 2015 at 12:57 AM, Edward J. Yoon <
>> >> edwardyoon@apache.org
>> >> >>> >
>> >> >>> > wrote:
>> >> >>> >
>> >> >>> >> Hello,
>> >> >>> >>
>> >> >>> >> 1) You can get the filesystem URI from a configuration using
>> >> >>> >> "FileSystem fs = FileSystem.get(conf);". Of course, the
>> fs.defaultFS
>> >> >>> >> property should be in hama-site.xml
>> >> >>> >>
>> >> >>> >>   <property>
>> >> >>> >>     <name>fs.defaultFS</name>
>> >> >>> >>     <value>hdfs://host1.mydomain.com:9000/</value>
>> >> >>> >>     <description>
>> >> >>> >>       The name of the default file system. Either the literal
>> string
>> >> >>> >>       "local" or a host:port for HDFS.
>> >> >>> >>     </description>
>> >> >>> >>   </property>
>> >> >>> >>
>> >> >>> >> 2) The 'bsp.tasks.maximum' is the number of tasks per node. It
>> looks
>> >> >>> >> cluster configuration issue. Please run Pi example and look at
>> the
>> >> >>> >> logs for more details. NOTE: you can not attach the images to
>> >> mailing
>> >> >>> >> list so I can't see it.
>> >> >>> >>
>> >> >>> >> 3) You can use the BSPJob.setNumBspTask(int) method. If input is
>> >> >>> >> provided, the number of BSP tasks is basically driven by the
>> number
>> >> of
>> >> >>> >> DFS blocks. I'll fix it to be more flexible on HAMA-956.
>> >> >>> >>
>> >> >>> >> Thanks!
>> >> >>> >>
>> >> >>> >>
>> >> >>> >> On Tue, Jun 23, 2015 at 2:33 AM, Behroz Sikander <
>> >> behroz89@gmail.com>
>> >> >>> >> wrote:
>> >> >>> >> > Hi,
>> >> >>> >> > Recently, I moved from a single machine setup to a 2 machine
>> >> setup.
>> >> >>> I was
>> >> >>> >> > successfully able to run my job that uses the HDFS to get
>> data. I
>> >> >>> have 3
>> >> >>> >> > trivial questions
>> >> >>> >> >
>> >> >>> >> > 1- To access HDFS, I have to manually give the IP address of
>> >> server
>> >> >>> >> running
>> >> >>> >> > HDFS. I thought that Hama will automatically pick from the
>> >> >>> configurations
>> >> >>> >> > but it does not. I am probably doing something wrong. Right
>> now my
>> >> >>> code
>> >> >>> >> work
>> >> >>> >> > by using the following.
>> >> >>> >> >
>> >> >>> >> > FileSystem fs = FileSystem.get(new
>> URI("hdfs://server_ip:port/"),
>> >> >>> conf);
>> >> >>> >> >
>> >> >>> >> > 2- On my master server, when I start hama it automatically
>> starts
>> >> >>> hama in
>> >> >>> >> > the slave machine (all good). Both master and slave are set as
>> >> >>> >> groomservers.
>> >> >>> >> > This means that I have 2 servers to run my job which means
>> that I
>> >> can
>> >> >>> >> open
>> >> >>> >> > more BSPPeerChild processes. And if I submit my jar with 3 bsp
>> >> tasks
>> >> >>> then
>> >> >>> >> > everything works fine. But when I move to 4 tasks, Hama
>> freezes.
>> >> >>> Here is
>> >> >>> >> the
>> >> >>> >> > result of JPS command on slave.
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> > Result of JPS command on Master
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> > You can see that it is only opening tasks on slaves but not on
>> >> >>> master.
>> >> >>> >> >
>> >> >>> >> > Note: I tried to change the bsp.tasks.maximum property in
>> >> >>> >> hama-default.xml
>> >> >>> >> > to 4 but still same result.
>> >> >>> >> >
>> >> >>> >> > 3- I want my cluster to open as many BSPPeerChild processes as
>> >> >>> possible.
>> >> >>> >> Is
>> >> >>> >> > there any setting that can I do to achieve that ? Or hama
>> picks up
>> >> >>> the
>> >> >>> >> > values from hama-default.xml to open tasks ?
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> > Regards,
>> >> >>> >> >
>> >> >>> >> > Behroz Sikander
>> >> >>> >>
>> >> >>> >>
>> >> >>> >>
>> >> >>> >> --
>> >> >>> >> Best Regards, Edward J. Yoon
>> >> >>> >>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> --
>> >> >>> Best Regards, Edward J. Yoon
>> >> >>>
>> >> >>
>> >> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Best Regards, Edward J. Yoon
>> >>
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>>
>
>

Re: Groomserer BSPPeerChild limit

Posted by Behroz Sikander <be...@gmail.com>.
Here is the log file from that folder

15/06/27 11:10:34 INFO ipc.Server: Starting Socket Reader #1 for port 61001
15/06/27 11:10:34 INFO ipc.Server: IPC Server Responder: starting
15/06/27 11:10:34 INFO ipc.Server: IPC Server listener on 61001: starting
15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 0 on 61001: starting
15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 1 on 61001: starting
15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 2 on 61001: starting
15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 3 on 61001: starting
15/06/27 11:10:34 INFO message.HamaMessageManagerImpl: BSPPeer
address:b178b33b16cc port:61001
15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 4 on 61001: starting
15/06/27 11:10:34 INFO sync.ZKSyncClient: Initializing ZK Sync Client
15/06/27 11:10:34 INFO sync.ZooKeeperSyncClientImpl: Start connecting to
Zookeeper! At b178b33b16cc/172.17.0.7:61001
15/06/27 11:10:37 INFO ipc.Server: Stopping server on 61001
15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 0 on 61001: exiting
15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server listener on 61001
15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 1 on 61001: exiting
15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 2 on 61001: exiting
15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server Responder
15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 3 on 61001: exiting
15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 4 on 61001: exiting


And my console shows the following output. Hama is frozen right now.
15/06/27 11:10:32 INFO bsp.BSPJobClient: Running job: job_201506262331_0003
15/06/27 11:10:35 INFO bsp.BSPJobClient: Current supersteps number: 0
15/06/27 11:10:38 INFO bsp.BSPJobClient: Current supersteps number: 2

On Sat, Jun 27, 2015 at 1:07 PM, Edward J. Yoon <ed...@apache.org>
wrote:

> Please check the task logs in $HAMA_HOME/logs/tasklogs folder.
>
> On Sat, Jun 27, 2015 at 8:03 PM, Behroz Sikander <be...@gmail.com>
> wrote:
> > Yea. I also thought that. I ran the program through eclipse with 20 tasks
> > and it works fine.
> >
> > On Sat, Jun 27, 2015 at 1:00 PM, Edward J. Yoon <ed...@apache.org>
> > wrote:
> >
> >> > When I run the PI example, it uses 9 tasks and runs fine. When I run
> my
> >> > program with 3 tasks, everything runs fine. But when I increase the
> tasks
> >> > (to 4) by using "setNumBspTask". Hama freezes. I do not understand
> what
> >> can
> >> > go wrong.
> >>
> >> It looks like a program bug. Have you ran your program in local mode?
> >>
> >> On Sat, Jun 27, 2015 at 8:03 AM, Behroz Sikander <be...@gmail.com>
> >> wrote:
> >> > Hi,
> >> > In the current thread, I mentioned 3 issues. Issue 1 and 3 are
> resolved
> >> but
> >> > issue number 2 is still giving me headaches.
> >> >
> >> > My problem:
> >> > My cluster now consists of 3 machines. Each one of them properly
> >> configured
> >> > (Apparently). From my master machine when I start Hadoop and Hama, I
> can
> >> > see the processes started on other 2 machines. If I check the maximum
> >> tasks
> >> > that my cluster can support then I get 9 (3 tasks on each machine).
> >> >
> >> > When I run the PI example, it uses 9 tasks and runs fine. When I run
> my
> >> > program with 3 tasks, everything runs fine. But when I increase the
> tasks
> >> > (to 4) by using "setNumBspTask". Hama freezes. I do not understand
> what
> >> can
> >> > go wrong.
> >> >
> >> > I checked the logs files and things look fine. I just sometimes get an
> >> > exception that hama was not able to delete the sytem directory
> >> > (bsp.system.dir) defined in the hama-site.xml.
> >> >
> >> > Any help or clue would be great.
> >> >
> >> > Regards,
> >> > Behroz Sikander
> >> >
> >> > On Thu, Jun 25, 2015 at 1:13 PM, Behroz Sikander <be...@gmail.com>
> >> wrote:
> >> >
> >> >> Thank you :)
> >> >>
> >> >> On Thu, Jun 25, 2015 at 12:14 AM, Edward J. Yoon <
> edwardyoon@apache.org
> >> >
> >> >> wrote:
> >> >>
> >> >>> Hi,
> >> >>>
> >> >>> You can get the maximum number of available tasks like following
> code:
> >> >>>
> >> >>>     BSPJobClient jobClient = new BSPJobClient(conf);
> >> >>>     ClusterStatus cluster = jobClient.getClusterStatus(true);
> >> >>>
> >> >>>     // Set to maximum
> >> >>>     bsp.setNumBspTask(cluster.getMaxTasks());
> >> >>>
> >> >>>
> >> >>> On Wed, Jun 24, 2015 at 11:20 PM, Behroz Sikander <
> behroz89@gmail.com>
> >> >>> wrote:
> >> >>> > Hi,
> >> >>> > 1) Thank you for this.
> >> >>> > 2) Here are the images. I will look into the log files of PI
> example
> >> >>> >
> >> >>> > *Result of JPS command on slave*
> >> >>> >
> >> >>>
> >>
> http://s17.postimg.org/gpwe2bbfj/Screen_Shot_2015_06_22_at_7_23_31_PM.png
> >> >>> >
> >> >>> > *Result of JPS command on Master*
> >> >>> >
> >> >>>
> >>
> http://s14.postimg.org/s9922em5p/Screen_Shot_2015_06_22_at_7_23_42_PM.png
> >> >>> >
> >> >>> > 3) In my current case, I do not have any input submitted to the
> job.
> >> >>> During
> >> >>> > run time, I directly fetch data from HDFS. So, I am looking for
> >> >>> something
> >> >>> > like BSPJob.set*Max*NumBspTask().
> >> >>> >
> >> >>> > Regards,
> >> >>> > Behroz
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> > On Tue, Jun 23, 2015 at 12:57 AM, Edward J. Yoon <
> >> edwardyoon@apache.org
> >> >>> >
> >> >>> > wrote:
> >> >>> >
> >> >>> >> Hello,
> >> >>> >>
> >> >>> >> 1) You can get the filesystem URI from a configuration using
> >> >>> >> "FileSystem fs = FileSystem.get(conf);". Of course, the
> fs.defaultFS
> >> >>> >> property should be in hama-site.xml
> >> >>> >>
> >> >>> >>   <property>
> >> >>> >>     <name>fs.defaultFS</name>
> >> >>> >>     <value>hdfs://host1.mydomain.com:9000/</value>
> >> >>> >>     <description>
> >> >>> >>       The name of the default file system. Either the literal
> string
> >> >>> >>       "local" or a host:port for HDFS.
> >> >>> >>     </description>
> >> >>> >>   </property>
> >> >>> >>
> >> >>> >> 2) The 'bsp.tasks.maximum' is the number of tasks per node. It
> looks
> >> >>> >> cluster configuration issue. Please run Pi example and look at
> the
> >> >>> >> logs for more details. NOTE: you can not attach the images to
> >> mailing
> >> >>> >> list so I can't see it.
> >> >>> >>
> >> >>> >> 3) You can use the BSPJob.setNumBspTask(int) method. If input is
> >> >>> >> provided, the number of BSP tasks is basically driven by the
> number
> >> of
> >> >>> >> DFS blocks. I'll fix it to be more flexible on HAMA-956.
> >> >>> >>
> >> >>> >> Thanks!
> >> >>> >>
> >> >>> >>
> >> >>> >> On Tue, Jun 23, 2015 at 2:33 AM, Behroz Sikander <
> >> behroz89@gmail.com>
> >> >>> >> wrote:
> >> >>> >> > Hi,
> >> >>> >> > Recently, I moved from a single machine setup to a 2 machine
> >> setup.
> >> >>> I was
> >> >>> >> > successfully able to run my job that uses the HDFS to get
> data. I
> >> >>> have 3
> >> >>> >> > trivial questions
> >> >>> >> >
> >> >>> >> > 1- To access HDFS, I have to manually give the IP address of
> >> server
> >> >>> >> running
> >> >>> >> > HDFS. I thought that Hama will automatically pick from the
> >> >>> configurations
> >> >>> >> > but it does not. I am probably doing something wrong. Right
> now my
> >> >>> code
> >> >>> >> work
> >> >>> >> > by using the following.
> >> >>> >> >
> >> >>> >> > FileSystem fs = FileSystem.get(new
> URI("hdfs://server_ip:port/"),
> >> >>> conf);
> >> >>> >> >
> >> >>> >> > 2- On my master server, when I start hama it automatically
> starts
> >> >>> hama in
> >> >>> >> > the slave machine (all good). Both master and slave are set as
> >> >>> >> groomservers.
> >> >>> >> > This means that I have 2 servers to run my job which means
> that I
> >> can
> >> >>> >> open
> >> >>> >> > more BSPPeerChild processes. And if I submit my jar with 3 bsp
> >> tasks
> >> >>> then
> >> >>> >> > everything works fine. But when I move to 4 tasks, Hama
> freezes.
> >> >>> Here is
> >> >>> >> the
> >> >>> >> > result of JPS command on slave.
> >> >>> >> >
> >> >>> >> >
> >> >>> >> > Result of JPS command on Master
> >> >>> >> >
> >> >>> >> >
> >> >>> >> >
> >> >>> >> > You can see that it is only opening tasks on slaves but not on
> >> >>> master.
> >> >>> >> >
> >> >>> >> > Note: I tried to change the bsp.tasks.maximum property in
> >> >>> >> hama-default.xml
> >> >>> >> > to 4 but still same result.
> >> >>> >> >
> >> >>> >> > 3- I want my cluster to open as many BSPPeerChild processes as
> >> >>> possible.
> >> >>> >> Is
> >> >>> >> > there any setting that can I do to achieve that ? Or hama
> picks up
> >> >>> the
> >> >>> >> > values from hama-default.xml to open tasks ?
> >> >>> >> >
> >> >>> >> >
> >> >>> >> > Regards,
> >> >>> >> >
> >> >>> >> > Behroz Sikander
> >> >>> >>
> >> >>> >>
> >> >>> >>
> >> >>> >> --
> >> >>> >> Best Regards, Edward J. Yoon
> >> >>> >>
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Best Regards, Edward J. Yoon
> >> >>>
> >> >>
> >> >>
> >>
> >>
> >>
> >> --
> >> Best Regards, Edward J. Yoon
> >>
>
>
>
> --
> Best Regards, Edward J. Yoon
>

Re: Groomserer BSPPeerChild limit

Posted by "Edward J. Yoon" <ed...@apache.org>.
Please check the task logs in $HAMA_HOME/logs/tasklogs folder.

On Sat, Jun 27, 2015 at 8:03 PM, Behroz Sikander <be...@gmail.com> wrote:
> Yea. I also thought that. I ran the program through eclipse with 20 tasks
> and it works fine.
>
> On Sat, Jun 27, 2015 at 1:00 PM, Edward J. Yoon <ed...@apache.org>
> wrote:
>
>> > When I run the PI example, it uses 9 tasks and runs fine. When I run my
>> > program with 3 tasks, everything runs fine. But when I increase the tasks
>> > (to 4) by using "setNumBspTask". Hama freezes. I do not understand what
>> can
>> > go wrong.
>>
>> It looks like a program bug. Have you ran your program in local mode?
>>
>> On Sat, Jun 27, 2015 at 8:03 AM, Behroz Sikander <be...@gmail.com>
>> wrote:
>> > Hi,
>> > In the current thread, I mentioned 3 issues. Issue 1 and 3 are resolved
>> but
>> > issue number 2 is still giving me headaches.
>> >
>> > My problem:
>> > My cluster now consists of 3 machines. Each one of them properly
>> configured
>> > (Apparently). From my master machine when I start Hadoop and Hama, I can
>> > see the processes started on other 2 machines. If I check the maximum
>> tasks
>> > that my cluster can support then I get 9 (3 tasks on each machine).
>> >
>> > When I run the PI example, it uses 9 tasks and runs fine. When I run my
>> > program with 3 tasks, everything runs fine. But when I increase the tasks
>> > (to 4) by using "setNumBspTask". Hama freezes. I do not understand what
>> can
>> > go wrong.
>> >
>> > I checked the logs files and things look fine. I just sometimes get an
>> > exception that hama was not able to delete the sytem directory
>> > (bsp.system.dir) defined in the hama-site.xml.
>> >
>> > Any help or clue would be great.
>> >
>> > Regards,
>> > Behroz Sikander
>> >
>> > On Thu, Jun 25, 2015 at 1:13 PM, Behroz Sikander <be...@gmail.com>
>> wrote:
>> >
>> >> Thank you :)
>> >>
>> >> On Thu, Jun 25, 2015 at 12:14 AM, Edward J. Yoon <edwardyoon@apache.org
>> >
>> >> wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> You can get the maximum number of available tasks like following code:
>> >>>
>> >>>     BSPJobClient jobClient = new BSPJobClient(conf);
>> >>>     ClusterStatus cluster = jobClient.getClusterStatus(true);
>> >>>
>> >>>     // Set to maximum
>> >>>     bsp.setNumBspTask(cluster.getMaxTasks());
>> >>>
>> >>>
>> >>> On Wed, Jun 24, 2015 at 11:20 PM, Behroz Sikander <be...@gmail.com>
>> >>> wrote:
>> >>> > Hi,
>> >>> > 1) Thank you for this.
>> >>> > 2) Here are the images. I will look into the log files of PI example
>> >>> >
>> >>> > *Result of JPS command on slave*
>> >>> >
>> >>>
>> http://s17.postimg.org/gpwe2bbfj/Screen_Shot_2015_06_22_at_7_23_31_PM.png
>> >>> >
>> >>> > *Result of JPS command on Master*
>> >>> >
>> >>>
>> http://s14.postimg.org/s9922em5p/Screen_Shot_2015_06_22_at_7_23_42_PM.png
>> >>> >
>> >>> > 3) In my current case, I do not have any input submitted to the job.
>> >>> During
>> >>> > run time, I directly fetch data from HDFS. So, I am looking for
>> >>> something
>> >>> > like BSPJob.set*Max*NumBspTask().
>> >>> >
>> >>> > Regards,
>> >>> > Behroz
>> >>> >
>> >>> >
>> >>> >
>> >>> > On Tue, Jun 23, 2015 at 12:57 AM, Edward J. Yoon <
>> edwardyoon@apache.org
>> >>> >
>> >>> > wrote:
>> >>> >
>> >>> >> Hello,
>> >>> >>
>> >>> >> 1) You can get the filesystem URI from a configuration using
>> >>> >> "FileSystem fs = FileSystem.get(conf);". Of course, the fs.defaultFS
>> >>> >> property should be in hama-site.xml
>> >>> >>
>> >>> >>   <property>
>> >>> >>     <name>fs.defaultFS</name>
>> >>> >>     <value>hdfs://host1.mydomain.com:9000/</value>
>> >>> >>     <description>
>> >>> >>       The name of the default file system. Either the literal string
>> >>> >>       "local" or a host:port for HDFS.
>> >>> >>     </description>
>> >>> >>   </property>
>> >>> >>
>> >>> >> 2) The 'bsp.tasks.maximum' is the number of tasks per node. It looks
>> >>> >> cluster configuration issue. Please run Pi example and look at the
>> >>> >> logs for more details. NOTE: you can not attach the images to
>> mailing
>> >>> >> list so I can't see it.
>> >>> >>
>> >>> >> 3) You can use the BSPJob.setNumBspTask(int) method. If input is
>> >>> >> provided, the number of BSP tasks is basically driven by the number
>> of
>> >>> >> DFS blocks. I'll fix it to be more flexible on HAMA-956.
>> >>> >>
>> >>> >> Thanks!
>> >>> >>
>> >>> >>
>> >>> >> On Tue, Jun 23, 2015 at 2:33 AM, Behroz Sikander <
>> behroz89@gmail.com>
>> >>> >> wrote:
>> >>> >> > Hi,
>> >>> >> > Recently, I moved from a single machine setup to a 2 machine
>> setup.
>> >>> I was
>> >>> >> > successfully able to run my job that uses the HDFS to get data. I
>> >>> have 3
>> >>> >> > trivial questions
>> >>> >> >
>> >>> >> > 1- To access HDFS, I have to manually give the IP address of
>> server
>> >>> >> running
>> >>> >> > HDFS. I thought that Hama will automatically pick from the
>> >>> configurations
>> >>> >> > but it does not. I am probably doing something wrong. Right now my
>> >>> code
>> >>> >> work
>> >>> >> > by using the following.
>> >>> >> >
>> >>> >> > FileSystem fs = FileSystem.get(new URI("hdfs://server_ip:port/"),
>> >>> conf);
>> >>> >> >
>> >>> >> > 2- On my master server, when I start hama it automatically starts
>> >>> hama in
>> >>> >> > the slave machine (all good). Both master and slave are set as
>> >>> >> groomservers.
>> >>> >> > This means that I have 2 servers to run my job which means that I
>> can
>> >>> >> open
>> >>> >> > more BSPPeerChild processes. And if I submit my jar with 3 bsp
>> tasks
>> >>> then
>> >>> >> > everything works fine. But when I move to 4 tasks, Hama freezes.
>> >>> Here is
>> >>> >> the
>> >>> >> > result of JPS command on slave.
>> >>> >> >
>> >>> >> >
>> >>> >> > Result of JPS command on Master
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > You can see that it is only opening tasks on slaves but not on
>> >>> master.
>> >>> >> >
>> >>> >> > Note: I tried to change the bsp.tasks.maximum property in
>> >>> >> hama-default.xml
>> >>> >> > to 4 but still same result.
>> >>> >> >
>> >>> >> > 3- I want my cluster to open as many BSPPeerChild processes as
>> >>> possible.
>> >>> >> Is
>> >>> >> > there any setting that can I do to achieve that ? Or hama picks up
>> >>> the
>> >>> >> > values from hama-default.xml to open tasks ?
>> >>> >> >
>> >>> >> >
>> >>> >> > Regards,
>> >>> >> >
>> >>> >> > Behroz Sikander
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> --
>> >>> >> Best Regards, Edward J. Yoon
>> >>> >>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Best Regards, Edward J. Yoon
>> >>>
>> >>
>> >>
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>>



-- 
Best Regards, Edward J. Yoon

Re: Groomserer BSPPeerChild limit

Posted by Behroz Sikander <be...@gmail.com>.
Yea. I also thought that. I ran the program through eclipse with 20 tasks
and it works fine.

On Sat, Jun 27, 2015 at 1:00 PM, Edward J. Yoon <ed...@apache.org>
wrote:

> > When I run the PI example, it uses 9 tasks and runs fine. When I run my
> > program with 3 tasks, everything runs fine. But when I increase the tasks
> > (to 4) by using "setNumBspTask". Hama freezes. I do not understand what
> can
> > go wrong.
>
> It looks like a program bug. Have you ran your program in local mode?
>
> On Sat, Jun 27, 2015 at 8:03 AM, Behroz Sikander <be...@gmail.com>
> wrote:
> > Hi,
> > In the current thread, I mentioned 3 issues. Issue 1 and 3 are resolved
> but
> > issue number 2 is still giving me headaches.
> >
> > My problem:
> > My cluster now consists of 3 machines. Each one of them properly
> configured
> > (Apparently). From my master machine when I start Hadoop and Hama, I can
> > see the processes started on other 2 machines. If I check the maximum
> tasks
> > that my cluster can support then I get 9 (3 tasks on each machine).
> >
> > When I run the PI example, it uses 9 tasks and runs fine. When I run my
> > program with 3 tasks, everything runs fine. But when I increase the tasks
> > (to 4) by using "setNumBspTask". Hama freezes. I do not understand what
> can
> > go wrong.
> >
> > I checked the logs files and things look fine. I just sometimes get an
> > exception that hama was not able to delete the sytem directory
> > (bsp.system.dir) defined in the hama-site.xml.
> >
> > Any help or clue would be great.
> >
> > Regards,
> > Behroz Sikander
> >
> > On Thu, Jun 25, 2015 at 1:13 PM, Behroz Sikander <be...@gmail.com>
> wrote:
> >
> >> Thank you :)
> >>
> >> On Thu, Jun 25, 2015 at 12:14 AM, Edward J. Yoon <edwardyoon@apache.org
> >
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> You can get the maximum number of available tasks like following code:
> >>>
> >>>     BSPJobClient jobClient = new BSPJobClient(conf);
> >>>     ClusterStatus cluster = jobClient.getClusterStatus(true);
> >>>
> >>>     // Set to maximum
> >>>     bsp.setNumBspTask(cluster.getMaxTasks());
> >>>
> >>>
> >>> On Wed, Jun 24, 2015 at 11:20 PM, Behroz Sikander <be...@gmail.com>
> >>> wrote:
> >>> > Hi,
> >>> > 1) Thank you for this.
> >>> > 2) Here are the images. I will look into the log files of PI example
> >>> >
> >>> > *Result of JPS command on slave*
> >>> >
> >>>
> http://s17.postimg.org/gpwe2bbfj/Screen_Shot_2015_06_22_at_7_23_31_PM.png
> >>> >
> >>> > *Result of JPS command on Master*
> >>> >
> >>>
> http://s14.postimg.org/s9922em5p/Screen_Shot_2015_06_22_at_7_23_42_PM.png
> >>> >
> >>> > 3) In my current case, I do not have any input submitted to the job.
> >>> During
> >>> > run time, I directly fetch data from HDFS. So, I am looking for
> >>> something
> >>> > like BSPJob.set*Max*NumBspTask().
> >>> >
> >>> > Regards,
> >>> > Behroz
> >>> >
> >>> >
> >>> >
> >>> > On Tue, Jun 23, 2015 at 12:57 AM, Edward J. Yoon <
> edwardyoon@apache.org
> >>> >
> >>> > wrote:
> >>> >
> >>> >> Hello,
> >>> >>
> >>> >> 1) You can get the filesystem URI from a configuration using
> >>> >> "FileSystem fs = FileSystem.get(conf);". Of course, the fs.defaultFS
> >>> >> property should be in hama-site.xml
> >>> >>
> >>> >>   <property>
> >>> >>     <name>fs.defaultFS</name>
> >>> >>     <value>hdfs://host1.mydomain.com:9000/</value>
> >>> >>     <description>
> >>> >>       The name of the default file system. Either the literal string
> >>> >>       "local" or a host:port for HDFS.
> >>> >>     </description>
> >>> >>   </property>
> >>> >>
> >>> >> 2) The 'bsp.tasks.maximum' is the number of tasks per node. It looks
> >>> >> cluster configuration issue. Please run Pi example and look at the
> >>> >> logs for more details. NOTE: you can not attach the images to
> mailing
> >>> >> list so I can't see it.
> >>> >>
> >>> >> 3) You can use the BSPJob.setNumBspTask(int) method. If input is
> >>> >> provided, the number of BSP tasks is basically driven by the number
> of
> >>> >> DFS blocks. I'll fix it to be more flexible on HAMA-956.
> >>> >>
> >>> >> Thanks!
> >>> >>
> >>> >>
> >>> >> On Tue, Jun 23, 2015 at 2:33 AM, Behroz Sikander <
> behroz89@gmail.com>
> >>> >> wrote:
> >>> >> > Hi,
> >>> >> > Recently, I moved from a single machine setup to a 2 machine
> setup.
> >>> I was
> >>> >> > successfully able to run my job that uses the HDFS to get data. I
> >>> have 3
> >>> >> > trivial questions
> >>> >> >
> >>> >> > 1- To access HDFS, I have to manually give the IP address of
> server
> >>> >> running
> >>> >> > HDFS. I thought that Hama will automatically pick from the
> >>> configurations
> >>> >> > but it does not. I am probably doing something wrong. Right now my
> >>> code
> >>> >> work
> >>> >> > by using the following.
> >>> >> >
> >>> >> > FileSystem fs = FileSystem.get(new URI("hdfs://server_ip:port/"),
> >>> conf);
> >>> >> >
> >>> >> > 2- On my master server, when I start hama it automatically starts
> >>> hama in
> >>> >> > the slave machine (all good). Both master and slave are set as
> >>> >> groomservers.
> >>> >> > This means that I have 2 servers to run my job which means that I
> can
> >>> >> open
> >>> >> > more BSPPeerChild processes. And if I submit my jar with 3 bsp
> tasks
> >>> then
> >>> >> > everything works fine. But when I move to 4 tasks, Hama freezes.
> >>> Here is
> >>> >> the
> >>> >> > result of JPS command on slave.
> >>> >> >
> >>> >> >
> >>> >> > Result of JPS command on Master
> >>> >> >
> >>> >> >
> >>> >> >
> >>> >> > You can see that it is only opening tasks on slaves but not on
> >>> master.
> >>> >> >
> >>> >> > Note: I tried to change the bsp.tasks.maximum property in
> >>> >> hama-default.xml
> >>> >> > to 4 but still same result.
> >>> >> >
> >>> >> > 3- I want my cluster to open as many BSPPeerChild processes as
> >>> possible.
> >>> >> Is
> >>> >> > there any setting that can I do to achieve that ? Or hama picks up
> >>> the
> >>> >> > values from hama-default.xml to open tasks ?
> >>> >> >
> >>> >> >
> >>> >> > Regards,
> >>> >> >
> >>> >> > Behroz Sikander
> >>> >>
> >>> >>
> >>> >>
> >>> >> --
> >>> >> Best Regards, Edward J. Yoon
> >>> >>
> >>>
> >>>
> >>>
> >>> --
> >>> Best Regards, Edward J. Yoon
> >>>
> >>
> >>
>
>
>
> --
> Best Regards, Edward J. Yoon
>

Re: Groomserer BSPPeerChild limit

Posted by "Edward J. Yoon" <ed...@apache.org>.
> When I run the PI example, it uses 9 tasks and runs fine. When I run my
> program with 3 tasks, everything runs fine. But when I increase the tasks
> (to 4) by using "setNumBspTask". Hama freezes. I do not understand what can
> go wrong.

It sounds like a bug in the program itself. Have you run your program in local mode?

On Sat, Jun 27, 2015 at 8:03 AM, Behroz Sikander <be...@gmail.com> wrote:
> Hi,
> In the current thread, I mentioned 3 issues. Issue 1 and 3 are resolved but
> issue number 2 is still giving me headaches.
>
> My problem:
> My cluster now consists of 3 machines. Each one of them properly configured
> (Apparently). From my master machine when I start Hadoop and Hama, I can
> see the processes started on other 2 machines. If I check the maximum tasks
> that my cluster can support then I get 9 (3 tasks on each machine).
>
> When I run the PI example, it uses 9 tasks and runs fine. When I run my
> program with 3 tasks, everything runs fine. But when I increase the tasks
> (to 4) by using "setNumBspTask". Hama freezes. I do not understand what can
> go wrong.
>
> I checked the logs files and things look fine. I just sometimes get an
> exception that hama was not able to delete the sytem directory
> (bsp.system.dir) defined in the hama-site.xml.
>
> Any help or clue would be great.
>
> Regards,
> Behroz Sikander
>
> On Thu, Jun 25, 2015 at 1:13 PM, Behroz Sikander <be...@gmail.com> wrote:
>
>> Thank you :)
>>
>> On Thu, Jun 25, 2015 at 12:14 AM, Edward J. Yoon <ed...@apache.org>
>> wrote:
>>
>>> Hi,
>>>
>>> You can get the maximum number of available tasks like following code:
>>>
>>>     BSPJobClient jobClient = new BSPJobClient(conf);
>>>     ClusterStatus cluster = jobClient.getClusterStatus(true);
>>>
>>>     // Set to maximum
>>>     bsp.setNumBspTask(cluster.getMaxTasks());
>>>
>>>
>>> On Wed, Jun 24, 2015 at 11:20 PM, Behroz Sikander <be...@gmail.com>
>>> wrote:
>>> > Hi,
>>> > 1) Thank you for this.
>>> > 2) Here are the images. I will look into the log files of PI example
>>> >
>>> > *Result of JPS command on slave*
>>> >
>>> http://s17.postimg.org/gpwe2bbfj/Screen_Shot_2015_06_22_at_7_23_31_PM.png
>>> >
>>> > *Result of JPS command on Master*
>>> >
>>> http://s14.postimg.org/s9922em5p/Screen_Shot_2015_06_22_at_7_23_42_PM.png
>>> >
>>> > 3) In my current case, I do not have any input submitted to the job.
>>> During
>>> > run time, I directly fetch data from HDFS. So, I am looking for
>>> something
>>> > like BSPJob.set*Max*NumBspTask().
>>> >
>>> > Regards,
>>> > Behroz
>>> >
>>> >
>>> >
>>> > On Tue, Jun 23, 2015 at 12:57 AM, Edward J. Yoon <edwardyoon@apache.org
>>> >
>>> > wrote:
>>> >
>>> >> Hello,
>>> >>
>>> >> 1) You can get the filesystem URI from a configuration using
>>> >> "FileSystem fs = FileSystem.get(conf);". Of course, the fs.defaultFS
>>> >> property should be in hama-site.xml
>>> >>
>>> >>   <property>
>>> >>     <name>fs.defaultFS</name>
>>> >>     <value>hdfs://host1.mydomain.com:9000/</value>
>>> >>     <description>
>>> >>       The name of the default file system. Either the literal string
>>> >>       "local" or a host:port for HDFS.
>>> >>     </description>
>>> >>   </property>
>>> >>
>>> >> 2) The 'bsp.tasks.maximum' is the number of tasks per node. It looks
>>> >> cluster configuration issue. Please run Pi example and look at the
>>> >> logs for more details. NOTE: you can not attach the images to mailing
>>> >> list so I can't see it.
>>> >>
>>> >> 3) You can use the BSPJob.setNumBspTask(int) method. If input is
>>> >> provided, the number of BSP tasks is basically driven by the number of
>>> >> DFS blocks. I'll fix it to be more flexible on HAMA-956.
>>> >>
>>> >> Thanks!
>>> >>
>>> >>
>>> >> On Tue, Jun 23, 2015 at 2:33 AM, Behroz Sikander <be...@gmail.com>
>>> >> wrote:
>>> >> > Hi,
>>> >> > Recently, I moved from a single machine setup to a 2 machine setup.
>>> I was
>>> >> > successfully able to run my job that uses the HDFS to get data. I
>>> have 3
>>> >> > trivial questions
>>> >> >
>>> >> > 1- To access HDFS, I have to manually give the IP address of server
>>> >> running
>>> >> > HDFS. I thought that Hama will automatically pick from the
>>> configurations
>>> >> > but it does not. I am probably doing something wrong. Right now my
>>> code
>>> >> work
>>> >> > by using the following.
>>> >> >
>>> >> > FileSystem fs = FileSystem.get(new URI("hdfs://server_ip:port/"),
>>> conf);
>>> >> >
>>> >> > 2- On my master server, when I start hama it automatically starts
>>> hama in
>>> >> > the slave machine (all good). Both master and slave are set as
>>> >> groomservers.
>>> >> > This means that I have 2 servers to run my job which means that I can
>>> >> open
>>> >> > more BSPPeerChild processes. And if I submit my jar with 3 bsp tasks
>>> then
>>> >> > everything works fine. But when I move to 4 tasks, Hama freezes.
>>> Here is
>>> >> the
>>> >> > result of JPS command on slave.
>>> >> >
>>> >> >
>>> >> > Result of JPS command on Master
>>> >> >
>>> >> >
>>> >> >
>>> >> > You can see that it is only opening tasks on slaves but not on
>>> master.
>>> >> >
>>> >> > Note: I tried to change the bsp.tasks.maximum property in
>>> >> hama-default.xml
>>> >> > to 4 but still same result.
>>> >> >
>>> >> > 3- I want my cluster to open as many BSPPeerChild processes as
>>> possible.
>>> >> Is
>>> >> > there any setting that can I do to achieve that ? Or hama picks up
>>> the
>>> >> > values from hama-default.xml to open tasks ?
>>> >> >
>>> >> >
>>> >> > Regards,
>>> >> >
>>> >> > Behroz Sikander
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Best Regards, Edward J. Yoon
>>> >>
>>>
>>>
>>>
>>> --
>>> Best Regards, Edward J. Yoon
>>>
>>
>>



-- 
Best Regards, Edward J. Yoon

Re: Groomserer BSPPeerChild limit

Posted by Behroz Sikander <be...@gmail.com>.
In the hama_[user]_bspmaster_.....log file I get the following exception, but it occurs in both cases, whether I run my job with 3 tasks or with 4:

org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /bsp
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
        at org.apache.hama.bsp.sync.ZKSyncBSPMasterClient.init(ZKSyncBSPMasterClient.java:62)
        at org.apache.hama.bsp.BSPMaster.initZK(BSPMaster.java:509)
        at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:492)
        at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:475)
        at org.apache.hama.BSPMasterRunner.run(BSPMasterRunner.java:46)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hama.BSPMasterRunner.main(BSPMasterRunner.java:56)

2015-06-26 23:18:41,140 ERROR org.apache.hama.bsp.sync.ZKSyncBSPMasterClient: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /bsp

On Sat, Jun 27, 2015 at 1:03 AM, Behroz Sikander <be...@gmail.com> wrote:

> Hi,
> In the current thread, I mentioned 3 issues. Issue 1 and 3 are resolved
> but issue number 2 is still giving me headaches.
>
> My problem:
> My cluster now consists of 3 machines. Each one of them properly
> configured (Apparently). From my master machine when I start Hadoop and
> Hama, I can see the processes started on other 2 machines. If I check the
> maximum tasks that my cluster can support then I get 9 (3 tasks on each
> machine).
>
> When I run the PI example, it uses 9 tasks and runs fine. When I run my
> program with 3 tasks, everything runs fine. But when I increase the tasks
> (to 4) by using "setNumBspTask". Hama freezes. I do not understand what can
> go wrong.
>
> I checked the logs files and things look fine. I just sometimes get an
> exception that hama was not able to delete the sytem directory
> (bsp.system.dir) defined in the hama-site.xml.
>
> Any help or clue would be great.
>
> Regards,
> Behroz Sikander
>
> On Thu, Jun 25, 2015 at 1:13 PM, Behroz Sikander <be...@gmail.com>
> wrote:
>
>> Thank you :)
>>
>> On Thu, Jun 25, 2015 at 12:14 AM, Edward J. Yoon <ed...@apache.org>
>> wrote:
>>
>>> Hi,
>>>
>>> You can get the maximum number of available tasks like following code:
>>>
>>>     BSPJobClient jobClient = new BSPJobClient(conf);
>>>     ClusterStatus cluster = jobClient.getClusterStatus(true);
>>>
>>>     // Set to maximum
>>>     bsp.setNumBspTask(cluster.getMaxTasks());
>>>
>>>
>>> On Wed, Jun 24, 2015 at 11:20 PM, Behroz Sikander <be...@gmail.com>
>>> wrote:
>>> > Hi,
>>> > 1) Thank you for this.
>>> > 2) Here are the images. I will look into the log files of PI example
>>> >
>>> > *Result of JPS command on slave*
>>> >
>>> http://s17.postimg.org/gpwe2bbfj/Screen_Shot_2015_06_22_at_7_23_31_PM.png
>>> >
>>> > *Result of JPS command on Master*
>>> >
>>> http://s14.postimg.org/s9922em5p/Screen_Shot_2015_06_22_at_7_23_42_PM.png
>>> >
>>> > 3) In my current case, I do not have any input submitted to the job.
>>> During
>>> > run time, I directly fetch data from HDFS. So, I am looking for
>>> something
>>> > like BSPJob.set*Max*NumBspTask().
>>> >
>>> > Regards,
>>> > Behroz
>>> >
>>> >
>>> >
>>> > On Tue, Jun 23, 2015 at 12:57 AM, Edward J. Yoon <
>>> edwardyoon@apache.org>
>>> > wrote:
>>> >
>>> >> Hello,
>>> >>
>>> >> 1) You can get the filesystem URI from a configuration using
>>> >> "FileSystem fs = FileSystem.get(conf);". Of course, the fs.defaultFS
>>> >> property should be in hama-site.xml
>>> >>
>>> >>   <property>
>>> >>     <name>fs.defaultFS</name>
>>> >>     <value>hdfs://host1.mydomain.com:9000/</value>
>>> >>     <description>
>>> >>       The name of the default file system. Either the literal string
>>> >>       "local" or a host:port for HDFS.
>>> >>     </description>
>>> >>   </property>
>>> >>
>>> >> 2) The 'bsp.tasks.maximum' is the number of tasks per node. It looks
>>> >> cluster configuration issue. Please run Pi example and look at the
>>> >> logs for more details. NOTE: you can not attach the images to mailing
>>> >> list so I can't see it.
>>> >>
>>> >> 3) You can use the BSPJob.setNumBspTask(int) method. If input is
>>> >> provided, the number of BSP tasks is basically driven by the number of
>>> >> DFS blocks. I'll fix it to be more flexible on HAMA-956.
>>> >>
>>> >> Thanks!
>>> >>
>>> >>
>>> >> On Tue, Jun 23, 2015 at 2:33 AM, Behroz Sikander <be...@gmail.com>
>>> >> wrote:
>>> >> > Hi,
>>> >> > Recently, I moved from a single machine setup to a 2 machine setup.
>>> I was
>>> >> > successfully able to run my job that uses the HDFS to get data. I
>>> have 3
>>> >> > trivial questions
>>> >> >
>>> >> > 1- To access HDFS, I have to manually give the IP address of server
>>> >> running
>>> >> > HDFS. I thought that Hama will automatically pick from the
>>> configurations
>>> >> > but it does not. I am probably doing something wrong. Right now my
>>> code
>>> >> work
>>> >> > by using the following.
>>> >> >
>>> >> > FileSystem fs = FileSystem.get(new URI("hdfs://server_ip:port/"),
>>> conf);
>>> >> >
>>> >> > 2- On my master server, when I start hama it automatically starts
>>> hama in
>>> >> > the slave machine (all good). Both master and slave are set as
>>> >> groomservers.
>>> >> > This means that I have 2 servers to run my job which means that I
>>> can
>>> >> open
>>> >> > more BSPPeerChild processes. And if I submit my jar with 3 bsp
>>> tasks then
>>> >> > everything works fine. But when I move to 4 tasks, Hama freezes.
>>> Here is
>>> >> the
>>> >> > result of JPS command on slave.
>>> >> >
>>> >> >
>>> >> > Result of JPS command on Master
>>> >> >
>>> >> >
>>> >> >
>>> >> > You can see that it is only opening tasks on slaves but not on
>>> master.
>>> >> >
>>> >> > Note: I tried to change the bsp.tasks.maximum property in
>>> >> hama-default.xml
>>> >> > to 4 but still same result.
>>> >> >
>>> >> > 3- I want my cluster to open as many BSPPeerChild processes as
>>> possible.
>>> >> Is
>>> >> > there any setting that can I do to achieve that ? Or hama picks up
>>> the
>>> >> > values from hama-default.xml to open tasks ?
>>> >> >
>>> >> >
>>> >> > Regards,
>>> >> >
>>> >> > Behroz Sikander
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Best Regards, Edward J. Yoon
>>> >>
>>>
>>>
>>>
>>> --
>>> Best Regards, Edward J. Yoon
>>>
>>
>>
>

Re: Groomserer BSPPeerChild limit

Posted by Behroz Sikander <be...@gmail.com>.
Hi,
In the current thread I mentioned 3 issues. Issues 1 and 3 are resolved, but issue 2 is still giving me headaches.

My problem:
My cluster now consists of 3 machines, each of them (apparently) properly configured. When I start Hadoop and Hama from my master machine, I can see the processes start on the other 2 machines. If I check the maximum number of tasks that my cluster can support, I get 9 (3 tasks on each machine).

When I run the PI example, it uses all 9 tasks and runs fine. When I run my program with 3 tasks, everything also runs fine. But when I increase the tasks to 4 using setNumBspTask, Hama freezes. I do not understand what can go wrong.

I checked the log files and things look fine. I just sometimes get an exception that Hama was not able to delete the system directory (bsp.system.dir) defined in hama-site.xml.

Any help or clue would be great.

Regards,
Behroz Sikander

On Thu, Jun 25, 2015 at 1:13 PM, Behroz Sikander <be...@gmail.com> wrote:

> Thank you :)
>
> On Thu, Jun 25, 2015 at 12:14 AM, Edward J. Yoon <ed...@apache.org>
> wrote:
>
>> Hi,
>>
>> You can get the maximum number of available tasks like following code:
>>
>>     BSPJobClient jobClient = new BSPJobClient(conf);
>>     ClusterStatus cluster = jobClient.getClusterStatus(true);
>>
>>     // Set to maximum
>>     bsp.setNumBspTask(cluster.getMaxTasks());
>>
>>
>> On Wed, Jun 24, 2015 at 11:20 PM, Behroz Sikander <be...@gmail.com>
>> wrote:
>> > Hi,
>> > 1) Thank you for this.
>> > 2) Here are the images. I will look into the log files of PI example
>> >
>> > *Result of JPS command on slave*
>> >
>> http://s17.postimg.org/gpwe2bbfj/Screen_Shot_2015_06_22_at_7_23_31_PM.png
>> >
>> > *Result of JPS command on Master*
>> >
>> http://s14.postimg.org/s9922em5p/Screen_Shot_2015_06_22_at_7_23_42_PM.png
>> >
>> > 3) In my current case, I do not have any input submitted to the job.
>> During
>> > run time, I directly fetch data from HDFS. So, I am looking for
>> something
>> > like BSPJob.set*Max*NumBspTask().
>> >
>> > Regards,
>> > Behroz
>> >
>> >
>> >
>> > On Tue, Jun 23, 2015 at 12:57 AM, Edward J. Yoon <edwardyoon@apache.org
>> >
>> > wrote:
>> >
>> >> Hello,
>> >>
>> >> 1) You can get the filesystem URI from a configuration using
>> >> "FileSystem fs = FileSystem.get(conf);". Of course, the fs.defaultFS
>> >> property should be in hama-site.xml
>> >>
>> >>   <property>
>> >>     <name>fs.defaultFS</name>
>> >>     <value>hdfs://host1.mydomain.com:9000/</value>
>> >>     <description>
>> >>       The name of the default file system. Either the literal string
>> >>       "local" or a host:port for HDFS.
>> >>     </description>
>> >>   </property>
>> >>
>> >> 2) The 'bsp.tasks.maximum' is the number of tasks per node. It looks
>> >> cluster configuration issue. Please run Pi example and look at the
>> >> logs for more details. NOTE: you can not attach the images to mailing
>> >> list so I can't see it.
>> >>
>> >> 3) You can use the BSPJob.setNumBspTask(int) method. If input is
>> >> provided, the number of BSP tasks is basically driven by the number of
>> >> DFS blocks. I'll fix it to be more flexible on HAMA-956.
>> >>
>> >> Thanks!
>> >>
>> >>
>> >> On Tue, Jun 23, 2015 at 2:33 AM, Behroz Sikander <be...@gmail.com>
>> >> wrote:
>> >> > Hi,
>> >> > Recently, I moved from a single machine setup to a 2 machine setup.
>> I was
>> >> > successfully able to run my job that uses the HDFS to get data. I
>> have 3
>> >> > trivial questions
>> >> >
>> >> > 1- To access HDFS, I have to manually give the IP address of server
>> >> running
>> >> > HDFS. I thought that Hama will automatically pick from the
>> configurations
>> >> > but it does not. I am probably doing something wrong. Right now my
>> code
>> >> work
>> >> > by using the following.
>> >> >
>> >> > FileSystem fs = FileSystem.get(new URI("hdfs://server_ip:port/"),
>> conf);
>> >> >
>> >> > 2- On my master server, when I start hama it automatically starts
>> hama in
>> >> > the slave machine (all good). Both master and slave are set as
>> >> groomservers.
>> >> > This means that I have 2 servers to run my job which means that I can
>> >> open
>> >> > more BSPPeerChild processes. And if I submit my jar with 3 bsp tasks
>> then
>> >> > everything works fine. But when I move to 4 tasks, Hama freezes.
>> Here is
>> >> the
>> >> > result of JPS command on slave.
>> >> >
>> >> >
>> >> > Result of JPS command on Master
>> >> >
>> >> >
>> >> >
>> >> > You can see that it is only opening tasks on slaves but not on
>> master.
>> >> >
>> >> > Note: I tried to change the bsp.tasks.maximum property in
>> >> hama-default.xml
>> >> > to 4 but still same result.
>> >> >
>> >> > 3- I want my cluster to open as many BSPPeerChild processes as
>> possible.
>> >> Is
>> >> > there any setting that can I do to achieve that ? Or hama picks up
>> the
>> >> > values from hama-default.xml to open tasks ?
>> >> >
>> >> >
>> >> > Regards,
>> >> >
>> >> > Behroz Sikander
>> >>
>> >>
>> >>
>> >> --
>> >> Best Regards, Edward J. Yoon
>> >>
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>>
>
>

Re: Groomserer BSPPeerChild limit

Posted by Behroz Sikander <be...@gmail.com>.
Thank you :)

On Thu, Jun 25, 2015 at 12:14 AM, Edward J. Yoon <ed...@apache.org>
wrote:

> Hi,
>
> You can get the maximum number of available tasks like following code:
>
>     BSPJobClient jobClient = new BSPJobClient(conf);
>     ClusterStatus cluster = jobClient.getClusterStatus(true);
>
>     // Set to maximum
>     bsp.setNumBspTask(cluster.getMaxTasks());
>
>
> On Wed, Jun 24, 2015 at 11:20 PM, Behroz Sikander <be...@gmail.com>
> wrote:
> > Hi,
> > 1) Thank you for this.
> > 2) Here are the images. I will look into the log files of PI example
> >
> > *Result of JPS command on slave*
> >
> http://s17.postimg.org/gpwe2bbfj/Screen_Shot_2015_06_22_at_7_23_31_PM.png
> >
> > *Result of JPS command on Master*
> >
> http://s14.postimg.org/s9922em5p/Screen_Shot_2015_06_22_at_7_23_42_PM.png
> >
> > 3) In my current case, I do not have any input submitted to the job.
> During
> > run time, I directly fetch data from HDFS. So, I am looking for something
> > like BSPJob.set*Max*NumBspTask().
> >
> > Regards,
> > Behroz
> >
> >
> >
> > On Tue, Jun 23, 2015 at 12:57 AM, Edward J. Yoon <ed...@apache.org>
> > wrote:
> >
> >> Hello,
> >>
> >> 1) You can get the filesystem URI from a configuration using
> >> "FileSystem fs = FileSystem.get(conf);". Of course, the fs.defaultFS
> >> property should be in hama-site.xml
> >>
> >>   <property>
> >>     <name>fs.defaultFS</name>
> >>     <value>hdfs://host1.mydomain.com:9000/</value>
> >>     <description>
> >>       The name of the default file system. Either the literal string
> >>       "local" or a host:port for HDFS.
> >>     </description>
> >>   </property>
> >>
> >> 2) The 'bsp.tasks.maximum' is the number of tasks per node. It looks
> >> cluster configuration issue. Please run Pi example and look at the
> >> logs for more details. NOTE: you can not attach the images to mailing
> >> list so I can't see it.
> >>
> >> 3) You can use the BSPJob.setNumBspTask(int) method. If input is
> >> provided, the number of BSP tasks is basically driven by the number of
> >> DFS blocks. I'll fix it to be more flexible on HAMA-956.
> >>
> >> Thanks!
> >>
> >>
> >> On Tue, Jun 23, 2015 at 2:33 AM, Behroz Sikander <be...@gmail.com>
> >> wrote:
> >> > Hi,
> >> > Recently, I moved from a single machine setup to a 2 machine setup. I
> was
> >> > successfully able to run my job that uses the HDFS to get data. I
> have 3
> >> > trivial questions
> >> >
> >> > 1- To access HDFS, I have to manually give the IP address of server
> >> running
> >> > HDFS. I thought that Hama will automatically pick from the
> configurations
> >> > but it does not. I am probably doing something wrong. Right now my
> code
> >> work
> >> > by using the following.
> >> >
> >> > FileSystem fs = FileSystem.get(new URI("hdfs://server_ip:port/"),
> conf);
> >> >
> >> > 2- On my master server, when I start hama it automatically starts
> hama in
> >> > the slave machine (all good). Both master and slave are set as
> >> groomservers.
> >> > This means that I have 2 servers to run my job which means that I can
> >> open
> >> > more BSPPeerChild processes. And if I submit my jar with 3 bsp tasks
> then
> >> > everything works fine. But when I move to 4 tasks, Hama freezes. Here
> is
> >> the
> >> > result of JPS command on slave.
> >> >
> >> >
> >> > Result of JPS command on Master
> >> >
> >> >
> >> >
> >> > You can see that it is only opening tasks on slaves but not on master.
> >> >
> >> > Note: I tried to change the bsp.tasks.maximum property in
> >> hama-default.xml
> >> > to 4 but still same result.
> >> >
> >> > 3- I want my cluster to open as many BSPPeerChild processes as
> possible.
> >> Is
> >> > there any setting that can I do to achieve that ? Or hama picks up the
> >> > values from hama-default.xml to open tasks ?
> >> >
> >> >
> >> > Regards,
> >> >
> >> > Behroz Sikander
> >>
> >>
> >>
> >> --
> >> Best Regards, Edward J. Yoon
> >>
>
>
>
> --
> Best Regards, Edward J. Yoon
>

Re: Groomserer BSPPeerChild limit

Posted by "Edward J. Yoon" <ed...@apache.org>.
Hi,

You can get the maximum number of available tasks with code like the following:

    BSPJobClient jobClient = new BSPJobClient(conf);
    ClusterStatus cluster = jobClient.getClusterStatus(true);

    // Set the job (here "bsp" is your BSPJob instance) to the cluster maximum
    bsp.setNumBspTask(cluster.getMaxTasks());


On Wed, Jun 24, 2015 at 11:20 PM, Behroz Sikander <be...@gmail.com> wrote:
> Hi,
> 1) Thank you for this.
> 2) Here are the images. I will look into the log files of PI example
>
> *Result of JPS command on slave*
> http://s17.postimg.org/gpwe2bbfj/Screen_Shot_2015_06_22_at_7_23_31_PM.png
>
> *Result of JPS command on Master*
> http://s14.postimg.org/s9922em5p/Screen_Shot_2015_06_22_at_7_23_42_PM.png
>
> 3) In my current case, I do not have any input submitted to the job. During
> run time, I directly fetch data from HDFS. So, I am looking for something
> like BSPJob.set*Max*NumBspTask().
>
> Regards,
> Behroz
>
>
>
> On Tue, Jun 23, 2015 at 12:57 AM, Edward J. Yoon <ed...@apache.org>
> wrote:
>
>> Hello,
>>
>> 1) You can get the filesystem URI from a configuration using
>> "FileSystem fs = FileSystem.get(conf);". Of course, the fs.defaultFS
>> property should be in hama-site.xml
>>
>>   <property>
>>     <name>fs.defaultFS</name>
>>     <value>hdfs://host1.mydomain.com:9000/</value>
>>     <description>
>>       The name of the default file system. Either the literal string
>>       "local" or a host:port for HDFS.
>>     </description>
>>   </property>
>>
>> 2) The 'bsp.tasks.maximum' is the number of tasks per node. It looks
>> cluster configuration issue. Please run Pi example and look at the
>> logs for more details. NOTE: you can not attach the images to mailing
>> list so I can't see it.
>>
>> 3) You can use the BSPJob.setNumBspTask(int) method. If input is
>> provided, the number of BSP tasks is basically driven by the number of
>> DFS blocks. I'll fix it to be more flexible on HAMA-956.
>>
>> Thanks!
>>
>>
>> On Tue, Jun 23, 2015 at 2:33 AM, Behroz Sikander <be...@gmail.com>
>> wrote:
>> > Hi,
>> > Recently, I moved from a single machine setup to a 2 machine setup. I was
>> > successfully able to run my job that uses the HDFS to get data. I have 3
>> > trivial questions
>> >
>> > 1- To access HDFS, I have to manually give the IP address of server
>> running
>> > HDFS. I thought that Hama will automatically pick from the configurations
>> > but it does not. I am probably doing something wrong. Right now my code
>> work
>> > by using the following.
>> >
>> > FileSystem fs = FileSystem.get(new URI("hdfs://server_ip:port/"), conf);
>> >
>> > 2- On my master server, when I start hama it automatically starts hama in
>> > the slave machine (all good). Both master and slave are set as
>> groomservers.
>> > This means that I have 2 servers to run my job which means that I can
>> open
>> > more BSPPeerChild processes. And if I submit my jar with 3 bsp tasks then
>> > everything works fine. But when I move to 4 tasks, Hama freezes. Here is
>> the
>> > result of JPS command on slave.
>> >
>> >
>> > Result of JPS command on Master
>> >
>> >
>> >
>> > You can see that it is only opening tasks on slaves but not on master.
>> >
>> > Note: I tried to change the bsp.tasks.maximum property in
>> hama-default.xml
>> > to 4 but still same result.
>> >
>> > 3- I want my cluster to open as many BSPPeerChild processes as possible.
>> Is
>> > there any setting that can I do to achieve that ? Or hama picks up the
>> > values from hama-default.xml to open tasks ?
>> >
>> >
>> > Regards,
>> >
>> > Behroz Sikander
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>>



-- 
Best Regards, Edward J. Yoon

Re: Groomserer BSPPeerChild limit

Posted by Behroz Sikander <be...@gmail.com>.
Hi,
1) Thank you for this.
2) Here are the images. I will look into the log files of the PI example

*Result of JPS command on slave*
http://s17.postimg.org/gpwe2bbfj/Screen_Shot_2015_06_22_at_7_23_31_PM.png

*Result of JPS command on Master*
http://s14.postimg.org/s9922em5p/Screen_Shot_2015_06_22_at_7_23_42_PM.png

3) In my current case, no input is submitted to the job; at run time I fetch
data directly from HDFS. So I am looking for something like
BSPJob.set*Max*NumBspTask().

Regards,
Behroz



On Tue, Jun 23, 2015 at 12:57 AM, Edward J. Yoon <ed...@apache.org>
wrote:

> Hello,
>
> 1) You can get the filesystem URI from a configuration using
> "FileSystem fs = FileSystem.get(conf);". Of course, the fs.defaultFS
> property should be in hama-site.xml
>
>   <property>
>     <name>fs.defaultFS</name>
>     <value>hdfs://host1.mydomain.com:9000/</value>
>     <description>
>       The name of the default file system. Either the literal string
>       "local" or a host:port for HDFS.
>     </description>
>   </property>
>
> 2) The 'bsp.tasks.maximum' is the number of tasks per node. It looks
> cluster configuration issue. Please run Pi example and look at the
> logs for more details. NOTE: you can not attach the images to mailing
> list so I can't see it.
>
> 3) You can use the BSPJob.setNumBspTask(int) method. If input is
> provided, the number of BSP tasks is basically driven by the number of
> DFS blocks. I'll fix it to be more flexible on HAMA-956.
>
> Thanks!
>
>
> On Tue, Jun 23, 2015 at 2:33 AM, Behroz Sikander <be...@gmail.com>
> wrote:
> > Hi,
> > Recently, I moved from a single machine setup to a 2 machine setup. I was
> > successfully able to run my job that uses the HDFS to get data. I have 3
> > trivial questions
> >
> > 1- To access HDFS, I have to manually give the IP address of server
> running
> > HDFS. I thought that Hama will automatically pick from the configurations
> > but it does not. I am probably doing something wrong. Right now my code
> work
> > by using the following.
> >
> > FileSystem fs = FileSystem.get(new URI("hdfs://server_ip:port/"), conf);
> >
> > 2- On my master server, when I start hama it automatically starts hama in
> > the slave machine (all good). Both master and slave are set as
> groomservers.
> > This means that I have 2 servers to run my job which means that I can
> open
> > more BSPPeerChild processes. And if I submit my jar with 3 bsp tasks then
> > everything works fine. But when I move to 4 tasks, Hama freezes. Here is
> the
> > result of JPS command on slave.
> >
> >
> > Result of JPS command on Master
> >
> >
> >
> > You can see that it is only opening tasks on slaves but not on master.
> >
> > Note: I tried to change the bsp.tasks.maximum property in
> hama-default.xml
> > to 4 but still same result.
> >
> > 3- I want my cluster to open as many BSPPeerChild processes as possible.
> Is
> > there any setting that can I do to achieve that ? Or hama picks up the
> > values from hama-default.xml to open tasks ?
> >
> >
> > Regards,
> >
> > Behroz Sikander
>
>
>
> --
> Best Regards, Edward J. Yoon
>

Re: Groomserer BSPPeerChild limit

Posted by "Edward J. Yoon" <ed...@apache.org>.
Hello,

1) You can get the filesystem URI from the configuration using
"FileSystem fs = FileSystem.get(conf);". Of course, the fs.defaultFS
property must then be set in hama-site.xml:

  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://host1.mydomain.com:9000/</value>
    <description>
      The name of the default file system. Either the literal string
      "local" or a host:port for HDFS.
    </description>
  </property>
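As an aside, what FileSystem.get(conf) does with this entry is essentially a property lookup in the site file. A self-contained Python sketch of that lookup, for illustration only (it mirrors the property above; this is not Hadoop code):

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Minimal hama-site.xml matching the property shown above.
SITE_XML = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://host1.mydomain.com:9000/</value>
  </property>
</configuration>
"""

def read_default_fs(path):
    # Walk the <property> entries and return the fs.defaultFS value,
    # roughly what Hadoop's Configuration does before FileSystem.get(conf)
    # opens the filesystem.
    for prop in ET.parse(path).getroot().iter("property"):
        if prop.findtext("name") == "fs.defaultFS":
            return prop.findtext("value")
    return None

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "hama-site.xml")
    with open(path, "w") as f:
        f.write(SITE_XML)
    print(read_default_fs(path))  # → hdfs://host1.mydomain.com:9000/
```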

2) The 'bsp.tasks.maximum' is the number of tasks per node. It looks
cluster configuration issue. Please run Pi example and look at the
logs for more details. NOTE: you can not attach the images to mailing
list so I can't see it.
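One note on editing hama-default.xml: assuming Hama follows the usual Hadoop convention, per-cluster overrides belong in hama-site.xml, since the defaults file ships with the distribution. An illustrative fragment (the value is arbitrary, and the description is mine):

```xml
<property>
  <name>bsp.tasks.maximum</name>
  <value>4</value>
  <description>Maximum number of BSP tasks per groom server.</description>
</property>
```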

3) You can use the BSPJob.setNumBspTask(int) method. If input is
provided, the number of BSP tasks is basically driven by the number of
DFS blocks. I'll fix it to be more flexible on HAMA-956.
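To make the input-driven case concrete: with file input, the task count roughly tracks the number of DFS blocks the input occupies. A hedged Python sketch of that arithmetic (the helper name and sizes are illustrative, not Hama API):

```python
import math

def expected_bsp_tasks(input_size_bytes, dfs_block_size_bytes):
    # With file input, Hama schedules roughly one BSP task per DFS block,
    # so the expected count is the input size divided by the block size,
    # rounded up. Empty input yields no tasks.
    if input_size_bytes == 0:
        return 0
    return math.ceil(input_size_bytes / dfs_block_size_bytes)

# A 300 MB input with a 128 MB block size spans 3 blocks:
print(expected_bsp_tasks(300 * 1024**2, 128 * 1024**2))  # → 3
```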

Thanks!


On Tue, Jun 23, 2015 at 2:33 AM, Behroz Sikander <be...@gmail.com> wrote:
> Hi,
> Recently, I moved from a single machine setup to a 2 machine setup. I was
> successfully able to run my job that uses the HDFS to get data. I have 3
> trivial questions
>
> 1- To access HDFS, I have to manually give the IP address of server running
> HDFS. I thought that Hama will automatically pick from the configurations
> but it does not. I am probably doing something wrong. Right now my code work
> by using the following.
>
> FileSystem fs = FileSystem.get(new URI("hdfs://server_ip:port/"), conf);
>
> 2- On my master server, when I start hama it automatically starts hama in
> the slave machine (all good). Both master and slave are set as groomservers.
> This means that I have 2 servers to run my job which means that I can open
> more BSPPeerChild processes. And if I submit my jar with 3 bsp tasks then
> everything works fine. But when I move to 4 tasks, Hama freezes. Here is the
> result of JPS command on slave.
>
>
> Result of JPS command on Master
>
>
>
> You can see that it is only opening tasks on slaves but not on master.
>
> Note: I tried to change the bsp.tasks.maximum property in hama-default.xml
> to 4 but still same result.
>
> 3- I want my cluster to open as many BSPPeerChild processes as possible. Is
> there any setting that can I do to achieve that ? Or hama picks up the
> values from hama-default.xml to open tasks ?
>
>
> Regards,
>
> Behroz Sikander



-- 
Best Regards, Edward J. Yoon