Posted to user@giraph.apache.org by Tripti Singh <tr...@yahoo-inc.com> on 2012/11/29 07:09:07 UTC

Issue running Giraph on more mappers

Hi,
I am trying to run this workflow which uses Giraph.
I am able to successfully run the Giraph job when I use fewer mappers and less data, but it fails with more mappers.
This is what the logs say for master and worker nodes:

Master Node:

2012-11-29 00:01:10,235 INFO [main] org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Connected to gsta31113.tan.ygrid.yahoo.com/10.216.124.59:24681!
2012-11-29 00:01:10,235 INFO [main] org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Creating my filestamp _bsp/_defaultZkManagerDir/_zkServer/gsta31113.tan.ygrid.yahoo.com 3
2012-11-29 00:01:10,241 INFO [main] org.apache.giraph.graph.GraphMapper: setup: Starting up BspServiceMaster (master thread)...
2012-11-29 00:01:10,257 INFO [main] org.apache.giraph.graph.BspService: BspService: Connecting to ZooKeeper with job job_1353148790244_114419, 3 on gsta31113.tan.ygrid.yahoo.com:24681
2012-11-29 00:01:10,278 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.4-1386507, built on 09/17/2012 08:33 GMT
2012-11-29 00:01:10,278 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:host.name=gsta31113.tan.ygrid.yahoo.com
2012-11-29 00:01:10,278 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.6.0_21
2012-11-29 00:01:10,278 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
2012-11-29 00:01:10,278 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:java.home=/home/Releases/gridjdk-1.6.0_21.1011192346-20110120-000/share/gridjdk-1.6.0_21/jre
2012-11-29 00:01:10,278 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:java.class.path= {really long class path}
2012-11-29 00:01:10,278 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/home/Releases/gridjdk-1.6.0_21.1011192346-20110120-000/share/gridjdk-1.6.0_21/jre/lib/i386/server:/home/Releases/gridjdk-1.6.0_21.1011192346-20110120-000/share/gridjdk-1.6.0_21/jre/lib/i386:/home/Releases/gridjdk-1.6.0_21.1011192346-20110120-000/share/gridjdk-1.6.0_21/jre/../lib/i386:/grid/2/tmp/yarn-local/usercache/nova_sln/appcache/application_1353148790244_114419/container_1353148790244_114419_01_000009:/home/gs/hadoop/current/lib/native/Linux-i386-32:/usr/java/packages/lib/i386:/lib:/usr/lib
2012-11-29 00:01:10,278 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/grid/2/tmp/yarn-local/usercache/nova_sln/appcache/application_1353148790244_114419/container_1353148790244_114419_01_000009/tmp
2012-11-29 00:01:10,279 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=
2012-11-29 00:01:10,279 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
2012-11-29 00:01:10,279 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:os.arch=i386
2012-11-29 00:01:10,279 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:os.version=2.6.18-238.19.1.el5.YAHOO.20111028
2012-11-29 00:01:10,279 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:user.name=nova_sln
2012-11-29 00:01:10,279 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:user.home=/homes/nova_sln
2012-11-29 00:01:10,279 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/grid/2/tmp/yarn-local/usercache/nova_sln/appcache/application_1353148790244_114419/container_1353148790244_114419_01_000009
2012-11-29 00:01:10,280 INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=gsta31113.tan.ygrid.yahoo.com:24681 sessionTimeout=60000 watcher=org.apache.giraph.graph.BspServiceMaster@16f70a4
2012-11-29 00:01:10,304 INFO [main-SendThread(gsta31113.tan.ygrid.yahoo.com:24681)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server gsta31113.tan.ygrid.yahoo.com/10.216.124.59:24681. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2012-11-29 00:01:10,305 INFO [main-SendThread(gsta31113.tan.ygrid.yahoo.com:24681)] org.apache.zookeeper.ClientCnxn: Socket connection established to gsta31113.tan.ygrid.yahoo.com/10.216.124.59:24681, initiating session
2012-11-29 00:01:10,331 INFO [main-SendThread(gsta31113.tan.ygrid.yahoo.com:24681)] org.apache.zookeeper.ClientCnxn: Session establishment complete on server gsta31113.tan.ygrid.yahoo.com/10.216.124.59:24681, sessionid = 0x13b497783e40000, negotiated timeout = 600000
2012-11-29 00:01:10,333 INFO [main-EventThread] org.apache.giraph.graph.BspService: process: Asynchronous connection complete.
2012-11-29 00:01:10,335 INFO [main] org.apache.giraph.graph.GraphMapper: map: No need to do anything when not a worker
2012-11-29 00:01:10,335 INFO [main] org.apache.giraph.graph.GraphMapper: cleanup: Starting for MASTER_ZOOKEEPER_ONLY
2012-11-29 00:01:10,396 INFO [org.apache.giraph.graph.MasterThread] org.apache.giraph.graph.BspServiceMaster: becomeMaster: First child is '/_hadoopBsp/job_1353148790244_114419/_masterElectionDir/gsta31113.tan.ygrid.yahoo.com_30000000000' and my bid is '/_hadoopBsp/job_1353148790244_114419/_masterElectionDir/gsta31113.tan.ygrid.yahoo.com_30000000000'
2012-11-29 00:01:10,403 INFO [org.apache.giraph.graph.MasterThread] org.apache.giraph.graph.BspServiceMaster: becomeMaster: I am now the master!
2012-11-29 00:01:10,423 INFO [main-EventThread] org.apache.giraph.graph.BspService: process: applicationAttemptChanged signaled
2012-11-29 00:01:10,440 WARN [main-EventThread] org.apache.giraph.graph.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/job_1353148790244_114419/_applicationAttemptsDir/0/_superstepDir, type=NodeChildrenChanged, state=SyncConnected)
2012-11-29 00:01:17,475 INFO [org.apache.giraph.graph.MasterThread] org.apache.giraph.graph.BspServiceMaster: checkWorkers: Only found 1 responses of 60 needed to start superstep -1.  Sleeping for 5000 msecs and used 0 of 60 attempts.
2012-11-29 00:01:28,742 INFO [org.apache.giraph.graph.MasterThread] org.apache.hadoop.mapreduce.lib.input.FileInputFormat: Total input paths to process : 60
2012-11-29 00:01:28,760 WARN [org.apache.giraph.graph.MasterThread] org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library not loaded
2012-11-29 00:01:28,887 INFO [org.apache.giraph.graph.MasterThread] org.apache.giraph.graph.BspServiceMaster: generateInputSplits: Got 240 input splits for 60 workers
2012-11-29 00:01:28,887 INFO [org.apache.giraph.graph.MasterThread] org.apache.giraph.graph.BspServiceMaster: createInputSplits: Starting to write input split data to zookeeper with 1 threads
2012-11-29 00:01:29,228 INFO [org.apache.giraph.graph.MasterThread] org.apache.giraph.graph.BspServiceMaster: createInputSplits: Done writing input split data to zookeeper
2012-11-29 00:01:29,348 INFO [org.apache.giraph.graph.MasterThread] org.apache.giraph.graph.partition.HashMasterPartitioner: createInitialPartitionOwners: Creating 3600, default would have been 3600 partitions.
2012-11-29 00:01:29,348 WARN [org.apache.giraph.graph.MasterThread] org.apache.giraph.graph.partition.HashMasterPartitioner: createInitialPartitionOwners: Reducing the partitionCount to 2995 from 3600
2012-11-29 00:08:09,352 INFO [main-SendThread(gsta31113.tan.ygrid.yahoo.com:24681)] org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 400000ms for sessionid 0x13b497783e40000, closing socket connection and attempting reconnect
2012-11-29 00:08:09,454 WARN [main-EventThread] org.apache.giraph.graph.BspService: process: Disconnected from ZooKeeper (will automatically try to recover) WatchedEvent state:Disconnected type:None path:null
2012-11-29 00:08:10,645 INFO [main-SendThread(gsta31113.tan.ygrid.yahoo.com:24681)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server gsta31113.tan.ygrid.yahoo.com/10.216.124.59:24681. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2012-11-29 00:08:10,645 INFO [main-SendThread(gsta31113.tan.ygrid.yahoo.com:24681)] org.apache.zookeeper.ClientCnxn: Socket connection established to gsta31113.tan.ygrid.yahoo.com/10.216.124.59:24681, initiating session
2012-11-29 00:08:10,648 INFO [main-SendThread(gsta31113.tan.ygrid.yahoo.com:24681)] org.apache.zookeeper.ClientCnxn: Session establishment complete on server gsta31113.tan.ygrid.yahoo.com/10.216.124.59:24681, sessionid = 0x13b497783e40000, negotiated timeout = 600000
2012-11-29 00:08:10,649 INFO [main-EventThread] org.apache.giraph.graph.BspService: process: Asynchronous connection complete.
2012-11-29 00:31:51,715 INFO [Thread-11] org.apache.giraph.zk.ZooKeeperManager: run: Shutdown hook started.
2012-11-29 00:31:51,715 WARN [Thread-11] org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Forced a shutdown hook kill of the ZooKeeper process.
2012-11-29 00:31:52,094 INFO [Thread-11] org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: ZooKeeper process exited with 143 (note that 143 typically means killed).

2012-11-29 00:31:52,093 INFO [main-SendThread(gsta31113.tan.ygrid.yahoo.com:24681)] org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x13b497783e40000, likely server has closed socket, closing socket connection and attempting reconnect
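
A note on three numbers in the master log above, offered as a rough sketch rather than a statement of how Giraph or ZooKeeper are implemented. The partition figures (3600 reduced to 2995) are consistent with a default of workers squared and a cap of roughly 1 MB divided by 350 bytes per partition; both constants are assumptions, chosen only because they reproduce the logged values. The 400000 ms in the "Client session timed out" line equals two thirds of the negotiated 600000 ms timeout, which is how the ZooKeeper client is generally understood to derive its read timeout. All function and constant names below are hypothetical.

```python
# Sketch only: the constants below are assumptions inferred from the log
# lines above, not values read from the Giraph or ZooKeeper sources.

ZNODE_LIMIT_BYTES = 1024 * 1024   # assumed size limit for one ZooKeeper znode
BYTES_PER_PARTITION = 350         # assumed per-partition size estimate


def default_partition_count(workers, multiplier=1.0):
    """Assumed default: multiplier * workers squared."""
    return int(multiplier * workers * workers)


def capped_partition_count(workers):
    """Apply the assumed cap so the partition list fits in one znode."""
    cap = ZNODE_LIMIT_BYTES // BYTES_PER_PARTITION
    return min(default_partition_count(workers), cap)


def client_read_timeout_ms(negotiated_timeout_ms):
    """Two thirds of the negotiated session timeout (integer milliseconds)."""
    return negotiated_timeout_ms * 2 // 3


print(default_partition_count(60))      # 3600, as in "Creating 3600"
print(capped_partition_count(60))       # 2995, as in "Reducing the partitionCount to 2995"
print(client_read_timeout_ms(600000))   # 400000, as in "have not heard from server in 400000ms"
```

If this arithmetic holds, the timeout at 00:08:09 was reported almost exactly 400 s after the previous master log line at 00:01:29, which suggests the master heard nothing from the ZooKeeper server during that whole window.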

Failed Worker Node:

2012-11-29 00:01:21,666 INFO [main] org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Got [gsta31113.tan.ygrid.yahoo.com] 1 hosts from 1 ready servers when 1 required (polling period is 3000) on attempt 0
2012-11-29 00:01:21,666 INFO [main] org.apache.giraph.graph.GraphMapper: setup: Starting up BspServiceWorker...
2012-11-29 00:01:21,679 INFO [main] org.apache.giraph.graph.BspService: BspService: Connecting to ZooKeeper with job job_1353148790244_114419, 4 on gsta31113.tan.ygrid.yahoo.com:24681
2012-11-29 00:01:21,694 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.4-1386507, built on 09/17/2012 08:33 GMT
2012-11-29 00:01:21,694 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:host.name=gsta31090.tan.ygrid.yahoo.com
2012-11-29 00:01:21,694 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.6.0_21
2012-11-29 00:01:21,694 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
2012-11-29 00:01:21,694 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:java.home=/home/Releases/gridjdk-1.6.0_21.1011192346-20110120-000/share/gridjdk-1.6.0_21/jre
2012-11-29 00:01:21,694 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:java.class.path={really long class path}
2012-11-29 00:01:21,694 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/home/Releases/gridjdk-1.6.0_21.1011192346-20110120-000/share/gridjdk-1.6.0_21/jre/lib/i386/server:/home/Releases/gridjdk-1.6.0_21.1011192346-20110120-000/share/gridjdk-1.6.0_21/jre/lib/i386:/home/Releases/gridjdk-1.6.0_21.1011192346-20110120-000/share/gridjdk-1.6.0_21/jre/../lib/i386:/grid/2/tmp/yarn-local/usercache/nova_sln/appcache/application_1353148790244_114419/container_1353148790244_114419_01_000120:/home/gs/hadoop/current/lib/native/Linux-i386-32:/usr/java/packages/lib/i386:/lib:/usr/lib
2012-11-29 00:01:21,694 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/grid/2/tmp/yarn-local/usercache/nova_sln/appcache/application_1353148790244_114419/container_1353148790244_114419_01_000120/tmp
2012-11-29 00:01:21,694 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=
2012-11-29 00:01:21,694 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
2012-11-29 00:01:21,694 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:os.arch=i386
2012-11-29 00:01:21,694 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:os.version=2.6.18-238.19.1.el5.YAHOO.20111028
2012-11-29 00:01:21,694 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:user.name=nova_sln
2012-11-29 00:01:21,694 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:user.home=/homes/nova_sln
2012-11-29 00:01:21,694 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/grid/2/tmp/yarn-local/usercache/nova_sln/appcache/application_1353148790244_114419/container_1353148790244_114419_01_000120
2012-11-29 00:01:21,695 INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=gsta31113.tan.ygrid.yahoo.com:24681 sessionTimeout=60000 watcher=org.apache.giraph.graph.BspServiceWorker@1c8fb4b
2012-11-29 00:01:21,737 INFO [main-SendThread(gsta31113.tan.ygrid.yahoo.com:24681)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server gsta31113.tan.ygrid.yahoo.com/10.216.124.59:24681. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2012-11-29 00:01:21,737 INFO [main-SendThread(gsta31113.tan.ygrid.yahoo.com:24681)] org.apache.zookeeper.ClientCnxn: Socket connection established to gsta31113.tan.ygrid.yahoo.com/10.216.124.59:24681, initiating session
2012-11-29 00:01:21,744 INFO [main-SendThread(gsta31113.tan.ygrid.yahoo.com:24681)] org.apache.zookeeper.ClientCnxn: Session establishment complete on server gsta31113.tan.ygrid.yahoo.com/10.216.124.59:24681, sessionid = 0x13b497783e40017, negotiated timeout = 600000
2012-11-29 00:01:21,747 INFO [main-EventThread] org.apache.giraph.graph.BspService: process: Asynchronous connection complete.
2012-11-29 00:01:21,754 WARN [main] org.apache.hadoop.conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2012-11-29 00:01:22,027 INFO [main] org.apache.giraph.comm.SecureRPCCommunications: getRPCServer: Added jobToken Ident: 18 6a 6f 62 5f 31 33 35 33 31 34 38 37 39 30 32 34 34 5f 31 31 34 34 31 39, Kind: mapreduce.job, Service: job_1353148790244_114419
2012-11-29 00:01:22,608 INFO [Socket Reader #1 for port 32504] org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 32504
2012-11-29 00:01:22,609 INFO [Socket Reader #2 for port 32504] org.apache.hadoop.ipc.Server: Starting Socket Reader #2 for port 32504
2012-11-29 00:01:22,609 INFO [Socket Reader #3 for port 32504] org.apache.hadoop.ipc.Server: Starting Socket Reader #3 for port 32504
2012-11-29 00:01:22,609 INFO [Socket Reader #4 for port 32504] org.apache.hadoop.ipc.Server: Starting Socket Reader #4 for port 32504
2012-11-29 00:01:22,610 INFO [Socket Reader #5 for port 32504] org.apache.hadoop.ipc.Server: Starting Socket Reader #5 for port 32504
2012-11-29 00:01:22,662 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: dfs.namenode.name.dir;  Ignoring.
2012-11-29 00:01:22,662 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.security.token.service.use_ip;  Ignoring.
2012-11-29 00:01:22,662 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
2012-11-29 00:01:22,662 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.admin.map.child.java.opts;  Ignoring.
2012-11-29 00:01:22,662 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
2012-11-29 00:01:22,662 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: yarn.app.mapreduce.am.job.client.port-range;  Ignoring.
2012-11-29 00:01:22,662 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.admin.reduce.child.java.opts;  Ignoring.
2012-11-29 00:01:22,663 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.tmp.dir;  Ignoring.
2012-11-29 00:01:22,691 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2012-11-29 00:01:22,691 INFO [IPC Server listener on 32504] org.apache.hadoop.ipc.Server: IPC Server listener on 32504: starting
2012-11-29 00:01:22,707 INFO [main] org.apache.giraph.comm.BasicRPCCommunications: BasicRPCCommunications: Started RPC communication server: gsta31090.tan.ygrid.yahoo.com/10.216.123.42:32504 with 61 handlers and 59 flush threads on bind attempt 0
2012-11-29 00:01:22,707 INFO [main] org.apache.giraph.graph.BspServiceWorker: BspServiceWorker: maxVerticesPerTransfer = 10000
2012-11-29 00:01:22,707 INFO [main] org.apache.giraph.graph.BspServiceWorker: BspServiceWorker: maxEdgesPerTransfer = 80000 useNetty = false
2012-11-29 00:01:22,716 INFO [main] org.apache.giraph.graph.GraphMapper: setup: Registering health of this worker...
2012-11-29 00:01:22,733 INFO [main] org.apache.giraph.graph.BspService: getJobState: Job state already exists (/_hadoopBsp/job_1353148790244_114419/_masterJobState)
2012-11-29 00:01:22,738 INFO [main] org.apache.giraph.graph.BspService: getApplicationAttempt: Node /_hadoopBsp/job_1353148790244_114419/_applicationAttemptsDir already exists!
2012-11-29 00:01:22,741 INFO [main] org.apache.giraph.graph.BspService: getApplicationAttempt: Node /_hadoopBsp/job_1353148790244_114419/_applicationAttemptsDir already exists!
2012-11-29 00:01:22,747 INFO [main] org.apache.giraph.graph.BspServiceWorker: registerHealth: Created my health node for attempt=0, superstep=-1 with /_hadoopBsp/job_1353148790244_114419/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/gsta31090.tan.ygrid.yahoo.com_4 and workerInfo= Worker(hostname=gsta31090.tan.ygrid.yahoo.com, MRtaskID=4, port=32504)
2012-11-29 00:19:17,005 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 24917 may have finished in the interim.
2012-11-29 00:19:17,005 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 24921 may have finished in the interim.
2012-11-29 00:19:17,006 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 24922 may have finished in the interim.
2012-11-29 00:27:37,081 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 25739 may have finished in the interim.
2012-11-29 00:27:37,081 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 25743 may have finished in the interim.
2012-11-29 00:27:37,081 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 25744 may have finished in the interim.
2012-11-29 00:28:07,200 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 25752 may have finished in the interim.
2012-11-29 00:31:52,091 INFO [main-SendThread(gsta31113.tan.ygrid.yahoo.com:24681)] org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x13b497783e40017, likely server has closed socket, closing socket connection and attempting reconnect
2012-11-29 00:31:52,193 WARN [main-EventThread] org.apache.giraph.graph.BspService: process: Disconnected from ZooKeeper (will automatically try to recover) WatchedEvent state:Disconnected type:None path:null
2012-11-29 00:31:53,478 INFO [main-SendThread(gsta31113.tan.ygrid.yahoo.com:24681)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server gsta31113.tan.ygrid.yahoo.com/10.216.124.59:24681. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2012-11-29 00:31:53,480 WARN [main-SendThread(gsta31113.tan.ygrid.yahoo.com:24681)] org.apache.zookeeper.ClientCnxn: Session 0x13b497783e40017 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:348)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
2012-11-29 00:31:53,584 ERROR [main] org.apache.giraph.graph.BspServiceWorker: unregisterHealth: Got failure, unregistering health on /_hadoopBsp/job_1353148790244_114419/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/gsta31090.tan.ygrid.yahoo.com_4 on superstep -1
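
Both logs above contain long silent stretches (for example, the worker logs nothing between 00:01:22 and 00:19:17). The following is a minimal sketch for locating such gaps in a saved log file. It assumes every log record starts with a timestamp in the format shown above; the function names and the 60-second threshold are hypothetical choices, and the sample lines are abbreviated copies of lines from the master log.

```python
import re
from datetime import datetime

# Matches the leading timestamp of a log line, e.g. "2012-11-29 00:01:29,348".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")


def parse_timestamp(line):
    """Return the line's timestamp, or None for lines without one
    (blank lines, stack-trace frames)."""
    match = TIMESTAMP.match(line)
    if match is None:
        return None
    base = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    return base.replace(microsecond=int(match.group(2)) * 1000)


def find_gaps(lines, min_gap_seconds=60.0):
    """Yield (gap_seconds, previous_line, next_line) for each silent stretch
    of at least min_gap_seconds between consecutive timestamped lines."""
    previous_time, previous_line = None, None
    for line in lines:
        current_time = parse_timestamp(line)
        if current_time is None:
            continue
        if previous_time is not None:
            gap = (current_time - previous_time).total_seconds()
            if gap >= min_gap_seconds:
                yield gap, previous_line, line
        previous_time, previous_line = current_time, line


# Abbreviated master-log lines; "..." marks text trimmed for brevity.
sample = [
    "2012-11-29 00:01:29,348 WARN ... Reducing the partitionCount to 2995 from 3600",
    "2012-11-29 00:08:09,352 INFO ... Client session timed out ...",
    "2012-11-29 00:08:10,649 INFO ... Asynchronous connection complete.",
    "2012-11-29 00:31:51,715 INFO ... Shutdown hook started.",
]

for gap, before, after in find_gaps(sample):
    print(f"{gap:8.1f}s of silence before: {after[:60]}")
```

On the sample this reports two gaps, one of about 400 s and one of about 1421 s. To scan a full log, pass an open file object instead of the list, since the function only iterates over lines.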


Please let me know if I am missing some configuration.


Thanks,

Tripti.

Re: Issue running Giraph on more mappers

Posted by Tripti Singh <tr...@yahoo-inc.com>.
Hi Eli,
Yes, I am running this on Hadoop 0.23. I was using Giraph from trunk (last updated 10th October).
Is it incompatible with YARN? I ask because I see a Hadoop 0.23 profile, and that's what I built.

Thanks,
Tripti.

From: Eli Reisman <ap...@gmail.com>
Reply-To: "user@giraph.apache.org" <us...@giraph.apache.org>
Date: Saturday, December 1, 2012 1:34 AM
To: "user@giraph.apache.org" <us...@giraph.apache.org>
Subject: Re: Issue running Giraph on more mappers

You're running on a YARN-based cluster?

On Wed, Nov 28, 2012 at 10:09 PM, Tripti Singh <tr...@yahoo-inc.com>> wrote:
Hi,
I am trying to run this workflow which uses Giraph.
I am able to succesfully run the Giraph job when I use lesser no. of mappers  and less data. But it fails for more mappers.
This is what the logs say for master and worker nodes:

Master Node:

2012-11-29 00:01:10,235 INFO [main] org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Connected to gsta31113.tan.ygrid.yahoo.com/10.216.124.59:24681<http://gsta31113.tan.ygrid.yahoo.com/10.216.124.59:24681>!
2012-11-29 00:01:10,235 INFO [main] org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Creating my filestamp _bsp/_defaultZkManagerDir/_zkServer/gsta31113.tan.ygrid.yahoo.com<http://gsta31113.tan.ygrid.yahoo.com> 3
2012-11-29 00:01:10,241 INFO [main] org.apache.giraph.graph.GraphMapper: setup: Starting up BspServiceMaster (master thread)...
2012-11-29 00:01:10,257 INFO [main] org.apache.giraph.graph.BspService: BspService: Connecting to ZooKeeper with job job_1353148790244_114419, 3 on gsta31113.tan.ygrid.yahoo.com:24681<http://gsta31113.tan.ygrid.yahoo.com:24681>
2012-11-29 00:01:10,278 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.4-1386507, built on 09/17/2012 08:33 GMT
2012-11-29 00:01:10,278 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:host.name<http://host.name>=gsta31113.tan.ygrid.yahoo.com<http://gsta31113.tan.ygrid.yahoo.com>
2012-11-29 00:01:10,278 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.6.0_21
2012-11-29 00:01:10,278 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
2012-11-29 00:01:10,278 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:java.home=/home/Releases/gridjdk-1.6.0_21.1011192346-20110120-000/share/gridjdk-1.6.0_21/jre
2012-11-29 00:01:10,278 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:java.class.path= {really long class path}
2012-11-29 00:01:10,278 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/home/Releases/gridjdk-1.6.0_21.1011192346-20110120-000/share/gridjdk-1.6.0_21/jre/lib/i386/server:/home/Releases/gridjdk-1.6.0_21.1011192346-20110120-000/share/gridjdk-1.6.0_21/jre/lib/i386:/home/Releases/gridjdk-1.6.0_21.1011192346-20110120-000/share/gridjdk-1.6.0_21/jre/../lib/i386:/grid/2/tmp/yarn-local/usercache/nova_sln/appcache/application_1353148790244_114419/container_1353148790244_114419_01_000009:/home/gs/hadoop/current/lib/native/Linux-i386-32:/usr/java/packages/lib/i386:/lib:/usr/lib
2012-11-29 00:01:10,278 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/grid/2/tmp/yarn-local/usercache/nova_sln/appcache/application_1353148790244_114419/container_1353148790244_114419_01_000009/tmp
2012-11-29 00:01:10,279 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=
2012-11-29 00:01:10,279 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:os.name<http://os.name>=Linux
2012-11-29 00:01:10,279 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:os.arch=i386
2012-11-29 00:01:10,279 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:os.version=2.6.18-238.19.1.el5.YAHOO.20111028
2012-11-29 00:01:10,279 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:user.name<http://user.name>=nova_sln
2012-11-29 00:01:10,279 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:user.home=/homes/nova_sln
2012-11-29 00:01:10,279 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/grid/2/tmp/yarn-local/usercache/nova_sln/appcache/application_1353148790244_114419/container_1353148790244_114419_01_000009
2012-11-29 00:01:10,280 INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=gsta31113.tan.ygrid.yahoo.com:24681<http://gsta31113.tan.ygrid.yahoo.com:24681> sessionTimeout=60000 watcher=org.apache.giraph.graph.BspServiceMaster@16f70a4
2012-11-29 00:01:10,304 INFO [main-SendThread(gsta31113.tan.ygrid.yahoo.com:24681<http://gsta31113.tan.ygrid.yahoo.com:24681>)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server gsta31113.tan.ygrid.yahoo.com/10.216.124.59:24681<http://gsta31113.tan.ygrid.yahoo.com/10.216.124.59:24681>. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2012-11-29 00:01:10,305 INFO [main-SendThread(gsta31113.tan.ygrid.yahoo.com:24681<http://gsta31113.tan.ygrid.yahoo.com:24681>)] org.apache.zookeeper.ClientCnxn: Socket connection established to gsta31113.tan.ygrid.yahoo.com/10.216.124.59:24681<http://gsta31113.tan.ygrid.yahoo.com/10.216.124.59:24681>, initiating session2012-11-29 00:01:10,331 INFO [main-SendThread(gsta31113.tan.ygrid.yahoo.com:24681<http://gsta31113.tan.ygrid.yahoo.com:24681>)] org.apache.zookeeper.ClientCnxn: Session establishment complete on server gsta31113.tan.ygrid.yahoo.com/10.216.124.59:24681<http://gsta31113.tan.ygrid.yahoo.com/10.216.124.59:24681>, sessionid = 0x13b497783e40000, negotiated timeout = 600000
2012-11-29 00:01:10,333 INFO [main-EventThread] org.apache.giraph.graph.BspService: process: Asynchronous connection complete.
2012-11-29 00:01:10,335 INFO [main] org.apache.giraph.graph.GraphMapper: map: No need to do anything when not a worker
2012-11-29 00:01:10,335 INFO [main] org.apache.giraph.graph.GraphMapper: cleanup: Starting for MASTER_ZOOKEEPER_ONLY
2012-11-29 00:01:10,396 INFO [org.apache.giraph.graph.MasterThread] org.apache.giraph.graph.BspServiceMaster: becomeMaster: First child is '/_hadoopBsp/job_1353148790244_114419/_masterElectionDir/gsta31113.tan.ygrid.yahoo.com_30000000000' and my bid is '/_hadoopBsp/job_1353148790244_114419/_masterElectionDir/gsta31113.tan.ygrid.yahoo.com_30000000000'
2012-11-29 00:01:10,403 INFO [org.apache.giraph.graph.MasterThread] org.apache.giraph.graph.BspServiceMaster: becomeMaster: I am now the master!
2012-11-29 00:01:10,423 INFO [main-EventThread] org.apache.giraph.graph.BspService: process: applicationAttemptChanged signaled
2012-11-29 00:01:10,440 WARN [main-EventThread] org.apache.giraph.graph.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/job_1353148790244_114419/_applicationAttemptsDir/0/_superstepDir, type=NodeChildrenChanged, state=SyncConnected)
2012-11-29 00:01:17,475 INFO [org.apache.giraph.graph.MasterThread] org.apache.giraph.graph.BspServiceMaster: checkWorkers: Only found 1 responses of 60 needed to start superstep -1.  Sleeping for 5000 msecs and used 0 of 60 attempts.
2012-11-29 00:01:28,742 INFO [org.apache.giraph.graph.MasterThread] org.apache.hadoop.mapreduce.lib.input.FileInputFormat: Total input paths to process : 60
2012-11-29 00:01:28,760 WARN [org.apache.giraph.graph.MasterThread] org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library not loaded
2012-11-29 00:01:28,887 INFO [org.apache.giraph.graph.MasterThread] org.apache.giraph.graph.BspServiceMaster: generateInputSplits: Got 240 input splits for 60 workers
2012-11-29 00:01:28,887 INFO [org.apache.giraph.graph.MasterThread] org.apache.giraph.graph.BspServiceMaster: createInputSplits: Starting to write input split data to zookeeper with 1 threads
2012-11-29 00:01:29,228 INFO [org.apache.giraph.graph.MasterThread] org.apache.giraph.graph.BspServiceMaster: createInputSplits: Done writing input split data to zookeeper
2012-11-29 00:01:29,348 INFO [org.apache.giraph.graph.MasterThread] org.apache.giraph.graph.partition.HashMasterPartitioner: createInitialPartitionOwners: Creating 3600, default would have been 3600 partitions.
2012-11-29 00:01:29,348 WARN [org.apache.giraph.graph.MasterThread] org.apache.giraph.graph.partition.HashMasterPartitioner: createInitialPartitionOwners: Reducing the partitionCount to 2995 from 3600
2012-11-29 00:08:09,352 INFO [main-SendThread(gsta31113.tan.ygrid.yahoo.com:24681<http://gsta31113.tan.ygrid.yahoo.com:24681>)] org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 400000ms for sessionid 0x13b497783e40000, closing socket connection and attempting reconnect
2012-11-29 00:08:09,454 WARN [main-EventThread] org.apache.giraph.graph.BspService: process: Disconnected from ZooKeeper (will automatically try to recover) WatchedEvent state:Disconnected type:None path:null
2012-11-29 00:08:10,645 INFO [main-SendThread(gsta31113.tan.ygrid.yahoo.com:24681<http://gsta31113.tan.ygrid.yahoo.com:24681>)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server gsta31113.tan.ygrid.yahoo.com/10.216.124.59:24681<http://gsta31113.tan.ygrid.yahoo.com/10.216.124.59:24681>. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2012-11-29 00:08:10,645 INFO [main-SendThread(gsta31113.tan.ygrid.yahoo.com:24681<http://gsta31113.tan.ygrid.yahoo.com:24681>)] org.apache.zookeeper.ClientCnxn: Socket connection established to gsta31113.tan.ygrid.yahoo.com/10.216.124.59:24681<http://gsta31113.tan.ygrid.yahoo.com/10.216.124.59:24681>, initiating session2012-11-29 00:08:10,648 INFO [main-SendThread(gsta31113.tan.ygrid.yahoo.com:24681<http://gsta31113.tan.ygrid.yahoo.com:24681>)] org.apache.zookeeper.ClientCnxn: Session establishment complete on server gsta31113.tan.ygrid.yahoo.com/10.216.124.59:24681<http://gsta31113.tan.ygrid.yahoo.com/10.216.124.59:24681>, sessionid = 0x13b497783e40000, negotiated timeout = 600000
2012-11-29 00:08:10,649 INFO [main-EventThread] org.apache.giraph.graph.BspService: process: Asynchronous connection complete.
2012-11-29 00:31:51,715 INFO [Thread-11] org.apache.giraph.zk.ZooKeeperManager: run: Shutdown hook started.
2012-11-29 00:31:51,715 WARN [Thread-11] org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Forced a shutdown hook kill of the ZooKeeper process.
2012-11-29 00:31:52,094 INFO [Thread-11] org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: ZooKeeper process exited with 143 (note that 143 typically means killed).

2012-11-29 00:31:52,093 INFO [main-SendThread(gsta31113.tan.ygrid.yahoo.com:24681<http://gsta31113.tan.ygrid.yahoo.com:24681>)] org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x13b497783e40000, likely server has closed socket, closing socket connection and attempting reconnect

Failed Worker Node:

2012-11-29 00:01:21,666 INFO [main] org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Got [gsta31113.tan.ygrid.yahoo.com<http://gsta31113.tan.ygrid.yahoo.com>] 1 hosts from 1 ready servers when 1 required (polling period is 3000) on attempt 0
2012-11-29 00:01:21,666 INFO [main] org.apache.giraph.graph.GraphMapper: setup: Starting up BspServiceWorker...
2012-11-29 00:01:21,679 INFO [main] org.apache.giraph.graph.BspService: BspService: Connecting to ZooKeeper with job job_1353148790244_114419, 4 on gsta31113.tan.ygrid.yahoo.com:24681
2012-11-29 00:01:21,694 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.4-1386507, built on 09/17/2012 08:33 GMT
2012-11-29 00:01:21,694 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:host.name=gsta31090.tan.ygrid.yahoo.com
2012-11-29 00:01:21,694 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.6.0_21
2012-11-29 00:01:21,694 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
2012-11-29 00:01:21,694 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:java.home=/home/Releases/gridjdk-1.6.0_21.1011192346-20110120-000/share/gridjdk-1.6.0_21/jre
2012-11-29 00:01:21,694 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:java.class.path={really long class path}
2012-11-29 00:01:21,694 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/home/Releases/gridjdk-1.6.0_21.1011192346-20110120-000/share/gridjdk-1.6.0_21/jre/lib/i386/server:/home/Releases/gridjdk-1.6.0_21.1011192346-20110120-000/share/gridjdk-1.6.0_21/jre/lib/i386:/home/Releases/gridjdk-1.6.0_21.1011192346-20110120-000/share/gridjdk-1.6.0_21/jre/../lib/i386:/grid/2/tmp/yarn-local/usercache/nova_sln/appcache/application_1353148790244_114419/container_1353148790244_114419_01_000120:/home/gs/hadoop/current/lib/native/Linux-i386-32:/usr/java/packages/lib/i386:/lib:/usr/lib
2012-11-29 00:01:21,694 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/grid/2/tmp/yarn-local/usercache/nova_sln/appcache/application_1353148790244_114419/container_1353148790244_114419_01_000120/tmp
2012-11-29 00:01:21,694 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=
2012-11-29 00:01:21,694 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
2012-11-29 00:01:21,694 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:os.arch=i386
2012-11-29 00:01:21,694 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:os.version=2.6.18-238.19.1.el5.YAHOO.20111028
2012-11-29 00:01:21,694 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:user.name=nova_sln
2012-11-29 00:01:21,694 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:user.home=/homes/nova_sln
2012-11-29 00:01:21,694 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/grid/2/tmp/yarn-local/usercache/nova_sln/appcache/application_1353148790244_114419/container_1353148790244_114419_01_000120
2012-11-29 00:01:21,695 INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=gsta31113.tan.ygrid.yahoo.com:24681 sessionTimeout=60000 watcher=org.apache.giraph.graph.BspServiceWorker@1c8fb4b
2012-11-29 00:01:21,737 INFO [main-SendThread(gsta31113.tan.ygrid.yahoo.com:24681)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server gsta31113.tan.ygrid.yahoo.com/10.216.124.59:24681. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2012-11-29 00:01:21,737 INFO [main-SendThread(gsta31113.tan.ygrid.yahoo.com:24681)] org.apache.zookeeper.ClientCnxn: Socket connection established to gsta31113.tan.ygrid.yahoo.com/10.216.124.59:24681, initiating session
2012-11-29 00:01:21,744 INFO [main-SendThread(gsta31113.tan.ygrid.yahoo.com:24681)] org.apache.zookeeper.ClientCnxn: Session establishment complete on server gsta31113.tan.ygrid.yahoo.com/10.216.124.59:24681, sessionid = 0x13b497783e40017, negotiated timeout = 600000
2012-11-29 00:01:21,747 INFO [main-EventThread] org.apache.giraph.graph.BspService: process: Asynchronous connection complete.
2012-11-29 00:01:21,754 WARN [main] org.apache.hadoop.conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2012-11-29 00:01:22,027 INFO [main] org.apache.giraph.comm.SecureRPCCommunications: getRPCServer: Added jobToken Ident: 18 6a 6f 62 5f 31 33 35 33 31 34 38 37 39 30 32 34 34 5f 31 31 34 34 31 39, Kind: mapreduce.job, Service: job_1353148790244_114419
2012-11-29 00:01:22,608 INFO [Socket Reader #1 for port 32504] org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 32504
2012-11-29 00:01:22,609 INFO [Socket Reader #2 for port 32504] org.apache.hadoop.ipc.Server: Starting Socket Reader #2 for port 32504
2012-11-29 00:01:22,609 INFO [Socket Reader #3 for port 32504] org.apache.hadoop.ipc.Server: Starting Socket Reader #3 for port 32504
2012-11-29 00:01:22,609 INFO [Socket Reader #4 for port 32504] org.apache.hadoop.ipc.Server: Starting Socket Reader #4 for port 32504
2012-11-29 00:01:22,610 INFO [Socket Reader #5 for port 32504] org.apache.hadoop.ipc.Server: Starting Socket Reader #5 for port 32504
2012-11-29 00:01:22,662 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: dfs.namenode.name.dir;  Ignoring.
2012-11-29 00:01:22,662 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.security.token.service.use_ip;  Ignoring.
2012-11-29 00:01:22,662 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
2012-11-29 00:01:22,662 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.admin.map.child.java.opts;  Ignoring.
2012-11-29 00:01:22,662 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
2012-11-29 00:01:22,662 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: yarn.app.mapreduce.am.job.client.port-range;  Ignoring.
2012-11-29 00:01:22,662 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.admin.reduce.child.java.opts;  Ignoring.
2012-11-29 00:01:22,663 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.tmp.dir;  Ignoring.
2012-11-29 00:01:22,691 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2012-11-29 00:01:22,691 INFO [IPC Server listener on 32504] org.apache.hadoop.ipc.Server: IPC Server listener on 32504: starting
2012-11-29 00:01:22,707 INFO [main] org.apache.giraph.comm.BasicRPCCommunications: BasicRPCCommunications: Started RPC communication server: gsta31090.tan.ygrid.yahoo.com/10.216.123.42:32504 with 61 handlers and 59 flush threads on bind attempt 0
2012-11-29 00:01:22,707 INFO [main] org.apache.giraph.graph.BspServiceWorker: BspServiceWorker: maxVerticesPerTransfer = 10000
2012-11-29 00:01:22,707 INFO [main] org.apache.giraph.graph.BspServiceWorker: BspServiceWorker: maxEdgesPerTransfer = 80000 useNetty = false
2012-11-29 00:01:22,716 INFO [main] org.apache.giraph.graph.GraphMapper: setup: Registering health of this worker...
2012-11-29 00:01:22,733 INFO [main] org.apache.giraph.graph.BspService: getJobState: Job state already exists (/_hadoopBsp/job_1353148790244_114419/_masterJobState)
2012-11-29 00:01:22,738 INFO [main] org.apache.giraph.graph.BspService: getApplicationAttempt: Node /_hadoopBsp/job_1353148790244_114419/_applicationAttemptsDir already exists!
2012-11-29 00:01:22,741 INFO [main] org.apache.giraph.graph.BspService: getApplicationAttempt: Node /_hadoopBsp/job_1353148790244_114419/_applicationAttemptsDir already exists!
2012-11-29 00:01:22,747 INFO [main] org.apache.giraph.graph.BspServiceWorker: registerHealth: Created my health node for attempt=0, superstep=-1 with /_hadoopBsp/job_1353148790244_114419/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/gsta31090.tan.ygrid.yahoo.com_4 and workerInfo= Worker(hostname=gsta31090.tan.ygrid.yahoo.com, MRtaskID=4, port=32504)
2012-11-29 00:19:17,005 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 24917 may have finished in the interim.
2012-11-29 00:19:17,005 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 24921 may have finished in the interim.
2012-11-29 00:19:17,006 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 24922 may have finished in the interim.
2012-11-29 00:27:37,081 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 25739 may have finished in the interim.
2012-11-29 00:27:37,081 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 25743 may have finished in the interim.
2012-11-29 00:27:37,081 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 25744 may have finished in the interim.
2012-11-29 00:28:07,200 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 25752 may have finished in the interim.
2012-11-29 00:31:52,091 INFO [main-SendThread(gsta31113.tan.ygrid.yahoo.com:24681)] org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x13b497783e40017, likely server has closed socket, closing socket connection and attempting reconnect
2012-11-29 00:31:52,193 WARN [main-EventThread] org.apache.giraph.graph.BspService: process: Disconnected from ZooKeeper (will automatically try to recover) WatchedEvent state:Disconnected type:None path:null
2012-11-29 00:31:53,478 INFO [main-SendThread(gsta31113.tan.ygrid.yahoo.com:24681)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server gsta31113.tan.ygrid.yahoo.com/10.216.124.59:24681. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2012-11-29 00:31:53,480 WARN [main-SendThread(gsta31113.tan.ygrid.yahoo.com:24681)] org.apache.zookeeper.ClientCnxn: Session 0x13b497783e40017 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:348)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
2012-11-29 00:31:53,584 ERROR [main] org.apache.giraph.graph.BspServiceWorker: unregisterHealth: Got failure, unregistering health on /_hadoopBsp/job_1353148790244_114419/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/gsta31090.tan.ygrid.yahoo.com_4 on superstep -1


Please let me know if I am missing some configuration.


Thanks,

Tripti.


Re: Issue running Giraph on more mappers

Posted by Eli Reisman <ap...@gmail.com>.
You're running on a YARN-based cluster?

On Wed, Nov 28, 2012 at 10:09 PM, Tripti Singh <tr...@yahoo-inc.com> wrote:

> Hi,
> I am trying to run this workflow which uses Giraph.
> I am able to successfully run the Giraph job when I use fewer
> mappers and less data. But it fails for more mappers.