Posted to user@giraph.apache.org by Bryan Rowe <br...@gmail.com> on 2014/07/03 22:53:19 UTC

Errors with large graphs (continued, new email)

First of all, I started this thread from my old Yahoo address, which was a
mistake because it kept sending out duplicates.  Sorry for the
inconvenience; I'll continue from this address from now on.

*I originally posted this:*
Hello,

Giraph: release-1.0.0-RC3

In short, the Shortest Paths example fails when I run it on large graphs,
but it succeeds with the small graph provided in the Quick Start guide.
To simplify things, I converted all of my large graphs into the format
shown in the Quick Start guide.
I'm using a single-node setup.
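For clarity, each line in that Quick Start format is a JSON array of the
form [source_id, source_value, [[dest_id, edge_value], ...]].  Here is a
small Python sketch for sanity-checking converted lines (the sample line
is illustrative, not taken from CA.txt):

```python
import json

def parse_vertex_line(line):
    """Parse one JsonLongDoubleFloatDouble input line into (id, value, edges)."""
    vertex_id, value, edges = json.loads(line)
    # edges is a list of [destination_id, edge_weight] pairs
    return int(vertex_id), float(value), [(int(d), float(w)) for d, w in edges]

# Vertex 0 with value 0, edges to vertex 1 (weight 1) and vertex 3 (weight 3).
vid, val, edges = parse_vertex_line("[0,0,[[1,1],[3,3]]]")
print(vid, val, edges)  # 0 0.0 [(1, 1.0), (3, 3.0)]
```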


Here is the command I'm using (split across lines for readability):

hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner \
  org.apache.giraph.examples.SimpleShortestPathsVertex \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
  -vip /user/ubuntu/input/CA.txt \
  -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /user/ubuntu/output/shortestpaths \
  -w 1
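Incidentally, the worker syslog below shows the heap maxing out around
190 MB right before the failure, so I wonder whether the map task heap is
simply too small.  A variant I'm considering (untested; mapred.child.java.opts
is the stock Hadoop 0.20.x property, and the 4g value is just an example to
size against the node's actual RAM):

```shell
# Hypothetical variant of the same job with a larger heap per map task.
# -Dmapred.child.java.opts is parsed by Hadoop's GenericOptionsParser;
# 4g is an arbitrary example value, not a recommendation.
hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner \
  -Dmapred.child.java.opts=-Xmx4g \
  org.apache.giraph.examples.SimpleShortestPathsVertex \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
  -vip /user/ubuntu/input/CA.txt \
  -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /user/ubuntu/output/shortestpaths \
  -w 1
```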

CA.txt is a large graph file: 96,026,228 bytes

The job fails after 10 minutes, 46 seconds.

Two map tasks are created when the job runs.
The first, task_201407021636_0006_m_000000, is KILLED.  Its syslog:
===========================================================================

2014-07-02 17:01:34,757 INFO org.apache.hadoop.util.NativeCodeLoader:
Loaded the native-hadoop library
2014-07-02 17:01:34,945 WARN
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi
already exists!
2014-07-02 17:01:35,127 INFO org.apache.giraph.graph.GraphTaskManager:
setup: Log level remains at info
2014-07-02 17:01:35,159 INFO org.apache.giraph.graph.GraphTaskManager:
Distributed cache is empty. Assuming fatjar.
2014-07-02 17:01:35,159 INFO org.apache.giraph.graph.GraphTaskManager:
setup: classpath @
/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/job.jar
for job Giraph: org.apache.giraph.examples.SimpleShortestPathsVertex
2014-07-02 17:01:35,201 INFO org.apache.giraph.zk.ZooKeeperManager:
createCandidateStamp: Made the directory
_bsp/_defaultZkManagerDir/job_201407021636_0006
2014-07-02 17:01:35,204 INFO org.apache.giraph.zk.ZooKeeperManager:
createCandidateStamp: Creating my filestamp
_bsp/_defaultZkManagerDir/job_201407021636_0006/_task/ec2-54-186-5-159.us-west-2.compute.amazonaws.com
0
2014-07-02 17:01:35,219 INFO org.apache.giraph.zk.ZooKeeperManager:
getZooKeeperServerList: Got
[ec2-54-186-5-159.us-west-2.compute.amazonaws.com] 1 hosts from 1
candidates when 1 required (polling period is 3000) on attempt 0
2014-07-02 17:01:35,220 INFO org.apache.giraph.zk.ZooKeeperManager:
createZooKeeperServerList: Creating the final ZooKeeper file
'_bsp/_defaultZkManagerDir/job_201407021636_0006/zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com
0 '
2014-07-02 17:01:35,228 INFO org.apache.giraph.zk.ZooKeeperManager:
getZooKeeperServerList: For task 0, got file
'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 '
(polling period is 3000)
2014-07-02 17:01:35,228 INFO org.apache.giraph.zk.ZooKeeperManager:
getZooKeeperServerList: Found
[ec2-54-186-5-159.us-west-2.compute.amazonaws.com, 0] 2 hosts in
filename 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com
0 '
2014-07-02 17:01:35,229 INFO org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: Trying to delete old directory
/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper
2014-07-02 17:01:35,234 INFO org.apache.giraph.zk.ZooKeeperManager:
generateZooKeeperConfigFile: Creating file
/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper/zoo.cfg
in /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper
with base port 22181
2014-07-02 17:01:35,234 INFO org.apache.giraph.zk.ZooKeeperManager:
generateZooKeeperConfigFile: Make directory of _bspZooKeeper = true
2014-07-02 17:01:35,235 INFO org.apache.giraph.zk.ZooKeeperManager:
generateZooKeeperConfigFile: Delete of zoo.cfg = false
2014-07-02 17:01:35,236 INFO org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: Attempting to start ZooKeeper server with
command [/usr/lib/jvm/java-7-oracle/jre/bin/java, -Xmx512m,
-XX:ParallelGCThreads=4, -XX:+UseConcMarkSweepGC,
-XX:CMSInitiatingOccupancyFraction=70, -XX:MaxGCPauseMillis=100, -cp,
/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/job.jar,
org.apache.zookeeper.server.quorum.QuorumPeerMain,
/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper/zoo.cfg]
in directory /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper
2014-07-02 17:01:35,238 INFO org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: Shutdown hook added.
2014-07-02 17:01:35,238 INFO org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: Connect attempt 0 of 10 max trying to connect
to ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 with poll
msecs = 3000
2014-07-02 17:01:35,241 WARN org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: Got ConnectException
java.net.ConnectException: Connection refused
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:579)
	at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:701)
	at org.apache.giraph.graph.GraphTaskManager.startZooKeeperManager(GraphTaskManager.java:357)
	at org.apache.giraph.graph.GraphTaskManager.setup(GraphTaskManager.java:188)
	at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:60)
	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:90)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
	at org.apache.hadoop.mapred.Child.main(Child.java:253)
2014-07-02 17:01:38,242 INFO org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: Connect attempt 1 of 10 max trying to connect
to ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 with poll
msecs = 3000
2014-07-02 17:01:38,243 INFO org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: Connected to
ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181!
2014-07-02 17:01:38,243 INFO org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: Creating my filestamp
_bsp/_defaultZkManagerDir/job_201407021636_0006/_zkServer/ec2-54-186-5-159.us-west-2.compute.amazonaws.com
0
2014-07-02 17:01:40,249 INFO org.apache.giraph.graph.GraphTaskManager:
setup: Chosen to run ZooKeeper...
2014-07-02 17:01:40,249 INFO org.apache.giraph.graph.GraphTaskManager:
setup: Starting up BspServiceMaster (master thread)...
2014-07-02 17:01:40,260 INFO org.apache.giraph.bsp.BspService:
BspService: Connecting to ZooKeeper with job job_201407021636_0006, 0
on ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181
2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client
environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27
GMT
2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client
environment:host.name=ec2-54-186-5-159.us-west-2.compute.amazonaws.com
2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.version=1.7.0_60
2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.vendor=Oracle Corporation
2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.home=/usr/lib/jvm/java-7-oracle/jre
2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client
 environment:java.class.path=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/classes:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work:/home/ubuntu/hadoop/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/ubuntu/hadoop/bin/..:/home/ubuntu/hadoop/bin/../hadoop-core-0.20.203.0.jar:/home/ubuntu/hadoop/bin/../lib/aspectjrt-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/aspectjtools-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-1.7.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-core-1.8.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-cli-1.2.jar:/home/ubuntu/hadoop/bin/../lib/commons-codec-1.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-collections-3.2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-configuration-1.6.jar:/home/ubuntu/hadoo
p/bin/../lib/commons-daemon-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-digester-1.8.jar:/home/ubuntu/hadoop/bin/../lib/commons-el-1.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-httpclient-3.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-lang-2.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-1.1.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-api-1.0.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-math-2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-net-1.4.1.jar:/home/ubuntu/hadoop/bin/../lib/core-3.1.1.jar:/home/ubuntu/hadoop/bin/../lib/hsqldb-1.8.0.10.jar:/home/ubuntu/hadoop/bin/../lib/jackson-core-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jackson-mapper-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jasper-compiler-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jasper-runtime-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jets3t-0.6.1.jar:/home/ubuntu/hadoop/bin/../lib/jetty-6.1.26.jar:/home/ubuntu/hadoop/bin/../lib/jetty-util-6.1.26.jar:/home/ubu
ntu/hadoop/bin/../lib/jsch-0.1.42.jar:/home/ubuntu/hadoop/bin/../lib/junit-4.5.jar:/home/ubuntu/hadoop/bin/../lib/kfs-0.2.2.jar:/home/ubuntu/hadoop/bin/../lib/log4j-1.2.15.jar:/home/ubuntu/hadoop/bin/../lib/mockito-all-1.8.5.jar:/home/ubuntu/hadoop/bin/../lib/oro-2.0.8.jar:/home/ubuntu/hadoop/bin/../lib/servlet-api-2.5-20081211.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-api-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/xmlenc-0.52.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar
2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.library.path=/home/ubuntu/hadoop/bin/../lib/native/Linux-amd64-64:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work
2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.io.tmpdir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work/tmp
2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.compiler=<NA>
2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client
environment:os.name=Linux
2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client
environment:os.arch=amd64
2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client
environment:os.version=3.2.0-58-virtual
2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client
environment:user.name=ubuntu
2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client
environment:user.home=/home/ubuntu
2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client
environment:user.dir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work
2014-07-02 17:01:40,272 INFO org.apache.zookeeper.ZooKeeper:
Initiating client connection,
connectString=ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181
sessionTimeout=60000
watcher=org.apache.giraph.master.BspServiceMaster@4deb9df0
2014-07-02 17:01:40,284 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server
ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181
2014-07-02 17:01:40,296 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to
ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181,
initiating session
2014-07-02 17:01:42,336 INFO org.apache.zookeeper.ClientCnxn: Session
establishment complete on server
ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181,
sessionid = 0x146f806378d0000, negotiated timeout = 600000
2014-07-02 17:01:42,337 INFO org.apache.giraph.bsp.BspService:
process: Asynchronous connection complete.
2014-07-02 17:01:42,343 INFO org.apache.giraph.graph.GraphTaskManager:
map: No need to do anything when not a worker
2014-07-02 17:01:42,343 INFO org.apache.giraph.graph.GraphTaskManager:
cleanup: Starting for MASTER_ZOOKEEPER_ONLY
2014-07-02 17:01:45,033 INFO org.apache.giraph.bsp.BspService:
getJobState: Job state already exists
(/_hadoopBsp/job_201407021636_0006/_masterJobState)
2014-07-02 17:01:45,038 INFO
org.apache.giraph.master.BspServiceMaster: becomeMaster: First child
is '/_hadoopBsp/job_201407021636_0006/_masterElectionDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_00000000000'
and my bid is '/_hadoopBsp/job_201407021636_0006/_masterElectionDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_00000000000'
2014-07-02 17:01:45,044 INFO org.apache.giraph.bsp.BspService:
getApplicationAttempt: Node
/_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir already
exists!
2014-07-02 17:01:45,049 INFO org.apache.giraph.bsp.BspService:
process: applicationAttemptChanged signaled
2014-07-02 17:01:45,089 INFO org.apache.giraph.comm.netty.NettyServer:
NettyServer: Using execution handler with 8 threads after
requestFrameDecoder.
2014-07-02 17:01:45,111 INFO org.apache.giraph.comm.netty.NettyServer:
start: Started server communication server:
ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:30000
with up to 16 threads on bind attempt 0 with sendBufferSize = 32768
receiveBufferSize = 524288 backlog = 1
2014-07-02 17:01:45,116 INFO org.apache.giraph.comm.netty.NettyClient:
NettyClient: Using execution handler with 8 threads after
requestEncoder.
2014-07-02 17:01:45,119 INFO
org.apache.giraph.master.BspServiceMaster: becomeMaster: I am now the
master!
2014-07-02 17:01:45,126 INFO org.apache.giraph.bsp.BspService:
getApplicationAttempt: Node
/_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir already
exists!
2014-07-02 17:01:45,140 INFO
org.apache.giraph.io.formats.GiraphFileInputFormat: Total input paths
to process : 1
2014-07-02 17:01:45,149 INFO
org.apache.giraph.master.BspServiceMaster: generateVertexInputSplits:
Got 2 input splits for 1 input threads
2014-07-02 17:01:45,149 INFO
org.apache.giraph.master.BspServiceMaster: createVertexInputSplits:
Starting to write input split data to zookeeper with 1 threads
2014-07-02 17:01:45,163 INFO
org.apache.giraph.master.BspServiceMaster: createVertexInputSplits:
Done writing input split data to zookeeper
2014-07-02 17:01:45,173 INFO org.apache.giraph.comm.netty.NettyClient:
Using Netty without authentication.
2014-07-02 17:01:45,187 INFO org.apache.giraph.comm.netty.NettyClient:
connectAllAddresses: Successfully added 1 connections, (1 total
connected) 0 failed, 0 failures total.
2014-07-02 17:01:45,188 INFO
org.apache.giraph.partition.PartitionUtils: computePartitionCount:
Creating 1, default would have been 1 partitions.
2014-07-02 17:01:45,350 INFO org.apache.giraph.comm.netty.NettyServer:
start: Using Netty without authentication.
2014-07-02 17:06:45,347 INFO
org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out
of 1 workers finished on superstep -1 on path
/_hadoopBsp/job_201407021636_0006/_vertexInputSplitDoneDir
2014-07-02 17:06:45,349 INFO
org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList:
Waiting on [ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1]
2014-07-02 17:11:45,355 INFO
org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out
of 1 workers finished on superstep -1 on path
/_hadoopBsp/job_201407021636_0006/_vertexInputSplitDoneDir
2014-07-02 17:11:45,355 INFO
org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList:
Waiting on [ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1]
2014-07-02 17:12:07,201 INFO org.apache.giraph.zk.ZooKeeperManager:
run: Shutdown hook started.
2014-07-02 17:12:07,202 WARN org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: Forced a shutdown hook kill of the ZooKeeper
process.
2014-07-02 17:12:07,518 INFO org.apache.zookeeper.ClientCnxn: Unable
to read additional data from server sessionid 0x146f806378d0000,
likely server has closed socket, closing socket connection and
attempting reconnect
2014-07-02 17:12:07,518 INFO org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: ZooKeeper process exited with 143 (note that
143 typically means killed).
===========================================================================

The second, task_201407021636_0006_m_000001, ends in the FAILED state.
Its syslog:
===========================================================================

2014-07-02 17:01:38,016 INFO org.apache.hadoop.util.NativeCodeLoader:
Loaded the native-hadoop library
2014-07-02 17:01:38,203 WARN
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi
already exists!
2014-07-02 17:01:38,379 INFO org.apache.giraph.graph.GraphTaskManager:
setup: Log level remains at info
2014-07-02 17:01:40,280 INFO org.apache.giraph.graph.GraphTaskManager:
Distributed cache is empty. Assuming fatjar.
2014-07-02 17:01:40,281 INFO org.apache.giraph.graph.GraphTaskManager:
setup: classpath @
/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/job.jar
for job Giraph: org.apache.giraph.examples.SimpleShortestPathsVertex
2014-07-02 17:01:40,317 INFO org.apache.giraph.zk.ZooKeeperManager:
createCandidateStamp: Made the directory
_bsp/_defaultZkManagerDir/job_201407021636_0006
2014-07-02 17:01:40,320 INFO org.apache.giraph.zk.ZooKeeperManager:
createCandidateStamp: Creating my filestamp
_bsp/_defaultZkManagerDir/job_201407021636_0006/_task/ec2-54-186-5-159.us-west-2.compute.amazonaws.com
1
2014-07-02 17:01:42,109 INFO org.apache.giraph.zk.ZooKeeperManager:
getZooKeeperServerList: For task 1, got file
'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 '
(polling period is 3000)
2014-07-02 17:01:42,313 INFO org.apache.giraph.zk.ZooKeeperManager:
getZooKeeperServerList: Found
[ec2-54-186-5-159.us-west-2.compute.amazonaws.com, 0] 2 hosts in
filename 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com
0 '
2014-07-02 17:01:42,316 INFO org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: Got
[ec2-54-186-5-159.us-west-2.compute.amazonaws.com] 1 hosts from 1
ready servers when 1 required (polling period is 3000) on attempt 0
2014-07-02 17:01:42,316 INFO org.apache.giraph.graph.GraphTaskManager:
setup: Starting up BspServiceWorker...
2014-07-02 17:01:42,330 INFO org.apache.giraph.bsp.BspService:
BspService: Connecting to ZooKeeper with job job_201407021636_0006, 1
on ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181
2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client
environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27
GMT
2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client
environment:host.name=ec2-54-186-5-159.us-west-2.compute.amazonaws.com
2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.version=1.7.0_60
2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.vendor=Oracle Corporation
2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.home=/usr/lib/jvm/java-7-oracle/jre
2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client
 environment:java.class.path=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/classes:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work:/home/ubuntu/hadoop/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/ubuntu/hadoop/bin/..:/home/ubuntu/hadoop/bin/../hadoop-core-0.20.203.0.jar:/home/ubuntu/hadoop/bin/../lib/aspectjrt-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/aspectjtools-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-1.7.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-core-1.8.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-cli-1.2.jar:/home/ubuntu/hadoop/bin/../lib/commons-codec-1.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-collections-3.2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-configuration-1.6.jar:/home/ubuntu/hadoo
p/bin/../lib/commons-daemon-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-digester-1.8.jar:/home/ubuntu/hadoop/bin/../lib/commons-el-1.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-httpclient-3.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-lang-2.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-1.1.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-api-1.0.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-math-2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-net-1.4.1.jar:/home/ubuntu/hadoop/bin/../lib/core-3.1.1.jar:/home/ubuntu/hadoop/bin/../lib/hsqldb-1.8.0.10.jar:/home/ubuntu/hadoop/bin/../lib/jackson-core-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jackson-mapper-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jasper-compiler-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jasper-runtime-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jets3t-0.6.1.jar:/home/ubuntu/hadoop/bin/../lib/jetty-6.1.26.jar:/home/ubuntu/hadoop/bin/../lib/jetty-util-6.1.26.jar:/home/ubu
ntu/hadoop/bin/../lib/jsch-0.1.42.jar:/home/ubuntu/hadoop/bin/../lib/junit-4.5.jar:/home/ubuntu/hadoop/bin/../lib/kfs-0.2.2.jar:/home/ubuntu/hadoop/bin/../lib/log4j-1.2.15.jar:/home/ubuntu/hadoop/bin/../lib/mockito-all-1.8.5.jar:/home/ubuntu/hadoop/bin/../lib/oro-2.0.8.jar:/home/ubuntu/hadoop/bin/../lib/servlet-api-2.5-20081211.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-api-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/xmlenc-0.52.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar
2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.library.path=/home/ubuntu/hadoop/bin/../lib/native/Linux-amd64-64:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work
2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.io.tmpdir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work/tmp
2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.compiler=<NA>
2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client
environment:os.name=Linux
2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client
environment:os.arch=amd64
2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client
environment:os.version=3.2.0-58-virtual
2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client
environment:user.name=ubuntu
2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client
environment:user.home=/home/ubuntu
2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client
environment:user.dir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work
2014-07-02 17:01:42,339 INFO org.apache.zookeeper.ZooKeeper:
Initiating client connection,
connectString=ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181
sessionTimeout=60000
watcher=org.apache.giraph.worker.BspServiceWorker@54a5733f
2014-07-02 17:01:42,347 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server
ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181
2014-07-02 17:01:42,347 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to
ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181,
initiating session
2014-07-02 17:01:42,812 INFO org.apache.zookeeper.ClientCnxn: Session
establishment complete on server
ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181,
sessionid = 0x146f806378d0001, negotiated timeout = 600000
2014-07-02 17:01:42,814 INFO org.apache.giraph.bsp.BspService:
process: Asynchronous connection complete.
2014-07-02 17:01:42,819 INFO
org.apache.giraph.comm.netty.NettyWorkerServer:
createMessageStoreFactory: Using ByteArrayMessagesPerVertexStore since
there is no combiner
2014-07-02 17:01:42,881 INFO org.apache.giraph.comm.netty.NettyServer:
NettyServer: Using execution handler with 8 threads after
requestFrameDecoder.
2014-07-02 17:01:42,904 INFO org.apache.giraph.comm.netty.NettyServer:
start: Started server communication server:
ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:30001
with up to 16 threads on bind attempt 0 with sendBufferSize = 32768
receiveBufferSize = 524288 backlog = 1
2014-07-02 17:01:42,907 INFO org.apache.giraph.comm.netty.NettyClient:
NettyClient: Using execution handler with 8 threads after
requestEncoder.
2014-07-02 17:01:42,914 INFO org.apache.giraph.graph.GraphTaskManager:
setup: Registering health of this worker...
2014-07-02 17:01:45,049 INFO org.apache.giraph.bsp.BspService:
process: applicationAttemptChanged signaled
2014-07-02 17:01:45,068 WARN org.apache.giraph.bsp.BspService:
process: Unknown and unprocessed event
(path=/_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir,
type=NodeChildrenChanged, state=SyncConnected)
2014-07-02 17:01:45,076 INFO
org.apache.giraph.worker.BspServiceWorker: registerHealth: Created my
health node for attempt=0, superstep=-1 with
/_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1
and workerInfo=
Worker(hostname=ec2-54-186-5-159.us-west-2.compute.amazonaws.com,
MRtaskID=1, port=30001)
2014-07-02 17:01:45,183 INFO org.apache.giraph.comm.netty.NettyServer:
start: Using Netty without authentication.
2014-07-02 17:01:45,339 INFO org.apache.giraph.bsp.BspService:
process: partitionAssignmentsReadyChanged (partitions are assigned)
2014-07-02 17:01:45,345 INFO
org.apache.giraph.worker.BspServiceWorker: startSuperstep:
Master(hostname=ec2-54-186-5-159.us-west-2.compute.amazonaws.com,
MRtaskID=0, port=30000)
2014-07-02 17:01:45,345 INFO
org.apache.giraph.worker.BspServiceWorker: startSuperstep: Ready for
computation on superstep -1 since worker selection and vertex range
assignments are done in
/_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir/-1/_addressesAndPartitions
2014-07-02 17:01:45,346 INFO org.apache.giraph.comm.netty.NettyClient:
Using Netty without authentication.
2014-07-02 17:01:45,354 INFO org.apache.giraph.comm.netty.NettyClient:
connectAllAddresses: Successfully added 1 connections, (1 total
connected) 0 failed, 0 failures total.
2014-07-02 17:01:45,359 INFO
org.apache.giraph.worker.BspServiceWorker: loadInputSplits: Using 1
thread(s), originally 1 threads(s) for 2 total splits.
2014-07-02 17:01:45,362 INFO
org.apache.giraph.comm.SendPartitionCache: SendPartitionCache:
maxVerticesPerTransfer = 10000
2014-07-02 17:01:45,362 INFO
org.apache.giraph.comm.SendPartitionCache: SendPartitionCache:
maxEdgesPerTransfer = 80000
2014-07-02 17:01:45,372 INFO
org.apache.giraph.worker.InputSplitsHandler: reserveInputSplit:
Reserved input split path
/_hadoopBsp/job_201407021636_0006/_vertexInputSplitDir/0, overall
roughly 0.0% input splits reserved
2014-07-02 17:01:45,373 INFO
org.apache.giraph.worker.InputSplitsCallable: getInputSplit: Reserved
/_hadoopBsp/job_201407021636_0006/_vertexInputSplitDir/0 from
ZooKeeper and got input split
'hdfs://ec2-54-186-5-159.us-west-2.compute.amazonaws.com:54310/user/ubuntu/input/CA.txt:0+67108864'
2014-07-02 17:01:46,615 INFO
org.apache.giraph.worker.VertexInputSplitsCallable:
readVertexInputSplit: Loaded 250000 vertices at 200412.01085983627
vertices/sec 697447 edges at 560085.1663971642 edges/sec Memory
(free/total/max) = 112.94M / 182.50M / 182.50M
2014-07-02 17:01:47,440 INFO
org.apache.giraph.worker.VertexInputSplitsCallable:
readVertexInputSplit: Loaded 500000 vertices at 241131.36490688394
vertices/sec 1419367 edges at 685221.3493060106 edges/sec Memory
(free/total/max) = 45.07M / 187.50M / 187.50M
2014-07-02 17:01:51,322 INFO
org.apache.giraph.worker.VertexInputSplitsCallable:
readVertexInputSplit: Loaded 750000 vertices at 125921.72750649283
vertices/sec 2149814 edges at 361077.323111284 edges/sec Memory
(free/total/max) = 16.73M / 189.50M / 189.50M
2014-07-02 17:01:55,205 ERROR
org.apache.giraph.worker.BspServiceWorker: unregisterHealth: Got
failure, unregistering health on
/_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1
on superstep -1
2014-07-02 17:01:55,213 INFO
org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs'
truncater with mapRetainSize=-1 and reduceRetainSize=-1
2014-07-02 17:01:55,716 INFO org.apache.hadoop.io.nativeio.NativeIO:
Initialized cache for UID to User mapping with a cache timeout of
14400 seconds.
2014-07-02 17:01:55,717 INFO org.apache.hadoop.io.nativeio.NativeIO:
Got UserName ubuntu for UID 1000 from the native implementation
2014-07-02 17:01:55,718 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.IllegalStateException: run: Caught an unrecoverable
exception waitFor: ExecutionException occurred while waiting for
org.apache.giraph.utils.ProgressableUtils$FutureWaitable@36df8f5
	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:102)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
	at org.apache.hadoop.mapred.Child.main(Child.java:253)
Caused by: java.lang.IllegalStateException: waitFor:
ExecutionException occurred while waiting for
org.apache.giraph.utils.ProgressableUtils$FutureWaitable@36df8f5
	at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:151)
	at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:111)
	at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:73)
	at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:192)
	at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:276)
	at org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:323)
	at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:506)
	at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:230)
	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92)
	... 7 more
Caused by: java.util.concurrent.ExecutionException:
java.lang.OutOfMemoryError: Java heap space
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:202)
	at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:271)
	at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:143)
	... 15 more
Caused by: java.lang.OutOfMemoryError: Java heap space
	at java.util.concurrent.ConcurrentHashMap$Segment.rehash(ConcurrentHashMap.java:501)
	at java.util.concurrent.ConcurrentHashMap$Segment.put(ConcurrentHashMap.java:460)
	at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1130)
	at org.apache.giraph.partition.SimplePartition.addPartition(SimplePartition.java:87)
	at org.apache.giraph.partition.SimplePartitionStore.addPartition(SimplePartitionStore.java:71)
	at org.apache.giraph.comm.requests.SendVertexRequest.doRequest(SendVertexRequest.java:81)
	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:470)
	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendPartitionRequest(NettyWorkerClientRequestProcessor.java:203)
	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendVertexRequest(NettyWorkerClientRequestProcessor.java:267)
	at org.apache.giraph.worker.VertexInputSplitsCallable.readInputSplit(VertexInputSplitsCallable.java:140)
	at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:220)
	at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:161)
	at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:58)
	at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
2014-07-02 17:01:55,722 INFO org.apache.hadoop.mapred.Task: Runnning
cleanup for the task

===========================================================================




I've tried increasing the Java heap size in hadoop/conf/mapred-site.xml by
adding this:
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>
  </property>

But that just caused the entire job to fail from the start.

Before using this version of Giraph, I tried 1.0.0 and 1.1.0-RC0, but those
versions gave me different errors to debug that related to problems with
Hadoop itself.  So the Giraph version I'm currently using seems to be the
best fit for me because these errors seem more manageable.

What can I do to fix this error?  Since Giraph is built for large-scale
graph processing, I assume someone testing large graphs has run into this
problem before.  I searched through the mailing list archives and couldn't
find anything though.

I can provide more information if you need it.  Thanks a lot.

Bryan Rowe



*Then Young Han replied:*

10 minutes seems way too long to load 91 MB from HDFS. Are you sure your
graph's format is correct? For the JSON input formats, each line of your
file should be:

[vertex id, vertex value, [[dst id, edge weight], [dst id, edge weight],
..., [dst id, edge weight]]]

You can set the vertex value for every vertex to 0 (SSSP will overwrite
that value in the first superstep). If your graph doesn't have edge
weights, you can just set them to 1.
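To double-check a conversion into that layout, here is a minimal sketch of how an edge list in the whitespace-separated `src dst weight` format (shown later in this thread) could be turned into these JSON lines. The function name is mine, not from the thread, and note one limitation: vertices with no outgoing edges would be dropped by this sketch.

```python
import json
from collections import defaultdict

def edges_to_json_lines(edge_lines):
    # Group whitespace-separated "src dst weight" lines by source vertex,
    # then emit one [id, value, [[dst, weight], ...]] JSON line per vertex.
    adj = defaultdict(list)
    for line in edge_lines:
        if not line.strip():
            continue
        src, dst, weight = line.split()
        adj[int(src)].append([int(dst), int(weight)])
    for vid in sorted(adj):
        # Vertex value starts at 0; SSSP overwrites it in the first superstep.
        yield json.dumps([vid, 0, adj[vid]], separators=(",", ":"))

# Example: the first four edges from a pre-conversion edge list
for out in edges_to_json_lines(["0 1 1", "0 4 1", "0 5 1", "1 0 1"]):
    print(out)  # [0,0,[[1,1],[4,1],[5,1]]] then [1,0,[[0,1]]]
```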

Also, have you tried a larger Xmx value? E.g., 4096m or 8192m.

Young


*Then I replied:*
Hi Young,

I believe my graph has the correct format.
Here are the first lines of the graph I'm using:
[0,0,[[1,1],[4,1],[5,1]]]
[1,0,[[0,1],[2,1],[3,1]]]
[2,0,[[1,1],[6,1],[7,1]]]
[3,0,[[8,1],[1,1],[9,1]]]
[4,0,[[0,1],[10,1],[11,1]]]
[5,0,[[0,1]]]
[6,0,[[2,1],[12,1]]]
[7,0,[[2,1],[12,1],[13,1]]]
[8,0,[[11,1],[3,1],[60,1]]]
[9,0,[[3,1]]]
[10,0,[[35,1],[4,1],[38,1]]]
[11,0,[[8,1],[59,1],[4,1]]]
[12,0,[[41,1],[6,1],[7,1]]]
[13,0,[[89,1],[90,1],[91,1],[7,1]]]
[14,0,[[18,1],[19,1],[15,1]]]
[15,0,[[16,1],[17,1],[14,1]]]
[16,0,[[20,1],[21,1],[22,1],[15,1]]]
[17,0,[[24,1],[23,1],[21,1],[15,1]]]
[18,0,[[24,1],[14,1]]]
[19,0,[[25,1],[22,1],[14,1]]]
[20,0,[[16,1],[25,1],[26,1]]]
[21,0,[[16,1],[17,1],[30,1]]]
[22,0,[[16,1],[19,1]]]
[23,0,[[17,1],[105,1]]]
[24,0,[[17,1],[18,1],[58,1]]]
[25,0,[[27,1],[19,1],[20,1]]]
[26,0,[[28,1],[27,1],[20,1]]]
[27,0,[[25,1],[26,1],[29,1]]]
[28,0,[[26,1],[29,1],[30,1]]]
[29,0,[[27,1],[28,1],[31,1]]]
[30,0,[[32,1],[33,1],[28,1],[21,1]]]
[31,0,[[34,1],[29,1]]]
[32,0,[[105,1],[30,1],[39,1]]]
[33,0,[[30,1]]]
[34,0,[[38,1],[31,1]]]
[35,0,[[10,1],[36,1],[37,1]]]
[36,0,[[40,1],[35,1],[39,1]]]
[37,0,[[41,1],[35,1]]]
[38,0,[[10,1],[34,1]]]
[39,0,[[32,1],[58,1],[36,1],[119,1]]]
[40,0,[[90,1],[36,1]]]

Also, sorry about sending the email twice.  My email client messed up.

Thanks,
Bryan

I've also tried a larger graph.  It ran for 2 hours and then failed, I
believe with different error messages.

Bryan

Re: Errors with large graphs (continued, new email)

Posted by Young Han <yo...@uwaterloo.ca>.
Great.

I think it takes so long because you have a graph with a very large
diameter (hence the very high # of supersteps)---it's one of the cases
where BSP doesn't do so well. If you try graphs with smaller diameters
(e.g., social network/webgraphs from SNAP), you'll find that they finish
much faster.

It's reasonable. Input loading didn't take more than 20 seconds, and each
superstep is at most ~1.5s. I don't believe adding more machines will help
in this case, as it took a long time due to the sheer number of supersteps
rather than the cost of a single superstep. (I.e., adding more machines
will not reduce the # of supersteps---you'll probably find most of your
machines idling.)
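The effect of diameter on superstep count can be sketched with a toy frontier simulation (my own illustration, not Giraph code): each BSP superstep advances shortest-path distances by one hop, so a long path graph needs roughly diameter-many supersteps while a star finishes in two.

```python
def sssp_supersteps(adj, source):
    # Simulate BSP-style SSSP with unit weights: each loop iteration is one
    # superstep; a vertex only notifies its neighbors the step after it is
    # first reached, so supersteps track the eccentricity of the source.
    dist = {source: 0}
    frontier = {source}
    supersteps = 0
    while frontier:
        nxt = set()
        for v in frontier:
            for w in adj.get(v, []):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    nxt.add(w)
        frontier = nxt
        supersteps += 1
    return supersteps

path = {i: [i + 1] for i in range(999)}   # 1000 vertices, diameter 999
star = {0: list(range(1, 1000))}          # 1000 vertices, diameter 2
print(sssp_supersteps(path, 0))  # 1000
print(sssp_supersteps(star, 0))  # 2
```

Adding workers splits each superstep's work across machines but cannot skip supersteps, which is why a high-diameter graph stays slow on any cluster size.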

Young


On Thu, Jul 3, 2014 at 6:46 PM, Bryan Rowe <br...@gmail.com> wrote:

> Awesome, the job completed after adding those settings:
> http://ec2-54-186-5-159.us-west-2.compute.amazonaws.com:50030/jobdetails.jsp?jobid=job_201407032211_0002&refresh=0
>
> Thanks a lot for the help.  It did take 13mins, 47sec to complete though,
> which seems kind of long.  However, it is running on just one node.  It will
> probably be a lot faster on a multi-node setup, which I plan on doing.
>
> One more thing, do you think that the time it took to complete is
> reasonable for a one-node setup?
>
> Thanks,
> Bryan
>
>
> On Thu, Jul 3, 2014 at 2:59 PM, Young Han <yo...@uwaterloo.ca> wrote:
>
>> Ahh yes, that's another common error. Try adding this to your mapred-site:
>>
>> <property>
>>   <name>mapreduce.job.counters.max</name>
>>   <value>1000000</value>
>> </property>
>> <property>
>>   <name>mapreduce.job.counters.limit</name>
>>   <value>1000000</value>
>> </property>
>>
>> Young
>>
>>
>> On Thu, Jul 3, 2014 at 5:49 PM, Bryan Rowe <br...@gmail.com> wrote:
>>
>>> I just changed my mapred-site.xml file to increase the heap size:
>>> <configuration>
>>> <property>
>>> <name>mapred.job.tracker</name>
>>> <value>hdfs://ec2-54-186-5-159.us-west-2.compute.amazonaws.com:54311
>>> </value>
>>> </property>
>>>
>>> <property>
>>> <name>mapred.tasktracker.map.tasks.maximum</name>
>>> <value>4</value>
>>> </property>
>>>
>>> <property>
>>> <name>mapred.map.tasks</name>
>>> <value>4</value>
>>> </property>
>>>
>>> <property>
>>>   <name>mapred.child.java.opts</name>
>>>   <value>-Xmx1024m</value>
>>> </property>
>>> </configuration>
>>>
>>>
>>> I ran a Hadoop job and it seemed to be working for the first 2 minutes
>>> because it was computing supersteps.  However, after 2 minutes it failed.
>>> Part of the reason is that the Hadoop counters exceeded 120.  There were
>>> other errors though, and I'm not sure if those are related to the counters
>>> problem.
>>>
>>> Here is an image album of the errors:  http://imgur.com/a/mAqn3
>>> Also, here is a link to the Hadoop job that failed:
>>> http://ec2-54-186-5-159.us-west-2.compute.amazonaws.com:50030/jobdetails.jsp?jobid=job_201407032123_0002&refresh=30
>>>
>>> I'll keep my instance up and running so the link should work.
>>> What would you recommend next?
>>>
>>> Thanks,
>>> Bryan
>>>
>>>
>>> On Thu, Jul 3, 2014 at 2:19 PM, Young Han <yo...@uwaterloo.ca>
>>> wrote:
>>>
>>>> What sort of error does it give? Did you try restarting Hadoop after
>>>> setting it? (stop-all.sh, start-all.sh, hadoop dfs -safemode wait).
>>>>
>>>> Young
>>>>
>>>>
>>>> On Thu, Jul 3, 2014 at 5:08 PM, Bryan Rowe <br...@gmail.com>
>>>> wrote:
>>>>
>>>>> Yes, before conversion they were also sorted by vertex ID.
>>>>> 0       1       1
>>>>> 0       4       1
>>>>> 0       5       1
>>>>> 1       0       1
>>>>> 1       2       1
>>>>> 1       3       1
>>>>> 2       1       1
>>>>> 2       6       1
>>>>> 2       7       1
>>>>> 3       8       1
>>>>>
>>>>> That was the format before conversion: [source destination weight]
>>>>> for each edge.
>>>>> I'm using a medium-tier EC2 instance, c3.2xlarge.  It has 8 vCPUs and
>>>>> 15 GiB of memory according to Amazon's website.
>>>>>
>>>>> Any suggestions on tweaking Xmx?  I tried doing that without any luck;
>>>>> the job just failed before running the map phase.
>>>>>
>>>>>
>>>>> On Thu, Jul 3, 2014 at 1:57 PM, Young Han <yo...@uwaterloo.ca>
>>>>> wrote:
>>>>>
>>>>>> From the other thread...
>>>>>>
>>>>>> Yeah, your input format looks correct. Did you have the graph sorted
>>>>>> by source vertex IDs before conversion? (I'm not sure if duplicate entries
>>>>>> with the same source ID matters, but just in case.)
>>>>>>
>>>>>> They're all out of memory errors, so I think Xmx is the culprit. What
>>>>>> type of EC2 instances are you using? You probably want to use something
>>>>>> larger than t1.micro or m1.small.
>>>>>>
>>>>>> Young
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 3, 2014 at 4:53 PM, Bryan Rowe <br...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> First of all, I started this email thread with my old email from
>>>>>>> Yahoo, which was a mistake because it kept sending out duplicates.  Sorry
>>>>>>> for the inconvenience, but I'll continue it using this email thread from
>>>>>>> now on.
>>>>>>>
>>>>>>> *I originally posted this:*
>>>>>>> Hello,
>>>>>>>
>>>>>>> Giraph: release-1.0.0-RC3
>>>>>>>
>>>>>>> In short, when I use large graphs with the Shortest Paths example,
>>>>>>> it fails.  But when I use the small graph provided on the Quick Start
>>>>>>> guide, it succeeds.
>>>>>>> I converted all of my large graphs into the format shown in the
>>>>>>> Quick Start guide to simplify things.
>>>>>>> I'm using a one-node setup.
>>>>>>>
>>>>>>>
>>>>>>> Here is the command I'm using:
>>>>>>> hadoop jar
>>>>>>> $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar
>>>>>>>
>>>>>>> org.apache.giraph.GiraphRunner
>>>>>>> org.apache.giraph.examples.SimpleShortestPathsVertex
>>>>>>> -vif
>>>>>>> org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
>>>>>>>  -vip /user/ubuntu/input/CA.txt -of
>>>>>>> org.apache.giraph.io.formats.IdWithValueTextOutputFormat
>>>>>>>  -op /user/ubuntu/output/shortestpaths
>>>>>>> -w 1
>>>>>>>
>>>>>>> (all on one line)
>>>>>>>
>>>>>>> CA.txt is a large graph file: 96,026,228 bytes
>>>>>>>
>>>>>>> The job fails in 10mins, 46sec.
>>>>>>>
>>>>>>> Two Map tasks are created when run.
>>>>>>> The first one, task_201407021636_0006_m_000000, is KILLED.  syslog:
>>>>>>>
>>>>>>> ===========================================================================
>>>>>>>
>>>>>>> 2014-07-02 17:01:34,757 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
>>>>>>> 2014-07-02 17:01:34,945 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
>>>>>>> 2014-07-02 17:01:35,127 INFO org.apache.giraph.graph.GraphTaskManager: setup: Log level remains at info
>>>>>>> 2014-07-02 17:01:35,159 INFO org.apache.giraph.graph.GraphTaskManager: Distributed cache is empty. Assuming fatjar.
>>>>>>> 2014-07-02 17:01:35,159 INFO org.apache.giraph.graph.GraphTaskManager: setup: classpath @ /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/job.jar for job Giraph: org.apache.giraph.examples.SimpleShortestPathsVertex
>>>>>>> 2014-07-02 17:01:35,201 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp: Made the directory _bsp/_defaultZkManagerDir/job_201407021636_0006
>>>>>>> 2014-07-02 17:01:35,204 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp: Creating my filestamp _bsp/_defaultZkManagerDir/job_201407021636_0006/_task/ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0
>>>>>>> 2014-07-02 17:01:35,219 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: Got [ec2-54-186-5-159.us-west-2.compute.amazonaws.com] 1 hosts from 1 candidates when 1 required (polling period is 3000) on attempt 0
>>>>>>> 2014-07-02 17:01:35,220 INFO org.apache.giraph.zk.ZooKeeperManager: createZooKeeperServerList: Creating the final ZooKeeper file '_bsp/_defaultZkManagerDir/job_201407021636_0006/zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 '
>>>>>>> 2014-07-02 17:01:35,228 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: For task 0, got file 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 ' (polling period is 3000)
>>>>>>> 2014-07-02 17:01:35,228 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: Found [ec2-54-186-5-159.us-west-2.compute.amazonaws.com, 0] 2 hosts in filename 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 '
>>>>>>> 2014-07-02 17:01:35,229 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Trying to delete old directory /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper
>>>>>>> 2014-07-02 17:01:35,234 INFO org.apache.giraph.zk.ZooKeeperManager: generateZooKeeperConfigFile: Creating file /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper/zoo.cfg in /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper with base port 22181
>>>>>>> 2014-07-02 17:01:35,234 INFO org.apache.giraph.zk.ZooKeeperManager: generateZooKeeperConfigFile: Make directory of _bspZooKeeper = true
>>>>>>> 2014-07-02 17:01:35,235 INFO org.apache.giraph.zk.ZooKeeperManager: generateZooKeeperConfigFile: Delete of zoo.cfg = false
>>>>>>> 2014-07-02 17:01:35,236 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Attempting to start ZooKeeper server with command [/usr/lib/jvm/java-7-oracle/jre/bin/java, -Xmx512m, -XX:ParallelGCThreads=4, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=70, -XX:MaxGCPauseMillis=100, -cp, /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/job.jar, org.apache.zookeeper.server.quorum.QuorumPeerMain, /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper/zoo.cfg] in directory /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper
>>>>>>> 2014-07-02 17:01:35,238 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Shutdown hook added.
>>>>>>> 2014-07-02 17:01:35,238 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Connect attempt 0 of 10 max trying to connect to ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 with poll msecs = 3000
>>>>>>> 2014-07-02 17:01:35,241 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Got ConnectException
>>>>>>> java.net.ConnectException: Connection refused
>>>>>>> 	at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>>>>> 	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>>>>>>> 	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
>>>>>>> 	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
>>>>>>> 	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>>>>>> 	at java.net.Socket.connect(Socket.java:579)
>>>>>>> 	at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:701)
>>>>>>> 	at org.apache.giraph.graph.GraphTaskManager.startZooKeeperManager(GraphTaskManager.java:357)
>>>>>>> 	at org.apache.giraph.graph.GraphTaskManager.setup(GraphTaskManager.java:188)
>>>>>>> 	at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:60)
>>>>>>> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:90)
>>>>>>> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>>>>>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>>>>>>> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>>>>>> 	at java.security.AccessController.doPrivileged(Native Method)
>>>>>>> 	at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>>>>>> 	at org.apache.hadoop.mapred.Child.main(Child.java:253)
>>>>>>> 2014-07-02 17:01:38,242 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Connect attempt 1 of 10 max trying to connect to ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 with poll msecs = 3000
>>>>>>> 2014-07-02 17:01:38,243 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Connected to ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181!
>>>>>>> 2014-07-02 17:01:38,243 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Creating my filestamp _bsp/_defaultZkManagerDir/job_201407021636_0006/_zkServer/ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0
>>>>>>> 2014-07-02 17:01:40,249 INFO org.apache.giraph.graph.GraphTaskManager: setup: Chosen to run ZooKeeper...
>>>>>>> 2014-07-02 17:01:40,249 INFO org.apache.giraph.graph.GraphTaskManager: setup: Starting up BspServiceMaster (master thread)...
>>>>>>> 2014-07-02 17:01:40,260 INFO org.apache.giraph.bsp.BspService: BspService: Connecting to ZooKeeper with job job_201407021636_0006, 0 on ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181
>>>>>>> 2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27 GMT
>>>>>>> 2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:host.name=ec2-54-186-5-159.us-west-2.compute.amazonaws.com
>>>>>>> 2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.7.0_60
>>>>>>> 2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
>>>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
>>>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client
>>>>>>>  environment:java.class.path=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/classes:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work:/home/ubuntu/hadoop/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/ubuntu/hadoop/bin/..:/home/ubuntu/hadoop/bin/../hadoop-core-0.20.203.0.jar:/home/ubuntu/hadoop/bin/../lib/aspectjrt-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/aspectjtools-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-1.7.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-core-1.8.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-cli-1.2.jar:/home/ubuntu/hadoop/bin/../lib/commons-codec-1.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-collections-3.2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-configuration-1.6.jar:/home/ubuntu/hadoo
>>>>>>> p/bin/../lib/commons-daemon-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-digester-1.8.jar:/home/ubuntu/hadoop/bin/../lib/commons-el-1.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-httpclient-3.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-lang-2.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-1.1.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-api-1.0.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-math-2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-net-1.4.1.jar:/home/ubuntu/hadoop/bin/../lib/core-3.1.1.jar:/home/ubuntu/hadoop/bin/../lib/hsqldb-1.8.0.10.jar:/home/ubuntu/hadoop/bin/../lib/jackson-core-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jackson-mapper-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jasper-compiler-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jasper-runtime-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jets3t-0.6.1.jar:/home/ubuntu/hadoop/bin/../lib/jetty-6.1.26.jar:/home/ubuntu/hadoop/bin/../lib/jetty-util-6.1.26.jar:/home/ubu
>>>>>>> ntu/hadoop/bin/../lib/jsch-0.1.42.jar:/home/ubuntu/hadoop/bin/../lib/junit-4.5.jar:/home/ubuntu/hadoop/bin/../lib/kfs-0.2.2.jar:/home/ubuntu/hadoop/bin/../lib/log4j-1.2.15.jar:/home/ubuntu/hadoop/bin/../lib/mockito-all-1.8.5.jar:/home/ubuntu/hadoop/bin/../lib/oro-2.0.8.jar:/home/ubuntu/hadoop/bin/../lib/servlet-api-2.5-20081211.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-api-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/xmlenc-0.52.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar
>>>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/home/ubuntu/hadoop/bin/../lib/native/Linux-amd64-64:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work
>>>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work/tmp
>>>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
>>>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
>>>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.arch=amd64
>>>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.version=3.2.0-58-virtual
>>>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.name=ubuntu
>>>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.home=/home/ubuntu
>>>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work
>>>>>>> 2014-07-02 17:01:40,272 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 sessionTimeout=60000 watcher=org.apache.giraph.master.BspServiceMaster@4deb9df0
>>>>>>> 2014-07-02 17:01:40,284 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181
>>>>>>> 2014-07-02 17:01:40,296 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, initiating session
>>>>>>> 2014-07-02 17:01:42,336 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, sessionid = 0x146f806378d0000, negotiated timeout = 600000
>>>>>>> 2014-07-02 17:01:42,337 INFO org.apache.giraph.bsp.BspService: process: Asynchronous connection complete.
>>>>>>> 2014-07-02 17:01:42,343 INFO org.apache.giraph.graph.GraphTaskManager: map: No need to do anything when not a worker
>>>>>>> 2014-07-02 17:01:42,343 INFO org.apache.giraph.graph.GraphTaskManager: cleanup: Starting for MASTER_ZOOKEEPER_ONLY
>>>>>>> 2014-07-02 17:01:45,033 INFO org.apache.giraph.bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/job_201407021636_0006/_masterJobState)
>>>>>>> 2014-07-02 17:01:45,038 INFO org.apache.giraph.master.BspServiceMaster: becomeMaster: First child is '/_hadoopBsp/job_201407021636_0006/_masterElectionDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_00000000000' and my bid is '/_hadoopBsp/job_201407021636_0006/_masterElectionDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_00000000000'
>>>>>>> 2014-07-02 17:01:45,044 INFO org.apache.giraph.bsp.BspService: getApplicationAttempt: Node /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir already exists!
>>>>>>> 2014-07-02 17:01:45,049 INFO org.apache.giraph.bsp.BspService: process: applicationAttemptChanged signaled
>>>>>>> 2014-07-02 17:01:45,089 INFO org.apache.giraph.comm.netty.NettyServer: NettyServer: Using execution handler with 8 threads after requestFrameDecoder.
>>>>>>> 2014-07-02 17:01:45,111 INFO org.apache.giraph.comm.netty.NettyServer: start: Started server communication server: ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:30000 with up to 16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288 backlog = 1
>>>>>>> 2014-07-02 17:01:45,116 INFO org.apache.giraph.comm.netty.NettyClient: NettyClient: Using execution handler with 8 threads after requestEncoder.
>>>>>>> 2014-07-02 17:01:45,119 INFO org.apache.giraph.master.BspServiceMaster: becomeMaster: I am now the master!
>>>>>>> 2014-07-02 17:01:45,126 INFO org.apache.giraph.bsp.BspService: getApplicationAttempt: Node /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir already exists!
>>>>>>> 2014-07-02 17:01:45,140 INFO org.apache.giraph.io.formats.GiraphFileInputFormat: Total input paths to process : 1
>>>>>>> 2014-07-02 17:01:45,149 INFO org.apache.giraph.master.BspServiceMaster: generateVertexInputSplits: Got 2 input splits for 1 input threads
>>>>>>> 2014-07-02 17:01:45,149 INFO org.apache.giraph.master.BspServiceMaster: createVertexInputSplits: Starting to write input split data to zookeeper with 1 threads
>>>>>>> 2014-07-02 17:01:45,163 INFO org.apache.giraph.master.BspServiceMaster: createVertexInputSplits: Done writing input split data to zookeeper
>>>>>>> 2014-07-02 17:01:45,173 INFO org.apache.giraph.comm.netty.NettyClient: Using Netty without authentication.
>>>>>>> 2014-07-02 17:01:45,187 INFO org.apache.giraph.comm.netty.NettyClient: connectAllAddresses: Successfully added 1 connections, (1 total connected) 0 failed, 0 failures total.
>>>>>>> 2014-07-02 17:01:45,188 INFO org.apache.giraph.partition.PartitionUtils: computePartitionCount: Creating 1, default would have been 1 partitions.
>>>>>>> 2014-07-02 17:01:45,350 INFO org.apache.giraph.comm.netty.NettyServer: start: Using Netty without authentication.
>>>>>>> 2014-07-02 17:06:45,347 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out of 1 workers finished on superstep -1 on path /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDoneDir
>>>>>>> 2014-07-02 17:06:45,349 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Waiting on [ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1]
>>>>>>> 2014-07-02 17:11:45,355 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out of 1 workers finished on superstep -1 on path /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDoneDir
>>>>>>> 2014-07-02 17:11:45,355 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Waiting on [ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1]
>>>>>>> 2014-07-02 17:12:07,201 INFO org.apache.giraph.zk.ZooKeeperManager: run: Shutdown hook started.
>>>>>>> 2014-07-02 17:12:07,202 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Forced a shutdown hook kill of the ZooKeeper process.
>>>>>>> 2014-07-02 17:12:07,518 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x146f806378d0000, likely server has closed socket, closing socket connection and attempting reconnect
>>>>>>> 2014-07-02 17:12:07,518 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: ZooKeeper process exited with 143 (note that 143 typically means killed).
>>>>>>> ===========================================================================
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> The second one, task_201407021636_0006_m_000001, goes to the FAILED
>>>>>>> state.  syslog:
>>>>>>>
>>>>>>> ===========================================================================
>>>>>>>
>>>>>>> 2014-07-02 17:01:38,016 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
>>>>>>> 2014-07-02 17:01:38,203 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
>>>>>>> 2014-07-02 17:01:38,379 INFO org.apache.giraph.graph.GraphTaskManager: setup: Log level remains at info
>>>>>>> 2014-07-02 17:01:40,280 INFO org.apache.giraph.graph.GraphTaskManager: Distributed cache is empty. Assuming fatjar.
>>>>>>> 2014-07-02 17:01:40,281 INFO org.apache.giraph.graph.GraphTaskManager: setup: classpath @ /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/job.jar for job Giraph: org.apache.giraph.examples.SimpleShortestPathsVertex
>>>>>>> 2014-07-02 17:01:40,317 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp: Made the directory _bsp/_defaultZkManagerDir/job_201407021636_0006
>>>>>>> 2014-07-02 17:01:40,320 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp: Creating my filestamp _bsp/_defaultZkManagerDir/job_201407021636_0006/_task/ec2-54-186-5-159.us-west-2.compute.amazonaws.com 1
>>>>>>> 2014-07-02 17:01:42,109 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: For task 1, got file 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 ' (polling period is 3000)
>>>>>>> 2014-07-02 17:01:42,313 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: Found [ec2-54-186-5-159.us-west-2.compute.amazonaws.com, 0] 2 hosts in filename 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 '
>>>>>>> 2014-07-02 17:01:42,316 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Got [ec2-54-186-5-159.us-west-2.compute.amazonaws.com] 1 hosts from 1 ready servers when 1 required (polling period is 3000) on attempt 0
>>>>>>> 2014-07-02 17:01:42,316 INFO org.apache.giraph.graph.GraphTaskManager: setup: Starting up BspServiceWorker...
>>>>>>> 2014-07-02 17:01:42,330 INFO org.apache.giraph.bsp.BspService: BspService: Connecting to ZooKeeper with job job_201407021636_0006, 1 on ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181
>>>>>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27 GMT
>>>>>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:host.name=ec2-54-186-5-159.us-west-2.compute.amazonaws.com
>>>>>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.7.0_60
>>>>>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
>>>>>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
>>>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client
>>>>>>>  environment:java.class.path=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/classes:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work:/home/ubuntu/hadoop/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/ubuntu/hadoop/bin/..:/home/ubuntu/hadoop/bin/../hadoop-core-0.20.203.0.jar:/home/ubuntu/hadoop/bin/../lib/aspectjrt-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/aspectjtools-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-1.7.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-core-1.8.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-cli-1.2.jar:/home/ubuntu/hadoop/bin/../lib/commons-codec-1.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-collections-3.2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-configuration-1.6.jar:/home/ubuntu/hadoo
>>>>>>> p/bin/../lib/commons-daemon-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-digester-1.8.jar:/home/ubuntu/hadoop/bin/../lib/commons-el-1.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-httpclient-3.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-lang-2.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-1.1.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-api-1.0.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-math-2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-net-1.4.1.jar:/home/ubuntu/hadoop/bin/../lib/core-3.1.1.jar:/home/ubuntu/hadoop/bin/../lib/hsqldb-1.8.0.10.jar:/home/ubuntu/hadoop/bin/../lib/jackson-core-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jackson-mapper-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jasper-compiler-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jasper-runtime-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jets3t-0.6.1.jar:/home/ubuntu/hadoop/bin/../lib/jetty-6.1.26.jar:/home/ubuntu/hadoop/bin/../lib/jetty-util-6.1.26.jar:/home/ubu
>>>>>>> ntu/hadoop/bin/../lib/jsch-0.1.42.jar:/home/ubuntu/hadoop/bin/../lib/junit-4.5.jar:/home/ubuntu/hadoop/bin/../lib/kfs-0.2.2.jar:/home/ubuntu/hadoop/bin/../lib/log4j-1.2.15.jar:/home/ubuntu/hadoop/bin/../lib/mockito-all-1.8.5.jar:/home/ubuntu/hadoop/bin/../lib/oro-2.0.8.jar:/home/ubuntu/hadoop/bin/../lib/servlet-api-2.5-20081211.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-api-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/xmlenc-0.52.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar
>>>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/home/ubuntu/hadoop/bin/../lib/native/Linux-amd64-64:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work
>>>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work/tmp
>>>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
>>>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
>>>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.arch=amd64
>>>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.version=3.2.0-58-virtual
>>>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.name=ubuntu
>>>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.home=/home/ubuntu
>>>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work
>>>>>>> 2014-07-02 17:01:42,339 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 sessionTimeout=60000 watcher=org.apache.giraph.worker.BspServiceWorker@54a5733f
>>>>>>> 2014-07-02 17:01:42,347 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181
>>>>>>> 2014-07-02 17:01:42,347 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, initiating session
>>>>>>> 2014-07-02 17:01:42,812 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, sessionid = 0x146f806378d0001, negotiated timeout = 600000
>>>>>>> 2014-07-02 17:01:42,814 INFO org.apache.giraph.bsp.BspService: process: Asynchronous connection complete.
>>>>>>> 2014-07-02 17:01:42,819 INFO org.apache.giraph.comm.netty.NettyWorkerServer: createMessageStoreFactory: Using ByteArrayMessagesPerVertexStore since there is no combiner
>>>>>>> 2014-07-02 17:01:42,881 INFO org.apache.giraph.comm.netty.NettyServer: NettyServer: Using execution handler with 8 threads after requestFrameDecoder.
>>>>>>> 2014-07-02 17:01:42,904 INFO org.apache.giraph.comm.netty.NettyServer: start: Started server communication server: ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:30001 with up to 16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288 backlog = 1
>>>>>>> 2014-07-02 17:01:42,907 INFO org.apache.giraph.comm.netty.NettyClient: NettyClient: Using execution handler with 8 threads after requestEncoder.
>>>>>>> 2014-07-02 17:01:42,914 INFO org.apache.giraph.graph.GraphTaskManager: setup: Registering health of this worker...
>>>>>>> 2014-07-02 17:01:45,049 INFO org.apache.giraph.bsp.BspService: process: applicationAttemptChanged signaled
>>>>>>> 2014-07-02 17:01:45,068 WARN org.apache.giraph.bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir, type=NodeChildrenChanged, state=SyncConnected)
>>>>>>> 2014-07-02 17:01:45,076 INFO org.apache.giraph.worker.BspServiceWorker: registerHealth: Created my health node for attempt=0, superstep=-1 with /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1 and workerInfo= Worker(hostname=ec2-54-186-5-159.us-west-2.compute.amazonaws.com, MRtaskID=1, port=30001)
>>>>>>> 2014-07-02 17:01:45,183 INFO org.apache.giraph.comm.netty.NettyServer: start: Using Netty without authentication.
>>>>>>> 2014-07-02 17:01:45,339 INFO org.apache.giraph.bsp.BspService: process: partitionAssignmentsReadyChanged (partitions are assigned)
>>>>>>> 2014-07-02 17:01:45,345 INFO org.apache.giraph.worker.BspServiceWorker: startSuperstep: Master(hostname=ec2-54-186-5-159.us-west-2.compute.amazonaws.com, MRtaskID=0, port=30000)
>>>>>>> 2014-07-02 17:01:45,345 INFO org.apache.giraph.worker.BspServiceWorker: startSuperstep: Ready for computation on superstep -1 since worker selection and vertex range assignments are done in /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir/-1/_addressesAndPartitions
>>>>>>> 2014-07-02 17:01:45,346 INFO org.apache.giraph.comm.netty.NettyClient: Using Netty without authentication.
>>>>>>> 2014-07-02 17:01:45,354 INFO org.apache.giraph.comm.netty.NettyClient: connectAllAddresses: Successfully added 1 connections, (1 total connected) 0 failed, 0 failures total.
>>>>>>> 2014-07-02 17:01:45,359 INFO org.apache.giraph.worker.BspServiceWorker: loadInputSplits: Using 1 thread(s), originally 1 threads(s) for 2 total splits.
>>>>>>> 2014-07-02 17:01:45,362 INFO org.apache.giraph.comm.SendPartitionCache: SendPartitionCache: maxVerticesPerTransfer = 10000
>>>>>>> 2014-07-02 17:01:45,362 INFO org.apache.giraph.comm.SendPartitionCache: SendPartitionCache: maxEdgesPerTransfer = 80000
>>>>>>> 2014-07-02 17:01:45,372 INFO org.apache.giraph.worker.InputSplitsHandler: reserveInputSplit: Reserved input split path /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDir/0, overall roughly 0.0% input splits reserved
>>>>>>> 2014-07-02 17:01:45,373 INFO org.apache.giraph.worker.InputSplitsCallable: getInputSplit: Reserved /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDir/0 from ZooKeeper and got input split 'hdfs://ec2-54-186-5-159.us-west-2.compute.amazonaws.com:54310/user/ubuntu/input/CA.txt:0+67108864'
>>>>>>> 2014-07-02 17:01:46,615 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: Loaded 250000 vertices at 200412.01085983627 vertices/sec 697447 edges at 560085.1663971642 edges/sec Memory (free/total/max) = 112.94M / 182.50M / 182.50M
>>>>>>> 2014-07-02 17:01:47,440 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: Loaded 500000 vertices at 241131.36490688394 vertices/sec 1419367 edges at 685221.3493060106 edges/sec Memory (free/total/max) = 45.07M / 187.50M / 187.50M
>>>>>>> 2014-07-02 17:01:51,322 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: Loaded 750000 vertices at 125921.72750649283 vertices/sec 2149814 edges at 361077.323111284 edges/sec Memory (free/total/max) = 16.73M / 189.50M / 189.50M
>>>>>>> 2014-07-02 17:01:55,205 ERROR org.apache.giraph.worker.BspServiceWorker: unregisterHealth: Got failure, unregistering health on /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1 on superstep -1
>>>>>>> 2014-07-02 17:01:55,213 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
>>>>>>> 2014-07-02 17:01:55,716 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
>>>>>>> 2014-07-02 17:01:55,717 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName ubuntu for UID 1000 from the native implementation
>>>>>>> 2014-07-02 17:01:55,718 WARN org.apache.hadoop.mapred.Child: Error running child
>>>>>>> java.lang.IllegalStateException: run: Caught an unrecoverable exception waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@36df8f5
>>>>>>> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:102)
>>>>>>> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>>>>>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>>>>>>> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>>>>>> 	at java.security.AccessController.doPrivileged(Native Method)
>>>>>>> 	at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>>>>>> 	at org.apache.hadoop.mapred.Child.main(Child.java:253)
>>>>>>> Caused by: java.lang.IllegalStateException: waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@36df8f5
>>>>>>> 	at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:151)
>>>>>>> 	at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:111)
>>>>>>> 	at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:73)
>>>>>>> 	at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:192)
>>>>>>> 	at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:276)
>>>>>>> 	at org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:323)
>>>>>>> 	at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:506)
>>>>>>> 	at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:230)
>>>>>>> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92)
>>>>>>> 	... 7 more
>>>>>>> Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
>>>>>>> 	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>>>>>>> 	at java.util.concurrent.FutureTask.get(FutureTask.java:202)
>>>>>>> 	at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:271)
>>>>>>> 	at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:143)
>>>>>>> 	... 15 more
>>>>>>> Caused by: java.lang.OutOfMemoryError: Java heap space
>>>>>>> 	at java.util.concurrent.ConcurrentHashMap$Segment.rehash(ConcurrentHashMap.java:501)
>>>>>>> 	at java.util.concurrent.ConcurrentHashMap$Segment.put(ConcurrentHashMap.java:460)
>>>>>>> 	at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1130)
>>>>>>> 	at org.apache.giraph.partition.SimplePartition.addPartition(SimplePartition.java:87)
>>>>>>> 	at org.apache.giraph.partition.SimplePartitionStore.addPartition(SimplePartitionStore.java:71)
>>>>>>> 	at org.apache.giraph.comm.requests.SendVertexRequest.doRequest(SendVertexRequest.java:81)
>>>>>>> 	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:470)
>>>>>>> 	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendPartitionRequest(NettyWorkerClientRequestProcessor.java:203)
>>>>>>> 	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendVertexRequest(NettyWorkerClientRequestProcessor.java:267)
>>>>>>> 	at org.apache.giraph.worker.VertexInputSplitsCallable.readInputSplit(VertexInputSplitsCallable.java:140)
>>>>>>> 	at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:220)
>>>>>>> 	at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:161)
>>>>>>> 	at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:58)
>>>>>>> 	at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
>>>>>>> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>> 	at java.lang.Thread.run(Thread.java:745)
>>>>>>> 2014-07-02 17:01:55,722 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
>>>>>>>
>>>>>>>
>>>>>>> ===========================================================================
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I've tried increasing Java Heap Space in hadoop/conf/mapred-site.xml
>>>>>>> by adding this:
>>>>>>>   <property>
>>>>>>>     <name>mapred.child.java.opts</name>
>>>>>>>     <value>-Xmx1024m</value>
>>>>>>>   </property>
>>>>>>>
>>>>>>> But that just caused the entire job to fail from the start.
>>>>>>>
>>>>>>> Before using this version of Giraph, I used 1.0.0 and 1.1.0-RC0, and
>>>>>>> those versions gave me more and different errors to debug that
>>>>>>> related to problems with Hadoop itself.  So the Giraph version I'm currently
>>>>>>> using seems to be the best for me because these errors seem more manageable.
>>>>>>>
>>>>>>> What can I do to fix this error?  I thought Giraph was built for
>>>>>>> large-scale graph processing, so I suppose this problem has been
>>>>>>> encountered before by someone testing large graphs.  I searched through
>>>>>>> the mailing archives but couldn't find anything, though.
>>>>>>>
>>>>>>> I can provide more information if you need it.  Thanks a lot.
>>>>>>>
>>>>>>> Bryan Rowe
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *Then Young Han replied:*
>>>>>>>
>>>>>>> 10 minutes seems way too long to load 91 MB from HDFS. Are you
>>>>>>> sure your graph's format is correct? For the Json input formats, each line
>>>>>>> of your file should be:
>>>>>>>
>>>>>>> [vertex id, vertex value, [[dst id, edge weight], [dst id, edge
>>>>>>> weight], ..., [dst id, edge weight]]]
>>>>>>>
>>>>>>> You can set the vertex value for every vertex to 0 (SSSP will
>>>>>>> overwrite that value in the first superstep). If your graph doesn't have
>>>>>>> edge weights, you can just set them to 1.
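[Editor's note: a quick way to sanity-check a file against the format Young describes is to parse each line and verify its shape. This is a minimal Python sketch; the `check_line` helper is hypothetical, not part of Giraph or Hadoop.]

```python
import json

def check_line(line):
    """Return None if line matches [id, value, [[dst, weight], ...]],
    otherwise a short description of what is wrong."""
    try:
        rec = json.loads(line)
    except ValueError as e:
        return "not valid JSON: %s" % e
    if not (isinstance(rec, list) and len(rec) == 3):
        return "expected a 3-element list"
    vid, value, edges = rec
    if not isinstance(vid, int):
        return "vertex id must be an integer"
    if not isinstance(value, (int, float)):
        return "vertex value must be numeric"
    if not isinstance(edges, list):
        return "edges must be a list"
    for e in edges:
        if not (isinstance(e, list) and len(e) == 2):
            return "each edge must be a [dst, weight] pair"
    return None
```

Running this over every line of the input file (and printing the line number of the first non-None result) would confirm or rule out a malformed-input cause before tuning memory.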
>>>>>>>
>>>>>>> Also, have you tried a larger Xmx value? E.g., 4096m or 8192m.
>>>>>>>
>>>>>>> Young
>>>>>>>
>>>>>>>
>>>>>>> *Then I replied:*
>>>>>>> Hi Young,
>>>>>>>
>>>>>>> I believe my graph has the correct format.
>>>>>>> Here are the first 41 lines of the graph I'm using:
>>>>>>> [0,0,[[1,1],[4,1],[5,1]]]
>>>>>>> [1,0,[[0,1],[2,1],[3,1]]]
>>>>>>> [2,0,[[1,1],[6,1],[7,1]]]
>>>>>>> [3,0,[[8,1],[1,1],[9,1]]]
>>>>>>> [4,0,[[0,1],[10,1],[11,1]]]
>>>>>>> [5,0,[[0,1]]]
>>>>>>> [6,0,[[2,1],[12,1]]]
>>>>>>> [7,0,[[2,1],[12,1],[13,1]]]
>>>>>>> [8,0,[[11,1],[3,1],[60,1]]]
>>>>>>> [9,0,[[3,1]]]
>>>>>>> [10,0,[[35,1],[4,1],[38,1]]]
>>>>>>> [11,0,[[8,1],[59,1],[4,1]]]
>>>>>>> [12,0,[[41,1],[6,1],[7,1]]]
>>>>>>> [13,0,[[89,1],[90,1],[91,1],[7,1]]]
>>>>>>> [14,0,[[18,1],[19,1],[15,1]]]
>>>>>>> [15,0,[[16,1],[17,1],[14,1]]]
>>>>>>> [16,0,[[20,1],[21,1],[22,1],[15,1]]]
>>>>>>> [17,0,[[24,1],[23,1],[21,1],[15,1]]]
>>>>>>> [18,0,[[24,1],[14,1]]]
>>>>>>> [19,0,[[25,1],[22,1],[14,1]]]
>>>>>>> [20,0,[[16,1],[25,1],[26,1]]]
>>>>>>> [21,0,[[16,1],[17,1],[30,1]]]
>>>>>>> [22,0,[[16,1],[19,1]]]
>>>>>>> [23,0,[[17,1],[105,1]]]
>>>>>>> [24,0,[[17,1],[18,1],[58,1]]]
>>>>>>> [25,0,[[27,1],[19,1],[20,1]]]
>>>>>>> [26,0,[[28,1],[27,1],[20,1]]]
>>>>>>> [27,0,[[25,1],[26,1],[29,1]]]
>>>>>>> [28,0,[[26,1],[29,1],[30,1]]]
>>>>>>> [29,0,[[27,1],[28,1],[31,1]]]
>>>>>>> [30,0,[[32,1],[33,1],[28,1],[21,1]]]
>>>>>>> [31,0,[[34,1],[29,1]]]
>>>>>>> [32,0,[[105,1],[30,1],[39,1]]]
>>>>>>> [33,0,[[30,1]]]
>>>>>>> [34,0,[[38,1],[31,1]]]
>>>>>>> [35,0,[[10,1],[36,1],[37,1]]]
>>>>>>> [36,0,[[40,1],[35,1],[39,1]]]
>>>>>>> [37,0,[[41,1],[35,1]]]
>>>>>>> [38,0,[[10,1],[34,1]]]
>>>>>>> [39,0,[[32,1],[58,1],[36,1],[119,1]]]
>>>>>>> [40,0,[[90,1],[36,1]]]
>>>>>>>
>>>>>>> Also, sorry about sending the email twice.  My email client messed
>>>>>>> up.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Bryan
>>>>>>>
>>>>>>> I've tried a larger graph.  It ran for 2 hours and then failed, with
>>>>>>> different error messages I believe.
>>>>>>>
>>>>>>> Bryan
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Errors with large graphs (continued, new email)

Posted by Bryan Rowe <br...@gmail.com>.
Awesome, the job completed after adding those settings:
http://ec2-54-186-5-159.us-west-2.compute.amazonaws.com:50030/jobdetails.jsp?jobid=job_201407032211_0002&refresh=0

Thanks a lot for the help.  It did take 13mins, 47sec to complete though,
which seems kind of long.  However, it is running on just one node, so it
will probably be a lot faster on a multi-node setup, which I plan on doing.

One more thing, do you think that the time it took to complete is
reasonable for a one-node setup?

Thanks,
Bryan


On Thu, Jul 3, 2014 at 2:59 PM, Young Han <yo...@uwaterloo.ca> wrote:

> Ahh yes, that's another common error. Try adding this to your mapred-site:
>
> <property>
>   <name>mapreduce.job.counters.max</name>
>   <value>1000000</value>
> </property>
> <property>
>   <name>mapreduce.job.counters.limit</name>
>   <value>1000000</value>
> </property>
>
> Young
>
>
> On Thu, Jul 3, 2014 at 5:49 PM, Bryan Rowe <br...@gmail.com> wrote:
>
>> I just changed my mapred-site.xml file to increase the heap size:
>> <configuration>
>> <property>
>> <name>mapred.job.tracker</name>
>> <value>hdfs://ec2-54-186-5-159.us-west-2.compute.amazonaws.com:54311
>> </value>
>> </property>
>>
>> <property>
>> <name>mapred.tasktracker.map.tasks.maximum</name>
>> <value>4</value>
>> </property>
>>
>> <property>
>> <name>mapred.map.tasks</name>
>> <value>4</value>
>> </property>
>>
>>
>>
>>
>>
>> *<property><name>mapred.child.java.opts</name><value>-Xmx1024m</value></property>*
>> </configuration>
>>
>>
>> I ran a Hadoop job and it seemed to be working for the first 2 minutes
>> because it was computing supersteps.  However, after 2 minutes it failed.
>> Part of the reason is that the Hadoop counters exceeded 120.  There were
>> other errors too, and I'm not sure if those are related to the counters
>> problem.
>>
>> Here is an image album of the errors:  http://imgur.com/a/mAqn3
>> Also, here is a link to the Hadoop job that failed:
>> http://ec2-54-186-5-159.us-west-2.compute.amazonaws.com:50030/jobdetails.jsp?jobid=job_201407032123_0002&refresh=30
>>
>> I'll keep my instance up and running so the link should work.
>> What would you recommend next?
>>
>> Thanks,
>> Bryan
>>
>>
>> On Thu, Jul 3, 2014 at 2:19 PM, Young Han <yo...@uwaterloo.ca> wrote:
>>
>>> What sort of error does it give? Did you try restarting Hadoop after
>>> setting it? (stop-all.sh, start-all.sh, hadoop dfs -safemode wait).
>>>
>>> Young
>>>
>>>
>>> On Thu, Jul 3, 2014 at 5:08 PM, Bryan Rowe <br...@gmail.com>
>>> wrote:
>>>
>>>> Yes, before conversion they were also sorted by vertex ID.
>>>> 0       1       1
>>>> 0       4       1
>>>> 0       5       1
>>>> 1       0       1
>>>> 1       2       1
>>>> 1       3       1
>>>> 2       1       1
>>>> 2       6       1
>>>> 2       7       1
>>>> 3       8       1
>>>>
>>>> That was the format before conversion with format [source destination
>>>> weight] of edges.
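
[Editor's note: converting a source-sorted edge list like the one above into the JSON vertex format takes only a few lines. This is a sketch under assumptions not stated in the thread: whitespace-separated integer fields and a file already sorted by source ID; the helper name is made up.]

```python
import json
from itertools import groupby

def edge_list_to_json_lines(lines):
    """Turn a source-sorted edge list ('src dst weight' per line) into
    one JSON line per source vertex: [id, 0, [[dst, weight], ...]].
    The vertex value is set to 0; SSSP overwrites it in the first superstep."""
    parsed = (tuple(int(f) for f in line.split())
              for line in lines if line.strip())
    for src, group in groupby(parsed, key=lambda edge: edge[0]):
        edges = [[dst, w] for _, dst, w in group]
        yield json.dumps([src, 0, edges], separators=(",", ":"))
```

Note that this only emits vertices that appear as a source; vertices with no outgoing edges would need a `[id,0,[]]` line added separately.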
>>>> I'm using a medium-tier EC2 instance, c3.2xlarge.  It has 8 vCPUs and 15
>>>> GiB of memory according to Amazon's website.
>>>>
>>>> Any suggestions on tweaking Xmx?  I tried doing that without any luck;
>>>> it just didn't run the map phase and then failed.
>>>>
>>>>
>>>> On Thu, Jul 3, 2014 at 1:57 PM, Young Han <yo...@uwaterloo.ca>
>>>> wrote:
>>>>
>>>>> From the other thread...
>>>>>
>>>>> Yeah, your input format looks correct.  Did you have the graph sorted
>>>>> by source vertex IDs before conversion?  (I'm not sure if duplicate entries
>>>>> with the same source ID matter, but just in case.)
>>>>>
>>>>> They're all out of memory errors, so I think Xmx is the culprit. What
>>>>> type of EC2 instances are you using? You probably want to use something
>>>>> larger than t1.micro or m1.small.
>>>>>
>>>>> Young
>>>>>
>>>>>
>>>>> On Thu, Jul 3, 2014 at 4:53 PM, Bryan Rowe <br...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> First of all, I started this email thread with my old email from
>>>>>> Yahoo, which was a mistake because it kept sending out duplicates.  Sorry
>>>>>> for the inconvenience, but I'll continue it using this email thread from
>>>>>> now on.
>>>>>>
>>>>>> *I originally posted this:*
>>>>>> Hello,
>>>>>>
>>>>>> Giraph: release-1.0.0-RC3
>>>>>>
>>>>>> In short, when I use large graphs with the Shortest Paths example, it
>>>>>> fails.  But when I use the small graph provided on the Quick Start guide,
>>>>>> it succeeds.
>>>>>> I converted all of my large graphs into the format shown in the Quick
>>>>>> Start guide to simplify things.
>>>>>> I'm using a one-node setup.
>>>>>>
>>>>>>
>>>>>> Here is the command I'm using:
>>>>>> hadoop jar
>>>>>> $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar
>>>>>>
>>>>>> org.apache.giraph.GiraphRunner
>>>>>> org.apache.giraph.examples.SimpleShortestPathsVertex
>>>>>> -vif
>>>>>> org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
>>>>>>  -vip /user/ubuntu/input/CA.txt -of
>>>>>> org.apache.giraph.io.formats.IdWithValueTextOutputFormat
>>>>>>  -op /user/ubuntu/output/shortestpaths
>>>>>> -w 1
>>>>>>
>>>>>> (all on one line)
>>>>>>
>>>>>> CA.txt is a large graph file: 96,026,228 bytes
>>>>>>
>>>>>> The job fails in 10mins, 46sec.
>>>>>>
>>>>>> Two Map tasks are created when run.
>>>>>> The first one, task_201407021636_0006_m_000000, is KILLED.  syslog:
>>>>>>
>>>>>> ===========================================================================
>>>>>>
>>>>>> 2014-07-02 17:01:34,757 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
>>>>>> 2014-07-02 17:01:34,945 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
>>>>>> 2014-07-02 17:01:35,127 INFO org.apache.giraph.graph.GraphTaskManager: setup: Log level remains at info
>>>>>> 2014-07-02 17:01:35,159 INFO org.apache.giraph.graph.GraphTaskManager: Distributed cache is empty. Assuming fatjar.
>>>>>> 2014-07-02 17:01:35,159 INFO org.apache.giraph.graph.GraphTaskManager: setup: classpath @ /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/job.jar for job Giraph: org.apache.giraph.examples.SimpleShortestPathsVertex
>>>>>> 2014-07-02 17:01:35,201 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp: Made the directory _bsp/_defaultZkManagerDir/job_201407021636_0006
>>>>>> 2014-07-02 17:01:35,204 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp: Creating my filestamp _bsp/_defaultZkManagerDir/job_201407021636_0006/_task/ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0
>>>>>> 2014-07-02 17:01:35,219 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: Got [ec2-54-186-5-159.us-west-2.compute.amazonaws.com] 1 hosts from 1 candidates when 1 required (polling period is 3000) on attempt 0
>>>>>> 2014-07-02 17:01:35,220 INFO org.apache.giraph.zk.ZooKeeperManager: createZooKeeperServerList: Creating the final ZooKeeper file '_bsp/_defaultZkManagerDir/job_201407021636_0006/zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 '
>>>>>> 2014-07-02 17:01:35,228 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: For task 0, got file 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 ' (polling period is 3000)
>>>>>> 2014-07-02 17:01:35,228 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: Found [ec2-54-186-5-159.us-west-2.compute.amazonaws.com, 0] 2 hosts in filename 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 '
>>>>>> 2014-07-02 17:01:35,229 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Trying to delete old directory /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper
>>>>>> 2014-07-02 17:01:35,234 INFO org.apache.giraph.zk.ZooKeeperManager: generateZooKeeperConfigFile: Creating file /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper/zoo.cfg in /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper with base port 22181
>>>>>> 2014-07-02 17:01:35,234 INFO org.apache.giraph.zk.ZooKeeperManager: generateZooKeeperConfigFile: Make directory of _bspZooKeeper = true
>>>>>> 2014-07-02 17:01:35,235 INFO org.apache.giraph.zk.ZooKeeperManager: generateZooKeeperConfigFile: Delete of zoo.cfg = false
>>>>>> 2014-07-02 17:01:35,236 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Attempting to start ZooKeeper server with command [/usr/lib/jvm/java-7-oracle/jre/bin/java, -Xmx512m, -XX:ParallelGCThreads=4, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=70, -XX:MaxGCPauseMillis=100, -cp, /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/job.jar, org.apache.zookeeper.server.quorum.QuorumPeerMain, /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper/zoo.cfg] in directory /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper
>>>>>> 2014-07-02 17:01:35,238 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Shutdown hook added.
>>>>>> 2014-07-02 17:01:35,238 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Connect attempt 0 of 10 max trying to connect to ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 with poll msecs = 3000
>>>>>> 2014-07-02 17:01:35,241 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Got ConnectException
>>>>>> java.net.ConnectException: Connection refused
>>>>>> 	at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>>>> 	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>>>>>> 	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
>>>>>> 	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
>>>>>> 	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>>>>> 	at java.net.Socket.connect(Socket.java:579)
>>>>>> 	at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:701)
>>>>>> 	at org.apache.giraph.graph.GraphTaskManager.startZooKeeperManager(GraphTaskManager.java:357)
>>>>>> 	at org.apache.giraph.graph.GraphTaskManager.setup(GraphTaskManager.java:188)
>>>>>> 	at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:60)
>>>>>> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:90)
>>>>>> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>>>>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>>>>>> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>>>>> 	at java.security.AccessController.doPrivileged(Native Method)
>>>>>> 	at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>>>>> 	at org.apache.hadoop.mapred.Child.main(Child.java:253)
>>>>>> 2014-07-02 17:01:38,242 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Connect attempt 1 of 10 max trying to connect to ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 with poll msecs = 3000
>>>>>> 2014-07-02 17:01:38,243 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Connected to ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181!
>>>>>> 2014-07-02 17:01:38,243 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Creating my filestamp _bsp/_defaultZkManagerDir/job_201407021636_0006/_zkServer/ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0
>>>>>> 2014-07-02 17:01:40,249 INFO org.apache.giraph.graph.GraphTaskManager: setup: Chosen to run ZooKeeper...
>>>>>> 2014-07-02 17:01:40,249 INFO org.apache.giraph.graph.GraphTaskManager: setup: Starting up BspServiceMaster (master thread)...
>>>>>> 2014-07-02 17:01:40,260 INFO org.apache.giraph.bsp.BspService: BspService: Connecting to ZooKeeper with job job_201407021636_0006, 0 on ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181
>>>>>> 2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27 GMT
>>>>>> 2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:host.name=ec2-54-186-5-159.us-west-2.compute.amazonaws.com
>>>>>> 2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.7.0_60
>>>>>> 2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
>>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
>>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client
>>>>>>  environment:java.class.path=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/classes:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work:/home/ubuntu/hadoop/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/ubuntu/hadoop/bin/..:/home/ubuntu/hadoop/bin/../hadoop-core-0.20.203.0.jar:/home/ubuntu/hadoop/bin/../lib/aspectjrt-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/aspectjtools-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-1.7.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-core-1.8.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-cli-1.2.jar:/home/ubuntu/hadoop/bin/../lib/commons-codec-1.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-collections-3.2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-configuration-1.6.jar:/home/ubuntu/hadoo
>>>>>> p/bin/../lib/commons-daemon-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-digester-1.8.jar:/home/ubuntu/hadoop/bin/../lib/commons-el-1.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-httpclient-3.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-lang-2.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-1.1.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-api-1.0.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-math-2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-net-1.4.1.jar:/home/ubuntu/hadoop/bin/../lib/core-3.1.1.jar:/home/ubuntu/hadoop/bin/../lib/hsqldb-1.8.0.10.jar:/home/ubuntu/hadoop/bin/../lib/jackson-core-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jackson-mapper-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jasper-compiler-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jasper-runtime-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jets3t-0.6.1.jar:/home/ubuntu/hadoop/bin/../lib/jetty-6.1.26.jar:/home/ubuntu/hadoop/bin/../lib/jetty-util-6.1.26.jar:/home/ubu
>>>>>> ntu/hadoop/bin/../lib/jsch-0.1.42.jar:/home/ubuntu/hadoop/bin/../lib/junit-4.5.jar:/home/ubuntu/hadoop/bin/../lib/kfs-0.2.2.jar:/home/ubuntu/hadoop/bin/../lib/log4j-1.2.15.jar:/home/ubuntu/hadoop/bin/../lib/mockito-all-1.8.5.jar:/home/ubuntu/hadoop/bin/../lib/oro-2.0.8.jar:/home/ubuntu/hadoop/bin/../lib/servlet-api-2.5-20081211.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-api-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/xmlenc-0.52.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar
>>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/home/ubuntu/hadoop/bin/../lib/native/Linux-amd64-64:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work
>>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work/tmp
>>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
>>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
>>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.arch=amd64
>>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.version=3.2.0-58-virtual
>>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.name=ubuntu
>>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.home=/home/ubuntu
>>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work
>>>>>> 2014-07-02 17:01:40,272 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 sessionTimeout=60000 watcher=org.apache.giraph.master.BspServiceMaster@4deb9df0
>>>>>> 2014-07-02 17:01:40,284 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181
>>>>>> 2014-07-02 17:01:40,296 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, initiating session
>>>>>> 2014-07-02 17:01:42,336 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, sessionid = 0x146f806378d0000, negotiated timeout = 600000
>>>>>> 2014-07-02 17:01:42,337 INFO org.apache.giraph.bsp.BspService: process: Asynchronous connection complete.
>>>>>> 2014-07-02 17:01:42,343 INFO org.apache.giraph.graph.GraphTaskManager: map: No need to do anything when not a worker
>>>>>> 2014-07-02 17:01:42,343 INFO org.apache.giraph.graph.GraphTaskManager: cleanup: Starting for MASTER_ZOOKEEPER_ONLY
>>>>>> 2014-07-02 17:01:45,033 INFO org.apache.giraph.bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/job_201407021636_0006/_masterJobState)
>>>>>> 2014-07-02 17:01:45,038 INFO org.apache.giraph.master.BspServiceMaster: becomeMaster: First child is '/_hadoopBsp/job_201407021636_0006/_masterElectionDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_00000000000' and my bid is '/_hadoopBsp/job_201407021636_0006/_masterElectionDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_00000000000'
>>>>>> 2014-07-02 17:01:45,044 INFO org.apache.giraph.bsp.BspService: getApplicationAttempt: Node /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir already exists!
>>>>>> 2014-07-02 17:01:45,049 INFO org.apache.giraph.bsp.BspService: process: applicationAttemptChanged signaled
>>>>>> 2014-07-02 17:01:45,089 INFO org.apache.giraph.comm.netty.NettyServer: NettyServer: Using execution handler with 8 threads after requestFrameDecoder.
>>>>>> 2014-07-02 17:01:45,111 INFO org.apache.giraph.comm.netty.NettyServer: start: Started server communication server: ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:30000 with up to 16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288 backlog = 1
>>>>>> 2014-07-02 17:01:45,116 INFO org.apache.giraph.comm.netty.NettyClient: NettyClient: Using execution handler with 8 threads after requestEncoder.
>>>>>> 2014-07-02 17:01:45,119 INFO org.apache.giraph.master.BspServiceMaster: becomeMaster: I am now the master!
>>>>>> 2014-07-02 17:01:45,126 INFO org.apache.giraph.bsp.BspService: getApplicationAttempt: Node /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir already exists!
>>>>>> 2014-07-02 17:01:45,140 INFO org.apache.giraph.io.formats.GiraphFileInputFormat: Total input paths to process : 1
>>>>>> 2014-07-02 17:01:45,149 INFO org.apache.giraph.master.BspServiceMaster: generateVertexInputSplits: Got 2 input splits for 1 input threads
>>>>>> 2014-07-02 17:01:45,149 INFO org.apache.giraph.master.BspServiceMaster: createVertexInputSplits: Starting to write input split data to zookeeper with 1 threads
>>>>>> 2014-07-02 17:01:45,163 INFO org.apache.giraph.master.BspServiceMaster: createVertexInputSplits: Done writing input split data to zookeeper
>>>>>> 2014-07-02 17:01:45,173 INFO org.apache.giraph.comm.netty.NettyClient: Using Netty without authentication.
>>>>>> 2014-07-02 17:01:45,187 INFO org.apache.giraph.comm.netty.NettyClient: connectAllAddresses: Successfully added 1 connections, (1 total connected) 0 failed, 0 failures total.
>>>>>> 2014-07-02 17:01:45,188 INFO org.apache.giraph.partition.PartitionUtils: computePartitionCount: Creating 1, default would have been 1 partitions.
>>>>>> 2014-07-02 17:01:45,350 INFO org.apache.giraph.comm.netty.NettyServer: start: Using Netty without authentication.
>>>>>> 2014-07-02 17:06:45,347 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out of 1 workers finished on superstep -1 on path /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDoneDir
>>>>>> 2014-07-02 17:06:45,349 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Waiting on [ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1]
>>>>>> 2014-07-02 17:11:45,355 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out of 1 workers finished on superstep -1 on path /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDoneDir
>>>>>> 2014-07-02 17:11:45,355 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Waiting on [ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1]
>>>>>> 2014-07-02 17:12:07,201 INFO org.apache.giraph.zk.ZooKeeperManager: run: Shutdown hook started.
>>>>>> 2014-07-02 17:12:07,202 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Forced a shutdown hook kill of the ZooKeeper process.
>>>>>> 2014-07-02 17:12:07,518 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x146f806378d0000, likely server has closed socket, closing socket connection and attempting reconnect
>>>>>> 2014-07-02 17:12:07,518 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: ZooKeeper process exited with 143 (note that 143 typically means killed).
>>>>>> ===========================================================================
>>>>>>
>>>>>> The second one, task_201407021636_0006_m_000001, goes to the FAILED
>>>>>> state.  syslog:
>>>>>>
>>>>>> ===========================================================================
>>>>>>
>>>>>> 2014-07-02 17:01:38,016 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
>>>>>> 2014-07-02 17:01:38,203 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
>>>>>> 2014-07-02 17:01:38,379 INFO org.apache.giraph.graph.GraphTaskManager: setup: Log level remains at info
>>>>>> 2014-07-02 17:01:40,280 INFO org.apache.giraph.graph.GraphTaskManager: Distributed cache is empty. Assuming fatjar.
>>>>>> 2014-07-02 17:01:40,281 INFO org.apache.giraph.graph.GraphTaskManager: setup: classpath @ /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/job.jar for job Giraph: org.apache.giraph.examples.SimpleShortestPathsVertex
>>>>>> 2014-07-02 17:01:40,317 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp: Made the directory _bsp/_defaultZkManagerDir/job_201407021636_0006
>>>>>> 2014-07-02 17:01:40,320 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp: Creating my filestamp _bsp/_defaultZkManagerDir/job_201407021636_0006/_task/ec2-54-186-5-159.us-west-2.compute.amazonaws.com 1
>>>>>> 2014-07-02 17:01:42,109 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: For task 1, got file 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 ' (polling period is 3000)
>>>>>> 2014-07-02 17:01:42,313 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: Found [ec2-54-186-5-159.us-west-2.compute.amazonaws.com, 0] 2 hosts in filename 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 '
>>>>>> 2014-07-02 17:01:42,316 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Got [ec2-54-186-5-159.us-west-2.compute.amazonaws.com] 1 hosts from 1 ready servers when 1 required (polling period is 3000) on attempt 0
>>>>>> 2014-07-02 17:01:42,316 INFO org.apache.giraph.graph.GraphTaskManager: setup: Starting up BspServiceWorker...
>>>>>> 2014-07-02 17:01:42,330 INFO org.apache.giraph.bsp.BspService: BspService: Connecting to ZooKeeper with job job_201407021636_0006, 1 on ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181
>>>>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27 GMT
>>>>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:host.name=ec2-54-186-5-159.us-west-2.compute.amazonaws.com
>>>>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.7.0_60
>>>>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
>>>>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
>>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client
>>>>>>  environment:java.class.path=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/classes:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work:/home/ubuntu/hadoop/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/ubuntu/hadoop/bin/..:/home/ubuntu/hadoop/bin/../hadoop-core-0.20.203.0.jar:/home/ubuntu/hadoop/bin/../lib/aspectjrt-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/aspectjtools-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-1.7.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-core-1.8.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-cli-1.2.jar:/home/ubuntu/hadoop/bin/../lib/commons-codec-1.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-collections-3.2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-configuration-1.6.jar:/home/ubuntu/hadoo
>>>>>> p/bin/../lib/commons-daemon-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-digester-1.8.jar:/home/ubuntu/hadoop/bin/../lib/commons-el-1.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-httpclient-3.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-lang-2.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-1.1.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-api-1.0.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-math-2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-net-1.4.1.jar:/home/ubuntu/hadoop/bin/../lib/core-3.1.1.jar:/home/ubuntu/hadoop/bin/../lib/hsqldb-1.8.0.10.jar:/home/ubuntu/hadoop/bin/../lib/jackson-core-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jackson-mapper-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jasper-compiler-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jasper-runtime-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jets3t-0.6.1.jar:/home/ubuntu/hadoop/bin/../lib/jetty-6.1.26.jar:/home/ubuntu/hadoop/bin/../lib/jetty-util-6.1.26.jar:/home/ubu
>>>>>> ntu/hadoop/bin/../lib/jsch-0.1.42.jar:/home/ubuntu/hadoop/bin/../lib/junit-4.5.jar:/home/ubuntu/hadoop/bin/../lib/kfs-0.2.2.jar:/home/ubuntu/hadoop/bin/../lib/log4j-1.2.15.jar:/home/ubuntu/hadoop/bin/../lib/mockito-all-1.8.5.jar:/home/ubuntu/hadoop/bin/../lib/oro-2.0.8.jar:/home/ubuntu/hadoop/bin/../lib/servlet-api-2.5-20081211.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-api-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/xmlenc-0.52.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar
>>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/home/ubuntu/hadoop/bin/../lib/native/Linux-amd64-64:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work
>>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work/tmp
>>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
>>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
>>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.arch=amd64
>>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.version=3.2.0-58-virtual
>>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.name=ubuntu
>>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.home=/home/ubuntu
>>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work
>>>>>> 2014-07-02 17:01:42,339 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 sessionTimeout=60000 watcher=org.apache.giraph.worker.BspServiceWorker@54a5733f
>>>>>> 2014-07-02 17:01:42,347 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181
>>>>>> 2014-07-02 17:01:42,347 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, initiating session
>>>>>> 2014-07-02 17:01:42,812 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, sessionid = 0x146f806378d0001, negotiated timeout = 600000
>>>>>> 2014-07-02 17:01:42,814 INFO org.apache.giraph.bsp.BspService: process: Asynchronous connection complete.
>>>>>> 2014-07-02 17:01:42,819 INFO org.apache.giraph.comm.netty.NettyWorkerServer: createMessageStoreFactory: Using ByteArrayMessagesPerVertexStore since there is no combiner
>>>>>> 2014-07-02 17:01:42,881 INFO org.apache.giraph.comm.netty.NettyServer: NettyServer: Using execution handler with 8 threads after requestFrameDecoder.
>>>>>> 2014-07-02 17:01:42,904 INFO org.apache.giraph.comm.netty.NettyServer: start: Started server communication server: ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:30001 with up to 16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288 backlog = 1
>>>>>> 2014-07-02 17:01:42,907 INFO org.apache.giraph.comm.netty.NettyClient: NettyClient: Using execution handler with 8 threads after requestEncoder.
>>>>>> 2014-07-02 17:01:42,914 INFO org.apache.giraph.graph.GraphTaskManager: setup: Registering health of this worker...
>>>>>> 2014-07-02 17:01:45,049 INFO org.apache.giraph.bsp.BspService: process: applicationAttemptChanged signaled
>>>>>> 2014-07-02 17:01:45,068 WARN org.apache.giraph.bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir, type=NodeChildrenChanged, state=SyncConnected)
>>>>>> 2014-07-02 17:01:45,076 INFO org.apache.giraph.worker.BspServiceWorker: registerHealth: Created my health node for attempt=0, superstep=-1 with /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1 and workerInfo= Worker(hostname=ec2-54-186-5-159.us-west-2.compute.amazonaws.com, MRtaskID=1, port=30001)
>>>>>> 2014-07-02 17:01:45,183 INFO org.apache.giraph.comm.netty.NettyServer: start: Using Netty without authentication.
>>>>>> 2014-07-02 17:01:45,339 INFO org.apache.giraph.bsp.BspService: process: partitionAssignmentsReadyChanged (partitions are assigned)
>>>>>> 2014-07-02 17:01:45,345 INFO org.apache.giraph.worker.BspServiceWorker: startSuperstep: Master(hostname=ec2-54-186-5-159.us-west-2.compute.amazonaws.com, MRtaskID=0, port=30000)
>>>>>> 2014-07-02 17:01:45,345 INFO org.apache.giraph.worker.BspServiceWorker: startSuperstep: Ready for computation on superstep -1 since worker selection and vertex range assignments are done in /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir/-1/_addressesAndPartitions
>>>>>> 2014-07-02 17:01:45,346 INFO org.apache.giraph.comm.netty.NettyClient: Using Netty without authentication.
>>>>>> 2014-07-02 17:01:45,354 INFO org.apache.giraph.comm.netty.NettyClient: connectAllAddresses: Successfully added 1 connections, (1 total connected) 0 failed, 0 failures total.
>>>>>> 2014-07-02 17:01:45,359 INFO org.apache.giraph.worker.BspServiceWorker: loadInputSplits: Using 1 thread(s), originally 1 threads(s) for 2 total splits.
>>>>>> 2014-07-02 17:01:45,362 INFO org.apache.giraph.comm.SendPartitionCache: SendPartitionCache: maxVerticesPerTransfer = 10000
>>>>>> 2014-07-02 17:01:45,362 INFO org.apache.giraph.comm.SendPartitionCache: SendPartitionCache: maxEdgesPerTransfer = 80000
>>>>>> 2014-07-02 17:01:45,372 INFO org.apache.giraph.worker.InputSplitsHandler: reserveInputSplit: Reserved input split path /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDir/0, overall roughly 0.0% input splits reserved
>>>>>> 2014-07-02 17:01:45,373 INFO org.apache.giraph.worker.InputSplitsCallable: getInputSplit: Reserved /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDir/0 from ZooKeeper and got input split 'hdfs://ec2-54-186-5-159.us-west-2.compute.amazonaws.com:54310/user/ubuntu/input/CA.txt:0+67108864'
>>>>>> 2014-07-02 17:01:46,615 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: Loaded 250000 vertices at 200412.01085983627 vertices/sec 697447 edges at 560085.1663971642 edges/sec Memory (free/total/max) = 112.94M / 182.50M / 182.50M
>>>>>> 2014-07-02 17:01:47,440 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: Loaded 500000 vertices at 241131.36490688394 vertices/sec 1419367 edges at 685221.3493060106 edges/sec Memory (free/total/max) = 45.07M / 187.50M / 187.50M
>>>>>> 2014-07-02 17:01:51,322 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: Loaded 750000 vertices at 125921.72750649283 vertices/sec 2149814 edges at 361077.323111284 edges/sec Memory (free/total/max) = 16.73M / 189.50M / 189.50M
>>>>>> 2014-07-02 17:01:55,205 ERROR org.apache.giraph.worker.BspServiceWorker: unregisterHealth: Got failure, unregistering health on /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1 on superstep -1
>>>>>> 2014-07-02 17:01:55,213 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
>>>>>> 2014-07-02 17:01:55,716 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
>>>>>> 2014-07-02 17:01:55,717 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName ubuntu for UID 1000 from the native implementation
>>>>>> 2014-07-02 17:01:55,718 WARN org.apache.hadoop.mapred.Child: Error running child
>>>>>> java.lang.IllegalStateException: run: Caught an unrecoverable exception waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@36df8f5
>>>>>> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:102)
>>>>>> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>>>>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>>>>>> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>>>>> 	at java.security.AccessController.doPrivileged(Native Method)
>>>>>> 	at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>>>>> 	at org.apache.hadoop.mapred.Child.main(Child.java:253)
>>>>>> Caused by: java.lang.IllegalStateException: waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@36df8f5
>>>>>> 	at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:151)
>>>>>> 	at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:111)
>>>>>> 	at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:73)
>>>>>> 	at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:192)
>>>>>> 	at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:276)
>>>>>> 	at org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:323)
>>>>>> 	at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:506)
>>>>>> 	at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:230)
>>>>>> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92)
>>>>>> 	... 7 more
>>>>>> Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
>>>>>> 	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>>>>>> 	at java.util.concurrent.FutureTask.get(FutureTask.java:202)
>>>>>> 	at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:271)
>>>>>> 	at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:143)
>>>>>> 	... 15 more
>>>>>> Caused by: java.lang.OutOfMemoryError: Java heap space
>>>>>> 	at java.util.concurrent.ConcurrentHashMap$Segment.rehash(ConcurrentHashMap.java:501)
>>>>>> 	at java.util.concurrent.ConcurrentHashMap$Segment.put(ConcurrentHashMap.java:460)
>>>>>> 	at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1130)
>>>>>> 	at org.apache.giraph.partition.SimplePartition.addPartition(SimplePartition.java:87)
>>>>>> 	at org.apache.giraph.partition.SimplePartitionStore.addPartition(SimplePartitionStore.java:71)
>>>>>> 	at org.apache.giraph.comm.requests.SendVertexRequest.doRequest(SendVertexRequest.java:81)
>>>>>> 	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:470)
>>>>>> 	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendPartitionRequest(NettyWorkerClientRequestProcessor.java:203)
>>>>>> 	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendVertexRequest(NettyWorkerClientRequestProcessor.java:267)
>>>>>> 	at org.apache.giraph.worker.VertexInputSplitsCallable.readInputSplit(VertexInputSplitsCallable.java:140)
>>>>>> 	at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:220)
>>>>>> 	at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:161)
>>>>>> 	at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:58)
>>>>>> 	at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
>>>>>> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>> 	at java.lang.Thread.run(Thread.java:745)
>>>>>> 2014-07-02 17:01:55,722 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
>>>>>>
>>>>>>
>>>>>> ===========================================================================
>>>>>>
>>>>>> I've tried increasing Java Heap Space in hadoop/conf/mapred-site.xml
>>>>>> by adding this:
>>>>>>   <property>
>>>>>>     <name>mapred.child.java.opts</name>
>>>>>>     <value>-Xmx1024m</value>
>>>>>>   </property>
>>>>>>
>>>>>> But that just caused the entire job to fail from the start.
>>>>>>
>>>>>> Before using this version of Giraph, I tried 1.0.0 and 1.1.0-RC0, and
>>>>>> those versions gave me more, and different, errors to debug, related
>>>>>> to problems with Hadoop itself.  So the Giraph version I'm currently
>>>>>> using seems to be the best for me because these errors seem more manageable.
>>>>>>
>>>>>> What can I do to fix this error?  I thought Giraph was built for
>>>>>> large-scale graph processing, so I suppose someone testing large
>>>>>> graphs has run into this problem before.  I searched the mailing
>>>>>> list archives but couldn't find anything, though.
>>>>>>
>>>>>> I can provide more information if you need it.  Thanks a lot.
>>>>>>
>>>>>> Bryan Rowe
>>>>>>
>>>>>>
>>>>>>
>>>>>> *Then Young Han replied:*
>>>>>>
>>>>>> 10 minutes seems way too long to load 91 MB from HDFS. Are you sure
>>>>>> your graph's format is correct? For the Json input formats, each line of
>>>>>> your file should be:
>>>>>>
>>>>>> [vertex id, vertex value, [[dst id, edge weight], [dst id, edge
>>>>>> weight], ..., [dst id, edge weight]]]
>>>>>>
>>>>>> You can set the vertex value for every vertex to 0 (SSSP will
>>>>>> overwrite that value in the first superstep). If your graph doesn't have
>>>>>> edge weights, you can just set them to 1.
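As a rough sketch of that conversion (assuming the raw graph is a whitespace-separated `src dst weight` edge list; the function name is just illustrative, and vertices that only ever appear as a destination get no line of their own):

```python
import json
from collections import defaultdict

def edges_to_json_lines(edge_lines, default_value=0):
    # Group the edge list by source vertex and emit one JSON line per
    # vertex: [id, value, [[dst, weight], ...]], as expected by
    # JsonLongDoubleFloatDoubleVertexInputFormat.
    adjacency = defaultdict(list)
    for line in edge_lines:
        src, dst, weight = line.split()
        adjacency[int(src)].append([int(dst), int(weight)])
    for vertex_id in sorted(adjacency):
        yield json.dumps([vertex_id, default_value, adjacency[vertex_id]],
                         separators=(',', ':'))

for line in edges_to_json_lines(["0 1 1", "0 4 1", "1 0 1"]):
    print(line)
# [0,0,[[1,1],[4,1]]]
# [1,0,[[0,1]]]
```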
>>>>>>
>>>>>> Also, have you tried a larger Xmx value? E.g., 4096m or 8192m.
>>>>>>
>>>>>> Young
>>>>>>
>>>>>>
>>>>>> *Then I replied:*
>>>>>> Hi Young,
>>>>>>
>>>>>> I believe my graph has the correct format.
>>>>>> Here are the first 40 lines of the graph I'm using:
>>>>>> [0,0,[[1,1],[4,1],[5,1]]]
>>>>>> [1,0,[[0,1],[2,1],[3,1]]]
>>>>>> [2,0,[[1,1],[6,1],[7,1]]]
>>>>>> [3,0,[[8,1],[1,1],[9,1]]]
>>>>>> [4,0,[[0,1],[10,1],[11,1]]]
>>>>>> [5,0,[[0,1]]]
>>>>>> [6,0,[[2,1],[12,1]]]
>>>>>> [7,0,[[2,1],[12,1],[13,1]]]
>>>>>> [8,0,[[11,1],[3,1],[60,1]]]
>>>>>> [9,0,[[3,1]]]
>>>>>> [10,0,[[35,1],[4,1],[38,1]]]
>>>>>> [11,0,[[8,1],[59,1],[4,1]]]
>>>>>> [12,0,[[41,1],[6,1],[7,1]]]
>>>>>> [13,0,[[89,1],[90,1],[91,1],[7,1]]]
>>>>>> [14,0,[[18,1],[19,1],[15,1]]]
>>>>>> [15,0,[[16,1],[17,1],[14,1]]]
>>>>>> [16,0,[[20,1],[21,1],[22,1],[15,1]]]
>>>>>> [17,0,[[24,1],[23,1],[21,1],[15,1]]]
>>>>>> [18,0,[[24,1],[14,1]]]
>>>>>> [19,0,[[25,1],[22,1],[14,1]]]
>>>>>> [20,0,[[16,1],[25,1],[26,1]]]
>>>>>> [21,0,[[16,1],[17,1],[30,1]]]
>>>>>> [22,0,[[16,1],[19,1]]]
>>>>>> [23,0,[[17,1],[105,1]]]
>>>>>> [24,0,[[17,1],[18,1],[58,1]]]
>>>>>> [25,0,[[27,1],[19,1],[20,1]]]
>>>>>> [26,0,[[28,1],[27,1],[20,1]]]
>>>>>> [27,0,[[25,1],[26,1],[29,1]]]
>>>>>> [28,0,[[26,1],[29,1],[30,1]]]
>>>>>> [29,0,[[27,1],[28,1],[31,1]]]
>>>>>> [30,0,[[32,1],[33,1],[28,1],[21,1]]]
>>>>>> [31,0,[[34,1],[29,1]]]
>>>>>> [32,0,[[105,1],[30,1],[39,1]]]
>>>>>> [33,0,[[30,1]]]
>>>>>> [34,0,[[38,1],[31,1]]]
>>>>>> [35,0,[[10,1],[36,1],[37,1]]]
>>>>>> [36,0,[[40,1],[35,1],[39,1]]]
>>>>>> [37,0,[[41,1],[35,1]]]
>>>>>> [38,0,[[10,1],[34,1]]]
>>>>>> [39,0,[[32,1],[58,1],[36,1],[119,1]]]
>>>>>> [40,0,[[90,1],[36,1]]]
>>>>>>
>>>>>> Also, sorry about sending the email twice.  My email client messed up.
>>>>>>
>>>>>> Thanks,
>>>>>> Bryan
>>>>>>
>>>>>> I've tried a larger graph.  It ran for 2 hours and then failed with
>>>>>> different error messages, I believe.
>>>>>>
>>>>>> Bryan
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Errors with large graphs (continued, new email)

Posted by Young Han <yo...@uwaterloo.ca>.
Ahh yes, that's another common error. Try adding this to your mapred-site:

<property>
  <name>mapreduce.job.counters.max</name>
  <value>1000000</value>
</property>
<property>
  <name>mapreduce.job.counters.limit</name>
  <value>1000000</value>
</property>

Young


On Thu, Jul 3, 2014 at 5:49 PM, Bryan Rowe <br...@gmail.com> wrote:

> I just changed my mapred-site.xml file to increase the heap size:
> <configuration>
> <property>
> <name>mapred.job.tracker</name>
> <value>hdfs://ec2-54-186-5-159.us-west-2.compute.amazonaws.com:54311
> </value>
> </property>
>
> <property>
> <name>mapred.tasktracker.map.tasks.maximum</name>
> <value>4</value>
> </property>
>
> <property>
> <name>mapred.map.tasks</name>
> <value>4</value>
> </property>
>
>
> <property>
> <name>mapred.child.java.opts</name>
> <value>-Xmx1024m</value>
> </property>
> </configuration>
>
>
> I ran a Hadoop job and it seemed to be working for the first 2 minutes
> because it was computing supersteps.  However, after 2 minutes it failed.
> Part of the reason is that the Hadoop counters exceeded the limit of 120.  There
> were other errors, though, and I'm not sure whether those are related to the
> counters problem.
>
> Here is an image album of the errors:  http://imgur.com/a/mAqn3
> Also, here is a link to the Hadoop job that failed:
> http://ec2-54-186-5-159.us-west-2.compute.amazonaws.com:50030/jobdetails.jsp?jobid=job_201407032123_0002&refresh=30
>
> I'll keep my instance up and running so the link should work.
> What would you recommend next?
>
> Thanks,
> Bryan
>
>
> On Thu, Jul 3, 2014 at 2:19 PM, Young Han <yo...@uwaterloo.ca> wrote:
>
>> What sort of error does it give? Did you try restarting Hadoop after
>> setting it? (stop-all.sh, start-all.sh, hadoop dfs -safemode wait).
>>
>> Young
>>
>>
>> On Thu, Jul 3, 2014 at 5:08 PM, Bryan Rowe <br...@gmail.com> wrote:
>>
>>> Yes, before conversion they were also sorted by vertex ID.
>>> 0       1       1
>>> 0       4       1
>>> 0       5       1
>>> 1       0       1
>>> 1       2       1
>>> 1       3       1
>>> 2       1       1
>>> 2       6       1
>>> 2       7       1
>>> 3       8       1
>>>
>>> That was the edge format before conversion: [source destination
>>> weight].
>>> I'm using a medium-tier EC2 instance, c3.2xlarge.  It has 8 vCPUs and 15
>>> GiB of memory according to Amazon's website.
>>>
>>> Any suggestions on tweaking Xmx?  I tried doing that without any luck;
>>> it just didn't run the map phase and then failed.
>>>
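For what it's worth, a back-of-the-envelope for the heap sizing (assuming the instance's 15 GiB is split evenly across the 4 configured map slots; the 3 GiB reservation for the OS and the Hadoop/ZooKeeper daemons is a guess, not a measured figure):

```python
# Rough per-task heap estimate for one c3.2xlarge node (assumed figures).
total_gib = 15      # instance memory per Amazon's spec
reserved_gib = 3    # guessed headroom for OS, DataNode, TaskTracker, ZooKeeper
map_slots = 4       # mapred.tasktracker.map.tasks.maximum from this thread

per_task_mib = (total_gib - reserved_gib) * 1024 // map_slots
print(f"try mapred.child.java.opts = -Xmx{per_task_mib}m")
# try mapred.child.java.opts = -Xmx3072m
```

With -w 1 only two map tasks actually run here, so even higher values may fit; this is a sanity check on the numbers, not a tuning recommendation.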
>>>
>>> On Thu, Jul 3, 2014 at 1:57 PM, Young Han <yo...@uwaterloo.ca>
>>> wrote:
>>>
>>>> From the other thread...
>>>>
>>>> Yeah, your input format looks correct. Did you have the graph sorted by
>>>> source vertex IDs before conversion? (I'm not sure whether duplicate entries
>>>> with the same source ID matter, but just in case.)
>>>>
>>>> They're all out of memory errors, so I think Xmx is the culprit. What
>>>> type of EC2 instances are you using? You probably want to use something
>>>> larger than t1.micro or m1.small.
>>>>
>>>> Young
>>>>
>>>>
>>>> On Thu, Jul 3, 2014 at 4:53 PM, Bryan Rowe <br...@gmail.com>
>>>> wrote:
>>>>
>>>>> First of all, I started this email thread with my old email from
>>>>> Yahoo, which was a mistake because it kept sending out duplicates.  Sorry
>>>>> for the inconvenience, but I'll continue it using this email thread from
>>>>> now on.
>>>>>
>>>>> *I originally posted this:*
>>>>> Hello,
>>>>>
>>>>> Giraph: release-1.0.0-RC3
>>>>>
>>>>> In short, when I use large graphs with the Shortest Paths example, it
>>>>> fails.  But when I use the small graph provided on the Quick Start guide,
>>>>> it succeeds.
>>>>> I converted all of my large graphs into the format shown in the Quick
>>>>> Start guide to simplify things.
>>>>> I'm using a one-node setup.
>>>>>
>>>>>
>>>>> Here is the command I'm using:
>>>>> hadoop jar
>>>>> $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar
>>>>>
>>>>> org.apache.giraph.GiraphRunner
>>>>> org.apache.giraph.examples.SimpleShortestPathsVertex
>>>>> -vif
>>>>> org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
>>>>>  -vip /user/ubuntu/input/CA.txt -of
>>>>> org.apache.giraph.io.formats.IdWithValueTextOutputFormat
>>>>>  -op /user/ubuntu/output/shortestpaths
>>>>> -w 1
>>>>>
>>>>> (all on one line)
>>>>>
>>>>> CA.txt is a large graph file: 96,026,228 bytes
>>>>>
>>>>> The job fails in 10mins, 46sec.
>>>>>
>>>>> Two Map tasks are created when run.
>>>>> The first one, task_201407021636_0006_m_000000, is KILLED.  syslog:
>>>>>
>>>>> ===========================================================================
>>>>>
>>>>> 2014-07-02 17:01:34,757 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
>>>>> 2014-07-02 17:01:34,945 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
>>>>> 2014-07-02 17:01:35,127 INFO org.apache.giraph.graph.GraphTaskManager: setup: Log level remains at info
>>>>> 2014-07-02 17:01:35,159 INFO org.apache.giraph.graph.GraphTaskManager: Distributed cache is empty. Assuming fatjar.
>>>>> 2014-07-02 17:01:35,159 INFO org.apache.giraph.graph.GraphTaskManager: setup: classpath @ /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/job.jar for job Giraph: org.apache.giraph.examples.SimpleShortestPathsVertex
>>>>> 2014-07-02 17:01:35,201 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp: Made the directory _bsp/_defaultZkManagerDir/job_201407021636_0006
>>>>> 2014-07-02 17:01:35,204 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp: Creating my filestamp _bsp/_defaultZkManagerDir/job_201407021636_0006/_task/ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0
>>>>> 2014-07-02 17:01:35,219 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: Got [ec2-54-186-5-159.us-west-2.compute.amazonaws.com] 1 hosts from 1 candidates when 1 required (polling period is 3000) on attempt 0
>>>>> 2014-07-02 17:01:35,220 INFO org.apache.giraph.zk.ZooKeeperManager: createZooKeeperServerList: Creating the final ZooKeeper file '_bsp/_defaultZkManagerDir/job_201407021636_0006/zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 '
>>>>> 2014-07-02 17:01:35,228 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: For task 0, got file 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 ' (polling period is 3000)
>>>>> 2014-07-02 17:01:35,228 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: Found [ec2-54-186-5-159.us-west-2.compute.amazonaws.com, 0] 2 hosts in filename 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 '
>>>>> 2014-07-02 17:01:35,229 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Trying to delete old directory /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper
>>>>> 2014-07-02 17:01:35,234 INFO org.apache.giraph.zk.ZooKeeperManager: generateZooKeeperConfigFile: Creating file /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper/zoo.cfg in /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper with base port 22181
>>>>> 2014-07-02 17:01:35,234 INFO org.apache.giraph.zk.ZooKeeperManager: generateZooKeeperConfigFile: Make directory of _bspZooKeeper = true
>>>>> 2014-07-02 17:01:35,235 INFO org.apache.giraph.zk.ZooKeeperManager: generateZooKeeperConfigFile: Delete of zoo.cfg = false
>>>>> 2014-07-02 17:01:35,236 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Attempting to start ZooKeeper server with command [/usr/lib/jvm/java-7-oracle/jre/bin/java, -Xmx512m, -XX:ParallelGCThreads=4, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=70, -XX:MaxGCPauseMillis=100, -cp, /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/job.jar, org.apache.zookeeper.server.quorum.QuorumPeerMain, /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper/zoo.cfg] in directory /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper
>>>>> 2014-07-02 17:01:35,238 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Shutdown hook added.
>>>>> 2014-07-02 17:01:35,238 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Connect attempt 0 of 10 max trying to connect to ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 with poll msecs = 3000
>>>>> 2014-07-02 17:01:35,241 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Got ConnectException
>>>>> java.net.ConnectException: Connection refused
>>>>> 	at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>>> 	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>>>>> 	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
>>>>> 	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
>>>>> 	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>>>> 	at java.net.Socket.connect(Socket.java:579)
>>>>> 	at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:701)
>>>>> 	at org.apache.giraph.graph.GraphTaskManager.startZooKeeperManager(GraphTaskManager.java:357)
>>>>> 	at org.apache.giraph.graph.GraphTaskManager.setup(GraphTaskManager.java:188)
>>>>> 	at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:60)
>>>>> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:90)
>>>>> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>>>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>>>>> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>>>> 	at java.security.AccessController.doPrivileged(Native Method)
>>>>> 	at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>>>> 	at org.apache.hadoop.mapred.Child.main(Child.java:253)
>>>>> 2014-07-02 17:01:38,242 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Connect attempt 1 of 10 max trying to connect to ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 with poll msecs = 3000
>>>>> 2014-07-02 17:01:38,243 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Connected to ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181!
>>>>> 2014-07-02 17:01:38,243 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Creating my filestamp _bsp/_defaultZkManagerDir/job_201407021636_0006/_zkServer/ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0
>>>>> 2014-07-02 17:01:40,249 INFO org.apache.giraph.graph.GraphTaskManager: setup: Chosen to run ZooKeeper...
>>>>> 2014-07-02 17:01:40,249 INFO org.apache.giraph.graph.GraphTaskManager: setup: Starting up BspServiceMaster (master thread)...
>>>>> 2014-07-02 17:01:40,260 INFO org.apache.giraph.bsp.BspService: BspService: Connecting to ZooKeeper with job job_201407021636_0006, 0 on ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181
>>>>> 2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27 GMT
>>>>> 2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:host.name=ec2-54-186-5-159.us-west-2.compute.amazonaws.com
>>>>> 2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.7.0_60
>>>>> 2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client
>>>>>  environment:java.class.path=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/classes:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work:/home/ubuntu/hadoop/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/ubuntu/hadoop/bin/..:/home/ubuntu/hadoop/bin/../hadoop-core-0.20.203.0.jar:/home/ubuntu/hadoop/bin/../lib/aspectjrt-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/aspectjtools-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-1.7.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-core-1.8.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-cli-1.2.jar:/home/ubuntu/hadoop/bin/../lib/commons-codec-1.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-collections-3.2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-configuration-1.6.jar:/home/ubuntu/hadoo
>>>>> p/bin/../lib/commons-daemon-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-digester-1.8.jar:/home/ubuntu/hadoop/bin/../lib/commons-el-1.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-httpclient-3.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-lang-2.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-1.1.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-api-1.0.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-math-2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-net-1.4.1.jar:/home/ubuntu/hadoop/bin/../lib/core-3.1.1.jar:/home/ubuntu/hadoop/bin/../lib/hsqldb-1.8.0.10.jar:/home/ubuntu/hadoop/bin/../lib/jackson-core-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jackson-mapper-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jasper-compiler-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jasper-runtime-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jets3t-0.6.1.jar:/home/ubuntu/hadoop/bin/../lib/jetty-6.1.26.jar:/home/ubuntu/hadoop/bin/../lib/jetty-util-6.1.26.jar:/home/ubu
>>>>> ntu/hadoop/bin/../lib/jsch-0.1.42.jar:/home/ubuntu/hadoop/bin/../lib/junit-4.5.jar:/home/ubuntu/hadoop/bin/../lib/kfs-0.2.2.jar:/home/ubuntu/hadoop/bin/../lib/log4j-1.2.15.jar:/home/ubuntu/hadoop/bin/../lib/mockito-all-1.8.5.jar:/home/ubuntu/hadoop/bin/../lib/oro-2.0.8.jar:/home/ubuntu/hadoop/bin/../lib/servlet-api-2.5-20081211.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-api-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/xmlenc-0.52.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar
>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/home/ubuntu/hadoop/bin/../lib/native/Linux-amd64-64:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work
>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work/tmp
>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.arch=amd64
>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.version=3.2.0-58-virtual
>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.name=ubuntu
>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.home=/home/ubuntu
>>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work
>>>>> 2014-07-02 17:01:40,272 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 sessionTimeout=60000 watcher=org.apache.giraph.master.BspServiceMaster@4deb9df0
>>>>> 2014-07-02 17:01:40,284 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181
>>>>> 2014-07-02 17:01:40,296 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, initiating session
>>>>> 2014-07-02 17:01:42,336 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, sessionid = 0x146f806378d0000, negotiated timeout = 600000
>>>>> 2014-07-02 17:01:42,337 INFO org.apache.giraph.bsp.BspService: process: Asynchronous connection complete.
>>>>> 2014-07-02 17:01:42,343 INFO org.apache.giraph.graph.GraphTaskManager: map: No need to do anything when not a worker
>>>>> 2014-07-02 17:01:42,343 INFO org.apache.giraph.graph.GraphTaskManager: cleanup: Starting for MASTER_ZOOKEEPER_ONLY
>>>>> 2014-07-02 17:01:45,033 INFO org.apache.giraph.bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/job_201407021636_0006/_masterJobState)
>>>>> 2014-07-02 17:01:45,038 INFO org.apache.giraph.master.BspServiceMaster: becomeMaster: First child is '/_hadoopBsp/job_201407021636_0006/_masterElectionDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_00000000000' and my bid is '/_hadoopBsp/job_201407021636_0006/_masterElectionDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_00000000000'
>>>>> 2014-07-02 17:01:45,044 INFO org.apache.giraph.bsp.BspService: getApplicationAttempt: Node /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir already exists!
>>>>> 2014-07-02 17:01:45,049 INFO org.apache.giraph.bsp.BspService: process: applicationAttemptChanged signaled
>>>>> 2014-07-02 17:01:45,089 INFO org.apache.giraph.comm.netty.NettyServer: NettyServer: Using execution handler with 8 threads after requestFrameDecoder.
>>>>> 2014-07-02 17:01:45,111 INFO org.apache.giraph.comm.netty.NettyServer: start: Started server communication server: ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:30000 with up to 16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288 backlog = 1
>>>>> 2014-07-02 17:01:45,116 INFO org.apache.giraph.comm.netty.NettyClient: NettyClient: Using execution handler with 8 threads after requestEncoder.
>>>>> 2014-07-02 17:01:45,119 INFO org.apache.giraph.master.BspServiceMaster: becomeMaster: I am now the master!
>>>>> 2014-07-02 17:01:45,126 INFO org.apache.giraph.bsp.BspService: getApplicationAttempt: Node /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir already exists!
>>>>> 2014-07-02 17:01:45,140 INFO org.apache.giraph.io.formats.GiraphFileInputFormat: Total input paths to process : 1
>>>>> 2014-07-02 17:01:45,149 INFO org.apache.giraph.master.BspServiceMaster: generateVertexInputSplits: Got 2 input splits for 1 input threads
>>>>> 2014-07-02 17:01:45,149 INFO org.apache.giraph.master.BspServiceMaster: createVertexInputSplits: Starting to write input split data to zookeeper with 1 threads
>>>>> 2014-07-02 17:01:45,163 INFO org.apache.giraph.master.BspServiceMaster: createVertexInputSplits: Done writing input split data to zookeeper
>>>>> 2014-07-02 17:01:45,173 INFO org.apache.giraph.comm.netty.NettyClient: Using Netty without authentication.
>>>>> 2014-07-02 17:01:45,187 INFO org.apache.giraph.comm.netty.NettyClient: connectAllAddresses: Successfully added 1 connections, (1 total connected) 0 failed, 0 failures total.
>>>>> 2014-07-02 17:01:45,188 INFO org.apache.giraph.partition.PartitionUtils: computePartitionCount: Creating 1, default would have been 1 partitions.
>>>>> 2014-07-02 17:01:45,350 INFO org.apache.giraph.comm.netty.NettyServer: start: Using Netty without authentication.
>>>>> 2014-07-02 17:06:45,347 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out of 1 workers finished on superstep -1 on path /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDoneDir
>>>>> 2014-07-02 17:06:45,349 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Waiting on [ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1]
>>>>> 2014-07-02 17:11:45,355 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out of 1 workers finished on superstep -1 on path /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDoneDir
>>>>> 2014-07-02 17:11:45,355 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Waiting on [ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1]
>>>>> 2014-07-02 17:12:07,201 INFO org.apache.giraph.zk.ZooKeeperManager: run: Shutdown hook started.
>>>>> 2014-07-02 17:12:07,202 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Forced a shutdown hook kill of the ZooKeeper process.
>>>>> 2014-07-02 17:12:07,518 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x146f806378d0000, likely server has closed socket, closing socket connection and attempting reconnect
>>>>> 2014-07-02 17:12:07,518 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: ZooKeeper process exited with 143 (note that 143 typically means killed).
>>>>> ===========================================================================
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> The second one, task_201407021636_0006_m_000001, goes to the FAILED
>>>>> state.  syslog:
>>>>>
>>>>> ===========================================================================
>>>>>
>>>>> 2014-07-02 17:01:38,016 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
>>>>> 2014-07-02 17:01:38,203 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
>>>>> 2014-07-02 17:01:38,379 INFO org.apache.giraph.graph.GraphTaskManager: setup: Log level remains at info
>>>>> 2014-07-02 17:01:40,280 INFO org.apache.giraph.graph.GraphTaskManager: Distributed cache is empty. Assuming fatjar.
>>>>> 2014-07-02 17:01:40,281 INFO org.apache.giraph.graph.GraphTaskManager: setup: classpath @ /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/job.jar for job Giraph: org.apache.giraph.examples.SimpleShortestPathsVertex
>>>>> 2014-07-02 17:01:40,317 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp: Made the directory _bsp/_defaultZkManagerDir/job_201407021636_0006
>>>>> 2014-07-02 17:01:40,320 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp: Creating my filestamp _bsp/_defaultZkManagerDir/job_201407021636_0006/_task/ec2-54-186-5-159.us-west-2.compute.amazonaws.com 1
>>>>> 2014-07-02 17:01:42,109 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: For task 1, got file 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 ' (polling period is 3000)
>>>>> 2014-07-02 17:01:42,313 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: Found [ec2-54-186-5-159.us-west-2.compute.amazonaws.com, 0] 2 hosts in filename 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 '
>>>>> 2014-07-02 17:01:42,316 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Got [ec2-54-186-5-159.us-west-2.compute.amazonaws.com] 1 hosts from 1 ready servers when 1 required (polling period is 3000) on attempt 0
>>>>> 2014-07-02 17:01:42,316 INFO org.apache.giraph.graph.GraphTaskManager: setup: Starting up BspServiceWorker...
>>>>> 2014-07-02 17:01:42,330 INFO org.apache.giraph.bsp.BspService: BspService: Connecting to ZooKeeper with job job_201407021636_0006, 1 on ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181
>>>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27 GMT
>>>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:host.name=ec2-54-186-5-159.us-west-2.compute.amazonaws.com
>>>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.7.0_60
>>>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
>>>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client
>>>>>  environment:java.class.path=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/classes:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work:/home/ubuntu/hadoop/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/ubuntu/hadoop/bin/..:/home/ubuntu/hadoop/bin/../hadoop-core-0.20.203.0.jar:/home/ubuntu/hadoop/bin/../lib/aspectjrt-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/aspectjtools-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-1.7.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-core-1.8.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-cli-1.2.jar:/home/ubuntu/hadoop/bin/../lib/commons-codec-1.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-collections-3.2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-configuration-1.6.jar:/home/ubuntu/hadoo
>>>>> p/bin/../lib/commons-daemon-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-digester-1.8.jar:/home/ubuntu/hadoop/bin/../lib/commons-el-1.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-httpclient-3.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-lang-2.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-1.1.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-api-1.0.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-math-2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-net-1.4.1.jar:/home/ubuntu/hadoop/bin/../lib/core-3.1.1.jar:/home/ubuntu/hadoop/bin/../lib/hsqldb-1.8.0.10.jar:/home/ubuntu/hadoop/bin/../lib/jackson-core-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jackson-mapper-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jasper-compiler-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jasper-runtime-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jets3t-0.6.1.jar:/home/ubuntu/hadoop/bin/../lib/jetty-6.1.26.jar:/home/ubuntu/hadoop/bin/../lib/jetty-util-6.1.26.jar:/home/ubu
>>>>> ntu/hadoop/bin/../lib/jsch-0.1.42.jar:/home/ubuntu/hadoop/bin/../lib/junit-4.5.jar:/home/ubuntu/hadoop/bin/../lib/kfs-0.2.2.jar:/home/ubuntu/hadoop/bin/../lib/log4j-1.2.15.jar:/home/ubuntu/hadoop/bin/../lib/mockito-all-1.8.5.jar:/home/ubuntu/hadoop/bin/../lib/oro-2.0.8.jar:/home/ubuntu/hadoop/bin/../lib/servlet-api-2.5-20081211.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-api-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/xmlenc-0.52.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar
>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/home/ubuntu/hadoop/bin/../lib/native/Linux-amd64-64:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work
>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work/tmp
>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.arch=amd64
>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.version=3.2.0-58-virtual
>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.name=ubuntu
>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.home=/home/ubuntu
>>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work
>>>>> 2014-07-02 17:01:42,339 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 sessionTimeout=60000 watcher=org.apache.giraph.worker.BspServiceWorker@54a5733f
>>>>> 2014-07-02 17:01:42,347 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181
>>>>> 2014-07-02 17:01:42,347 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, initiating session
>>>>> 2014-07-02 17:01:42,812 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, sessionid = 0x146f806378d0001, negotiated timeout = 600000
>>>>> 2014-07-02 17:01:42,814 INFO org.apache.giraph.bsp.BspService: process: Asynchronous connection complete.
>>>>> 2014-07-02 17:01:42,819 INFO org.apache.giraph.comm.netty.NettyWorkerServer: createMessageStoreFactory: Using ByteArrayMessagesPerVertexStore since there is no combiner
>>>>> 2014-07-02 17:01:42,881 INFO org.apache.giraph.comm.netty.NettyServer: NettyServer: Using execution handler with 8 threads after requestFrameDecoder.
>>>>> 2014-07-02 17:01:42,904 INFO org.apache.giraph.comm.netty.NettyServer: start: Started server communication server: ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:30001 with up to 16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288 backlog = 1
>>>>> 2014-07-02 17:01:42,907 INFO org.apache.giraph.comm.netty.NettyClient: NettyClient: Using execution handler with 8 threads after requestEncoder.
>>>>> 2014-07-02 17:01:42,914 INFO org.apache.giraph.graph.GraphTaskManager: setup: Registering health of this worker...
>>>>> 2014-07-02 17:01:45,049 INFO org.apache.giraph.bsp.BspService: process: applicationAttemptChanged signaled
>>>>> 2014-07-02 17:01:45,068 WARN org.apache.giraph.bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir, type=NodeChildrenChanged, state=SyncConnected)
>>>>> 2014-07-02 17:01:45,076 INFO org.apache.giraph.worker.BspServiceWorker: registerHealth: Created my health node for attempt=0, superstep=-1 with /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1 and workerInfo= Worker(hostname=ec2-54-186-5-159.us-west-2.compute.amazonaws.com, MRtaskID=1, port=30001)
>>>>> 2014-07-02 17:01:45,183 INFO org.apache.giraph.comm.netty.NettyServer: start: Using Netty without authentication.
>>>>> 2014-07-02 17:01:45,339 INFO org.apache.giraph.bsp.BspService: process: partitionAssignmentsReadyChanged (partitions are assigned)
>>>>> 2014-07-02 17:01:45,345 INFO org.apache.giraph.worker.BspServiceWorker: startSuperstep: Master(hostname=ec2-54-186-5-159.us-west-2.compute.amazonaws.com, MRtaskID=0, port=30000)
>>>>> 2014-07-02 17:01:45,345 INFO org.apache.giraph.worker.BspServiceWorker: startSuperstep: Ready for computation on superstep -1 since worker selection and vertex range assignments are done in /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir/-1/_addressesAndPartitions
>>>>> 2014-07-02 17:01:45,346 INFO org.apache.giraph.comm.netty.NettyClient: Using Netty without authentication.
>>>>> 2014-07-02 17:01:45,354 INFO org.apache.giraph.comm.netty.NettyClient: connectAllAddresses: Successfully added 1 connections, (1 total connected) 0 failed, 0 failures total.
>>>>> 2014-07-02 17:01:45,359 INFO org.apache.giraph.worker.BspServiceWorker: loadInputSplits: Using 1 thread(s), originally 1 threads(s) for 2 total splits.
>>>>> 2014-07-02 17:01:45,362 INFO org.apache.giraph.comm.SendPartitionCache: SendPartitionCache: maxVerticesPerTransfer = 10000
>>>>> 2014-07-02 17:01:45,362 INFO org.apache.giraph.comm.SendPartitionCache: SendPartitionCache: maxEdgesPerTransfer = 80000
>>>>> 2014-07-02 17:01:45,372 INFO org.apache.giraph.worker.InputSplitsHandler: reserveInputSplit: Reserved input split path /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDir/0, overall roughly 0.0% input splits reserved
>>>>> 2014-07-02 17:01:45,373 INFO org.apache.giraph.worker.InputSplitsCallable: getInputSplit: Reserved /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDir/0 from ZooKeeper and got input split 'hdfs://ec2-54-186-5-159.us-west-2.compute.amazonaws.com:54310/user/ubuntu/input/CA.txt:0+67108864'
>>>>> 2014-07-02 17:01:46,615 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: Loaded 250000 vertices at 200412.01085983627 vertices/sec 697447 edges at 560085.1663971642 edges/sec Memory (free/total/max) = 112.94M / 182.50M / 182.50M
>>>>> 2014-07-02 17:01:47,440 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: Loaded 500000 vertices at 241131.36490688394 vertices/sec 1419367 edges at 685221.3493060106 edges/sec Memory (free/total/max) = 45.07M / 187.50M / 187.50M
>>>>> 2014-07-02 17:01:51,322 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: Loaded 750000 vertices at 125921.72750649283 vertices/sec 2149814 edges at 361077.323111284 edges/sec Memory (free/total/max) = 16.73M / 189.50M / 189.50M
>>>>> 2014-07-02 17:01:55,205 ERROR org.apache.giraph.worker.BspServiceWorker: unregisterHealth: Got failure, unregistering health on /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1 on superstep -1
>>>>> 2014-07-02 17:01:55,213 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
>>>>> 2014-07-02 17:01:55,716 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
>>>>> 2014-07-02 17:01:55,717 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName ubuntu for UID 1000 from the native implementation
>>>>> 2014-07-02 17:01:55,718 WARN org.apache.hadoop.mapred.Child: Error running child
>>>>> java.lang.IllegalStateException: run: Caught an unrecoverable exception waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@36df8f5
>>>>> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:102)
>>>>> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>>>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>>>>> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>>>> 	at java.security.AccessController.doPrivileged(Native Method)
>>>>> 	at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>>>> 	at org.apache.hadoop.mapred.Child.main(Child.java:253)
>>>>> Caused by: java.lang.IllegalStateException: waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@36df8f5
>>>>> 	at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:151)
>>>>> 	at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:111)
>>>>> 	at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:73)
>>>>> 	at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:192)
>>>>> 	at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:276)
>>>>> 	at org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:323)
>>>>> 	at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:506)
>>>>> 	at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:230)
>>>>> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92)
>>>>> 	... 7 more
>>>>> Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
>>>>> 	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>>>>> 	at java.util.concurrent.FutureTask.get(FutureTask.java:202)
>>>>> 	at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:271)
>>>>> 	at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:143)
>>>>> 	... 15 more
>>>>> Caused by: java.lang.OutOfMemoryError: Java heap space
>>>>> 	at java.util.concurrent.ConcurrentHashMap$Segment.rehash(ConcurrentHashMap.java:501)
>>>>> 	at java.util.concurrent.ConcurrentHashMap$Segment.put(ConcurrentHashMap.java:460)
>>>>> 	at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1130)
>>>>> 	at org.apache.giraph.partition.SimplePartition.addPartition(SimplePartition.java:87)
>>>>> 	at org.apache.giraph.partition.SimplePartitionStore.addPartition(SimplePartitionStore.java:71)
>>>>> 	at org.apache.giraph.comm.requests.SendVertexRequest.doRequest(SendVertexRequest.java:81)
>>>>> 	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:470)
>>>>> 	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendPartitionRequest(NettyWorkerClientRequestProcessor.java:203)
>>>>> 	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendVertexRequest(NettyWorkerClientRequestProcessor.java:267)
>>>>> 	at org.apache.giraph.worker.VertexInputSplitsCallable.readInputSplit(VertexInputSplitsCallable.java:140)
>>>>> 	at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:220)
>>>>> 	at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:161)
>>>>> 	at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:58)
>>>>> 	at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
>>>>> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>> 	at java.lang.Thread.run(Thread.java:745)
>>>>> 2014-07-02 17:01:55,722 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
>>>>>
>>>>>
>>>>> ===========================================================================
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> I've tried increasing Java Heap Space in hadoop/conf/mapred-site.xml
>>>>> by adding this:
>>>>>   <property>
>>>>>     <name>mapred.child.java.opts</name>
>>>>>     <value>-Xmx1024m</value>
>>>>>   </property>
>>>>>
>>>>> But that just caused the entire job to fail from the start.
>>>>>
>>>>> Before using this version of Giraph, I tried 1.0.0 and 1.1.0-RC0, and
>>>>> those versions gave me more (and different) errors to debug, related to
>>>>> problems with Hadoop itself.  So the Giraph version I'm currently using
>>>>> seems best for me because these errors seem more manageable.
>>>>>
>>>>> What can I do to fix this error?  I thought Giraph was built for
>>>>> large-scale graph processing, so I suppose this problem was encountered
>>>>> before by someone testing large graphs.  I searched through the mailing
>>>>> archives and couldn't find anything, though.
>>>>>
>>>>> I can provide more information if you need it.  Thanks a lot.
>>>>>
>>>>> Bryan Rowe
>>>>>
>>>>>
>>>>>
>>>>> *Then Young Han replied:*
>>>>>
>>>>> 10 minutes seems way too long to load in 91mb from HDFS. Are you sure
>>>>> your graph's format is correct? For the Json input formats, each line of
>>>>> your file should be:
>>>>>
>>>>> [vertex id, vertex value, [[dst id, edge weight], [dst id, edge
>>>>> weight], ..., [dst id, edge weight]]]
>>>>>
>>>>> You can set the vertex value for every vertex to 0 (SSSP will
>>>>> overwrite that value in the first superstep). If your graph doesn't have
>>>>> edge weights, you can just set them to 1.
>>>>>
>>>>> Also, have you tried a larger Xmx value? E.g., 4096m or 8192m.
>>>>>
>>>>> Young
>>>>>
>>>>>
>>>>> *Then I replied:*
>>>>> Hi Young,
>>>>>
>>>>> I believe my graph has the correct format.
>>>>> Here is the first 40 lines of the graph I'm using:
>>>>> [0,0,[[1,1],[4,1],[5,1]]]
>>>>> [1,0,[[0,1],[2,1],[3,1]]]
>>>>> [2,0,[[1,1],[6,1],[7,1]]]
>>>>> [3,0,[[8,1],[1,1],[9,1]]]
>>>>> [4,0,[[0,1],[10,1],[11,1]]]
>>>>> [5,0,[[0,1]]]
>>>>> [6,0,[[2,1],[12,1]]]
>>>>> [7,0,[[2,1],[12,1],[13,1]]]
>>>>> [8,0,[[11,1],[3,1],[60,1]]]
>>>>> [9,0,[[3,1]]]
>>>>> [10,0,[[35,1],[4,1],[38,1]]]
>>>>> [11,0,[[8,1],[59,1],[4,1]]]
>>>>> [12,0,[[41,1],[6,1],[7,1]]]
>>>>> [13,0,[[89,1],[90,1],[91,1],[7,1]]]
>>>>> [14,0,[[18,1],[19,1],[15,1]]]
>>>>> [15,0,[[16,1],[17,1],[14,1]]]
>>>>> [16,0,[[20,1],[21,1],[22,1],[15,1]]]
>>>>> [17,0,[[24,1],[23,1],[21,1],[15,1]]]
>>>>> [18,0,[[24,1],[14,1]]]
>>>>> [19,0,[[25,1],[22,1],[14,1]]]
>>>>> [20,0,[[16,1],[25,1],[26,1]]]
>>>>> [21,0,[[16,1],[17,1],[30,1]]]
>>>>> [22,0,[[16,1],[19,1]]]
>>>>> [23,0,[[17,1],[105,1]]]
>>>>> [24,0,[[17,1],[18,1],[58,1]]]
>>>>> [25,0,[[27,1],[19,1],[20,1]]]
>>>>> [26,0,[[28,1],[27,1],[20,1]]]
>>>>> [27,0,[[25,1],[26,1],[29,1]]]
>>>>> [28,0,[[26,1],[29,1],[30,1]]]
>>>>> [29,0,[[27,1],[28,1],[31,1]]]
>>>>> [30,0,[[32,1],[33,1],[28,1],[21,1]]]
>>>>> [31,0,[[34,1],[29,1]]]
>>>>> [32,0,[[105,1],[30,1],[39,1]]]
>>>>> [33,0,[[30,1]]]
>>>>> [34,0,[[38,1],[31,1]]]
>>>>> [35,0,[[10,1],[36,1],[37,1]]]
>>>>> [36,0,[[40,1],[35,1],[39,1]]]
>>>>> [37,0,[[41,1],[35,1]]]
>>>>> [38,0,[[10,1],[34,1]]]
>>>>> [39,0,[[32,1],[58,1],[36,1],[119,1]]]
>>>>> [40,0,[[90,1],[36,1]]]
>>>>>
>>>>> Also, sorry about sending the email twice.  My email client messed up.
>>>>>
>>>>> Thanks,
>>>>> Bryan
>>>>>
>>>>> I've also tried a larger graph.  It ran for 2 hours and then failed,
>>>>> with what I believe were different error messages.
>>>>>
>>>>> Bryan
>>>>>
>>>>
>>>>
>>>
>>
>

Re: Errors with large graphs (continued, new email)

Posted by Bryan Rowe <br...@gmail.com>.
I just changed my mapred-site.xml file to increase the heap size:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://ec2-54-186-5-159.us-west-2.compute.amazonaws.com:54311</value>
  </property>

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>

  <property>
    <name>mapred.map.tasks</name>
    <value>4</value>
  </property>

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>
  </property>
</configuration>


I ran a Hadoop job and it seemed to be working for the first 2 minutes
because it was computing supersteps.  However, after 2 minutes it failed.
Part of the reason is that the Hadoop counters exceeded the limit of 120.
There were other errors too, and I'm not sure whether they are related to
the counters problem.
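
If the counter limit is the blocker, one possible workaround (untested here,
and the exact property name is an assumption that varies across Hadoop
versions) is raising the per-job counter cap in mapred-site.xml and then
restarting the JobTracker and TaskTrackers:

```xml
<!-- Hypothetical sketch: raise Hadoop's per-job counter limit (default 120).
     Verify the property name against your Hadoop version's documentation. -->
<property>
  <name>mapreduce.job.counters.limit</name>
  <value>512</value>
</property>
```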

Here is an image album of the errors:  http://imgur.com/a/mAqn3
Also, here is a link to the Hadoop job that failed:
http://ec2-54-186-5-159.us-west-2.compute.amazonaws.com:50030/jobdetails.jsp?jobid=job_201407032123_0002&refresh=30

I'll keep my instance up and running so the link should work.
What would you recommend next?

Thanks,
Bryan
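
P.S. For anyone converting graphs the same way, here is a minimal sketch of
the edge-list-to-JSON conversion discussed in this thread.  It assumes the
edge list is tab- or space-separated "src dst weight" lines, sorted by source
ID, and that every vertex appears as a source at least once; the function
name is mine, not part of Giraph.

```python
import itertools

def edges_to_json_lines(lines):
    """Turn sorted 'src dst weight' edge lines into the
    JsonLongDoubleFloatDoubleVertexInputFormat lines Giraph expects:
    [id, value, [[dst, weight], ...]]."""
    parsed = (line.split() for line in lines if line.strip())
    for src, group in itertools.groupby(parsed, key=lambda t: t[0]):
        edges = ",".join("[%s,%s]" % (dst, w) for _, dst, w in group)
        # Vertex value 0: SSSP overwrites it in the first superstep.
        yield "[%s,0,[%s]]" % (src, edges)

if __name__ == "__main__":
    sample = ["0\t1\t1", "0\t4\t1", "0\t5\t1", "1\t0\t1"]
    print("\n".join(edges_to_json_lines(sample)))
    # [0,0,[[1,1],[4,1],[5,1]]]
    # [1,0,[[0,1]]]
```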


On Thu, Jul 3, 2014 at 2:19 PM, Young Han <yo...@uwaterloo.ca> wrote:

> What sort of error does it give? Did you try restarting Hadoop after
> setting it? (stop-all.sh, start-all.sh, hadoop dfs -safemode wait).
>
> Young
>
>
> On Thu, Jul 3, 2014 at 5:08 PM, Bryan Rowe <br...@gmail.com> wrote:
>
>> Yes, before conversion they were also sorted by vertex ID.
>> 0       1       1
>> 0       4       1
>> 0       5       1
>> 1       0       1
>> 1       2       1
>> 1       3       1
>> 2       1       1
>> 2       6       1
>> 2       7       1
>> 3       8       1
>>
>> That was the format before conversion with format [source destination
>> weight] of edges.
>> I'm using a medium-tier EC2 instance, a c3.2xlarge.  It has 8 vCPUs and 15
>> GiB of memory according to Amazon's website.
>>
>> Any suggestions on tweaking Xmx?  I tried that without any luck; the job
>> just failed without ever running the map phase.
>>
>>
>> On Thu, Jul 3, 2014 at 1:57 PM, Young Han <yo...@uwaterloo.ca> wrote:
>>
>>> From the other thread...
>>>
>>> Yeah, your input format looks correct. Did you have the graph sorted by
>>> source vertex IDs before conversion? (I'm not sure if duplicate entries
>>> with the same source ID matters, but just in case.)
>>>
>>> They're all out of memory errors, so I think Xmx is the culprit. What
>>> type of EC2 instances are you using? You probably want to use something
>>> larger than t1.micro or m1.small.
>>>
>>> Young
>>>
>>>
>>> On Thu, Jul 3, 2014 at 4:53 PM, Bryan Rowe <br...@gmail.com>
>>> wrote:
>>>
>>>> First of all, I started this email thread with my old email from Yahoo,
>>>> which was a mistake because it kept sending out duplicates.  Sorry for the
>>>> inconvenience, but I'll continue it using this email thread from now on.
>>>>
>>>> *I originally posted this:*
>>>> Hello,
>>>>
>>>> Giraph: release-1.0.0-RC3
>>>>
>>>> In short, when I use large graphs with the Shortest Paths example, it
>>>> fails.  But when I use the small graph provided on the Quick Start guide,
>>>> it succeeds.
>>>> I converted all of my large graphs into the format shown in the Quick
>>>> Start guide to simplify things.
>>>> I'm using a one-node setup.
>>>>
>>>>
>>>> Here is the command I'm using:
>>>> hadoop jar
>>>> $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar
>>>>
>>>> org.apache.giraph.GiraphRunner
>>>> org.apache.giraph.examples.SimpleShortestPathsVertex
>>>> -vif
>>>> org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
>>>>  -vip /user/ubuntu/input/CA.txt -of
>>>> org.apache.giraph.io.formats.IdWithValueTextOutputFormat
>>>>  -op /user/ubuntu/output/shortestpaths
>>>> -w 1
>>>>
>>>> (all on one line)
>>>>
>>>> CA.txt is a large graph file: 96,026,228 bytes
>>>>
>>>> The job fails in 10mins, 46sec.
>>>>
>>>> Two Map tasks are created when run.
>>>> The first one, task_201407021636_0006_m_000000, is KILLED.  syslog:
>>>>
>>>> ===========================================================================
>>>>
>>>> 2014-07-02 17:01:34,757 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
>>>> 2014-07-02 17:01:34,945 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
>>>> 2014-07-02 17:01:35,127 INFO org.apache.giraph.graph.GraphTaskManager: setup: Log level remains at info
>>>> 2014-07-02 17:01:35,159 INFO org.apache.giraph.graph.GraphTaskManager: Distributed cache is empty. Assuming fatjar.
>>>> 2014-07-02 17:01:35,159 INFO org.apache.giraph.graph.GraphTaskManager: setup: classpath @ /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/job.jar for job Giraph: org.apache.giraph.examples.SimpleShortestPathsVertex
>>>> 2014-07-02 17:01:35,201 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp: Made the directory _bsp/_defaultZkManagerDir/job_201407021636_0006
>>>> 2014-07-02 17:01:35,204 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp: Creating my filestamp _bsp/_defaultZkManagerDir/job_201407021636_0006/_task/ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0
>>>> 2014-07-02 17:01:35,219 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: Got [ec2-54-186-5-159.us-west-2.compute.amazonaws.com] 1 hosts from 1 candidates when 1 required (polling period is 3000) on attempt 0
>>>> 2014-07-02 17:01:35,220 INFO org.apache.giraph.zk.ZooKeeperManager: createZooKeeperServerList: Creating the final ZooKeeper file '_bsp/_defaultZkManagerDir/job_201407021636_0006/zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 '
>>>> 2014-07-02 17:01:35,228 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: For task 0, got file 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 ' (polling period is 3000)
>>>> 2014-07-02 17:01:35,228 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: Found [ec2-54-186-5-159.us-west-2.compute.amazonaws.com, 0] 2 hosts in filename 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 '
>>>> 2014-07-02 17:01:35,229 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Trying to delete old directory /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper
>>>> 2014-07-02 17:01:35,234 INFO org.apache.giraph.zk.ZooKeeperManager: generateZooKeeperConfigFile: Creating file /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper/zoo.cfg in /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper with base port 22181
>>>> 2014-07-02 17:01:35,234 INFO org.apache.giraph.zk.ZooKeeperManager: generateZooKeeperConfigFile: Make directory of _bspZooKeeper = true
>>>> 2014-07-02 17:01:35,235 INFO org.apache.giraph.zk.ZooKeeperManager: generateZooKeeperConfigFile: Delete of zoo.cfg = false
>>>> 2014-07-02 17:01:35,236 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Attempting to start ZooKeeper server with command [/usr/lib/jvm/java-7-oracle/jre/bin/java, -Xmx512m, -XX:ParallelGCThreads=4, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=70, -XX:MaxGCPauseMillis=100, -cp, /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/job.jar, org.apache.zookeeper.server.quorum.QuorumPeerMain, /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper/zoo.cfg] in directory /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper
>>>> 2014-07-02 17:01:35,238 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Shutdown hook added.
>>>> 2014-07-02 17:01:35,238 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Connect attempt 0 of 10 max trying to connect to ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 with poll msecs = 3000
>>>> 2014-07-02 17:01:35,241 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Got ConnectException
>>>> java.net.ConnectException: Connection refused
>>>> 	at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>> 	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>>>> 	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
>>>> 	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
>>>> 	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>>> 	at java.net.Socket.connect(Socket.java:579)
>>>> 	at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:701)
>>>> 	at org.apache.giraph.graph.GraphTaskManager.startZooKeeperManager(GraphTaskManager.java:357)
>>>> 	at org.apache.giraph.graph.GraphTaskManager.setup(GraphTaskManager.java:188)
>>>> 	at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:60)
>>>> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:90)
>>>> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>>>> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>>> 	at java.security.AccessController.doPrivileged(Native Method)
>>>> 	at javax.security.auth.Subject.doAs(Subject.java:415)
>>>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>>> 	at org.apache.hadoop.mapred.Child.main(Child.java:253)
>>>> 2014-07-02 17:01:38,242 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Connect attempt 1 of 10 max trying to connect to ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 with poll msecs = 3000
>>>> 2014-07-02 17:01:38,243 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Connected to ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181!
>>>> 2014-07-02 17:01:38,243 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Creating my filestamp _bsp/_defaultZkManagerDir/job_201407021636_0006/_zkServer/ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0
>>>> 2014-07-02 17:01:40,249 INFO org.apache.giraph.graph.GraphTaskManager: setup: Chosen to run ZooKeeper...
>>>> 2014-07-02 17:01:40,249 INFO org.apache.giraph.graph.GraphTaskManager: setup: Starting up BspServiceMaster (master thread)...
>>>> 2014-07-02 17:01:40,260 INFO org.apache.giraph.bsp.BspService: BspService: Connecting to ZooKeeper with job job_201407021636_0006, 0 on ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181
>>>> 2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27 GMT
>>>> 2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:host.name=ec2-54-186-5-159.us-west-2.compute.amazonaws.com
>>>> 2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.7.0_60
>>>> 2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client
>>>>  environment:java.class.path=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/classes:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work:/home/ubuntu/hadoop/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/ubuntu/hadoop/bin/..:/home/ubuntu/hadoop/bin/../hadoop-core-0.20.203.0.jar:/home/ubuntu/hadoop/bin/../lib/aspectjrt-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/aspectjtools-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-1.7.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-core-1.8.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-cli-1.2.jar:/home/ubuntu/hadoop/bin/../lib/commons-codec-1.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-collections-3.2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-configuration-1.6.jar:/home/ubuntu/hadoo
>>>> p/bin/../lib/commons-daemon-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-digester-1.8.jar:/home/ubuntu/hadoop/bin/../lib/commons-el-1.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-httpclient-3.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-lang-2.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-1.1.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-api-1.0.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-math-2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-net-1.4.1.jar:/home/ubuntu/hadoop/bin/../lib/core-3.1.1.jar:/home/ubuntu/hadoop/bin/../lib/hsqldb-1.8.0.10.jar:/home/ubuntu/hadoop/bin/../lib/jackson-core-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jackson-mapper-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jasper-compiler-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jasper-runtime-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jets3t-0.6.1.jar:/home/ubuntu/hadoop/bin/../lib/jetty-6.1.26.jar:/home/ubuntu/hadoop/bin/../lib/jetty-util-6.1.26.jar:/home/ubu
>>>> ntu/hadoop/bin/../lib/jsch-0.1.42.jar:/home/ubuntu/hadoop/bin/../lib/junit-4.5.jar:/home/ubuntu/hadoop/bin/../lib/kfs-0.2.2.jar:/home/ubuntu/hadoop/bin/../lib/log4j-1.2.15.jar:/home/ubuntu/hadoop/bin/../lib/mockito-all-1.8.5.jar:/home/ubuntu/hadoop/bin/../lib/oro-2.0.8.jar:/home/ubuntu/hadoop/bin/../lib/servlet-api-2.5-20081211.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-api-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/xmlenc-0.52.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar
>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/home/ubuntu/hadoop/bin/../lib/native/Linux-amd64-64:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work
>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work/tmp
>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.arch=amd64
>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.version=3.2.0-58-virtual
>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.name=ubuntu
>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.home=/home/ubuntu
>>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work
>>>> 2014-07-02 17:01:40,272 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 sessionTimeout=60000 watcher=org.apache.giraph.master.BspServiceMaster@4deb9df0
>>>> 2014-07-02 17:01:40,284 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181
>>>> 2014-07-02 17:01:40,296 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, initiating session
>>>> 2014-07-02 17:01:42,336 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, sessionid = 0x146f806378d0000, negotiated timeout = 600000
>>>> 2014-07-02 17:01:42,337 INFO org.apache.giraph.bsp.BspService: process: Asynchronous connection complete.
>>>> 2014-07-02 17:01:42,343 INFO org.apache.giraph.graph.GraphTaskManager: map: No need to do anything when not a worker
>>>> 2014-07-02 17:01:42,343 INFO org.apache.giraph.graph.GraphTaskManager: cleanup: Starting for MASTER_ZOOKEEPER_ONLY
>>>> 2014-07-02 17:01:45,033 INFO org.apache.giraph.bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/job_201407021636_0006/_masterJobState)
>>>> 2014-07-02 17:01:45,038 INFO org.apache.giraph.master.BspServiceMaster: becomeMaster: First child is '/_hadoopBsp/job_201407021636_0006/_masterElectionDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_00000000000' and my bid is '/_hadoopBsp/job_201407021636_0006/_masterElectionDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_00000000000'
>>>> 2014-07-02 17:01:45,044 INFO org.apache.giraph.bsp.BspService: getApplicationAttempt: Node /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir already exists!
>>>> 2014-07-02 17:01:45,049 INFO org.apache.giraph.bsp.BspService: process: applicationAttemptChanged signaled
>>>> 2014-07-02 17:01:45,089 INFO org.apache.giraph.comm.netty.NettyServer: NettyServer: Using execution handler with 8 threads after requestFrameDecoder.
>>>> 2014-07-02 17:01:45,111 INFO org.apache.giraph.comm.netty.NettyServer: start: Started server communication server: ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:30000 with up to 16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288 backlog = 1
>>>> 2014-07-02 17:01:45,116 INFO org.apache.giraph.comm.netty.NettyClient: NettyClient: Using execution handler with 8 threads after requestEncoder.
>>>> 2014-07-02 17:01:45,119 INFO org.apache.giraph.master.BspServiceMaster: becomeMaster: I am now the master!
>>>> 2014-07-02 17:01:45,126 INFO org.apache.giraph.bsp.BspService: getApplicationAttempt: Node /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir already exists!
>>>> 2014-07-02 17:01:45,140 INFO org.apache.giraph.io.formats.GiraphFileInputFormat: Total input paths to process : 1
>>>> 2014-07-02 17:01:45,149 INFO org.apache.giraph.master.BspServiceMaster: generateVertexInputSplits: Got 2 input splits for 1 input threads
>>>> 2014-07-02 17:01:45,149 INFO org.apache.giraph.master.BspServiceMaster: createVertexInputSplits: Starting to write input split data to zookeeper with 1 threads
>>>> 2014-07-02 17:01:45,163 INFO org.apache.giraph.master.BspServiceMaster: createVertexInputSplits: Done writing input split data to zookeeper
>>>> 2014-07-02 17:01:45,173 INFO org.apache.giraph.comm.netty.NettyClient: Using Netty without authentication.
>>>> 2014-07-02 17:01:45,187 INFO org.apache.giraph.comm.netty.NettyClient: connectAllAddresses: Successfully added 1 connections, (1 total connected) 0 failed, 0 failures total.
>>>> 2014-07-02 17:01:45,188 INFO org.apache.giraph.partition.PartitionUtils: computePartitionCount: Creating 1, default would have been 1 partitions.
>>>> 2014-07-02 17:01:45,350 INFO org.apache.giraph.comm.netty.NettyServer: start: Using Netty without authentication.
>>>> 2014-07-02 17:06:45,347 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out of 1 workers finished on superstep -1 on path /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDoneDir
>>>> 2014-07-02 17:06:45,349 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Waiting on [ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1]
>>>> 2014-07-02 17:11:45,355 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out of 1 workers finished on superstep -1 on path /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDoneDir
>>>> 2014-07-02 17:11:45,355 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Waiting on [ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1]
>>>> 2014-07-02 17:12:07,201 INFO org.apache.giraph.zk.ZooKeeperManager: run: Shutdown hook started.
>>>> 2014-07-02 17:12:07,202 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Forced a shutdown hook kill of the ZooKeeper process.
>>>> 2014-07-02 17:12:07,518 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x146f806378d0000, likely server has closed socket, closing socket connection and attempting reconnect
>>>> 2014-07-02 17:12:07,518 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: ZooKeeper process exited with 143 (note that 143 typically means killed).
>>>> ===========================================================================
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> The second one, task_201407021636_0006_m_000001, goes to the FAILED
>>>> state.  syslog:
>>>>
>>>> ===========================================================================
>>>>
>>>> 2014-07-02 17:01:38,016 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
>>>> 2014-07-02 17:01:38,203 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
>>>> 2014-07-02 17:01:38,379 INFO org.apache.giraph.graph.GraphTaskManager: setup: Log level remains at info
>>>> 2014-07-02 17:01:40,280 INFO org.apache.giraph.graph.GraphTaskManager: Distributed cache is empty. Assuming fatjar.
>>>> 2014-07-02 17:01:40,281 INFO org.apache.giraph.graph.GraphTaskManager: setup: classpath @ /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/job.jar for job Giraph: org.apache.giraph.examples.SimpleShortestPathsVertex
>>>> 2014-07-02 17:01:40,317 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp: Made the directory _bsp/_defaultZkManagerDir/job_201407021636_0006
>>>> 2014-07-02 17:01:40,320 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp: Creating my filestamp _bsp/_defaultZkManagerDir/job_201407021636_0006/_task/ec2-54-186-5-159.us-west-2.compute.amazonaws.com 1
>>>> 2014-07-02 17:01:42,109 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: For task 1, got file 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 ' (polling period is 3000)
>>>> 2014-07-02 17:01:42,313 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: Found [ec2-54-186-5-159.us-west-2.compute.amazonaws.com, 0] 2 hosts in filename 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 '
>>>> 2014-07-02 17:01:42,316 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Got [ec2-54-186-5-159.us-west-2.compute.amazonaws.com] 1 hosts from 1 ready servers when 1 required (polling period is 3000) on attempt 0
>>>> 2014-07-02 17:01:42,316 INFO org.apache.giraph.graph.GraphTaskManager: setup: Starting up BspServiceWorker...
>>>> 2014-07-02 17:01:42,330 INFO org.apache.giraph.bsp.BspService: BspService: Connecting to ZooKeeper with job job_201407021636_0006, 1 on ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181
>>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27 GMT
>>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:host.name=ec2-54-186-5-159.us-west-2.compute.amazonaws.com
>>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.7.0_60
>>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
>>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client
>>>>  environment:java.class.path=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/classes:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work:/home/ubuntu/hadoop/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/ubuntu/hadoop/bin/..:/home/ubuntu/hadoop/bin/../hadoop-core-0.20.203.0.jar:/home/ubuntu/hadoop/bin/../lib/aspectjrt-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/aspectjtools-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-1.7.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-core-1.8.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-cli-1.2.jar:/home/ubuntu/hadoop/bin/../lib/commons-codec-1.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-collections-3.2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-configuration-1.6.jar:/home/ubuntu/hadoo
>>>> p/bin/../lib/commons-daemon-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-digester-1.8.jar:/home/ubuntu/hadoop/bin/../lib/commons-el-1.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-httpclient-3.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-lang-2.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-1.1.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-api-1.0.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-math-2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-net-1.4.1.jar:/home/ubuntu/hadoop/bin/../lib/core-3.1.1.jar:/home/ubuntu/hadoop/bin/../lib/hsqldb-1.8.0.10.jar:/home/ubuntu/hadoop/bin/../lib/jackson-core-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jackson-mapper-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jasper-compiler-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jasper-runtime-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jets3t-0.6.1.jar:/home/ubuntu/hadoop/bin/../lib/jetty-6.1.26.jar:/home/ubuntu/hadoop/bin/../lib/jetty-util-6.1.26.jar:/home/ubu
>>>> ntu/hadoop/bin/../lib/jsch-0.1.42.jar:/home/ubuntu/hadoop/bin/../lib/junit-4.5.jar:/home/ubuntu/hadoop/bin/../lib/kfs-0.2.2.jar:/home/ubuntu/hadoop/bin/../lib/log4j-1.2.15.jar:/home/ubuntu/hadoop/bin/../lib/mockito-all-1.8.5.jar:/home/ubuntu/hadoop/bin/../lib/oro-2.0.8.jar:/home/ubuntu/hadoop/bin/../lib/servlet-api-2.5-20081211.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-api-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/xmlenc-0.52.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar
>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/home/ubuntu/hadoop/bin/../lib/native/Linux-amd64-64:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work
>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work/tmp
>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.arch=amd64
>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.version=3.2.0-58-virtual
>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.name=ubuntu
>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.home=/home/ubuntu
>>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work
>>>> 2014-07-02 17:01:42,339 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 sessionTimeout=60000 watcher=org.apache.giraph.worker.BspServiceWorker@54a5733f
>>>> 2014-07-02 17:01:42,347 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181
>>>> 2014-07-02 17:01:42,347 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, initiating session
>>>> 2014-07-02 17:01:42,812 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, sessionid = 0x146f806378d0001, negotiated timeout = 600000
>>>> 2014-07-02 17:01:42,814 INFO org.apache.giraph.bsp.BspService: process: Asynchronous connection complete.
>>>> 2014-07-02 17:01:42,819 INFO org.apache.giraph.comm.netty.NettyWorkerServer: createMessageStoreFactory: Using ByteArrayMessagesPerVertexStore since there is no combiner
>>>> 2014-07-02 17:01:42,881 INFO org.apache.giraph.comm.netty.NettyServer: NettyServer: Using execution handler with 8 threads after requestFrameDecoder.
>>>> 2014-07-02 17:01:42,904 INFO org.apache.giraph.comm.netty.NettyServer: start: Started server communication server: ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:30001 with up to 16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288 backlog = 1
>>>> 2014-07-02 17:01:42,907 INFO org.apache.giraph.comm.netty.NettyClient: NettyClient: Using execution handler with 8 threads after requestEncoder.
>>>> 2014-07-02 17:01:42,914 INFO org.apache.giraph.graph.GraphTaskManager: setup: Registering health of this worker...
>>>> 2014-07-02 17:01:45,049 INFO org.apache.giraph.bsp.BspService: process: applicationAttemptChanged signaled
>>>> 2014-07-02 17:01:45,068 WARN org.apache.giraph.bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir, type=NodeChildrenChanged, state=SyncConnected)
>>>> 2014-07-02 17:01:45,076 INFO org.apache.giraph.worker.BspServiceWorker: registerHealth: Created my health node for attempt=0, superstep=-1 with /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1 and workerInfo= Worker(hostname=ec2-54-186-5-159.us-west-2.compute.amazonaws.com, MRtaskID=1, port=30001)
>>>> 2014-07-02 17:01:45,183 INFO org.apache.giraph.comm.netty.NettyServer: start: Using Netty without authentication.
>>>> 2014-07-02 17:01:45,339 INFO org.apache.giraph.bsp.BspService: process: partitionAssignmentsReadyChanged (partitions are assigned)
>>>> 2014-07-02 17:01:45,345 INFO org.apache.giraph.worker.BspServiceWorker: startSuperstep: Master(hostname=ec2-54-186-5-159.us-west-2.compute.amazonaws.com, MRtaskID=0, port=30000)
>>>> 2014-07-02 17:01:45,345 INFO org.apache.giraph.worker.BspServiceWorker: startSuperstep: Ready for computation on superstep -1 since worker selection and vertex range assignments are done in /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir/-1/_addressesAndPartitions
>>>> 2014-07-02 17:01:45,346 INFO org.apache.giraph.comm.netty.NettyClient: Using Netty without authentication.
>>>> 2014-07-02 17:01:45,354 INFO org.apache.giraph.comm.netty.NettyClient: connectAllAddresses: Successfully added 1 connections, (1 total connected) 0 failed, 0 failures total.
>>>> 2014-07-02 17:01:45,359 INFO org.apache.giraph.worker.BspServiceWorker: loadInputSplits: Using 1 thread(s), originally 1 threads(s) for 2 total splits.
>>>> 2014-07-02 17:01:45,362 INFO org.apache.giraph.comm.SendPartitionCache: SendPartitionCache: maxVerticesPerTransfer = 10000
>>>> 2014-07-02 17:01:45,362 INFO org.apache.giraph.comm.SendPartitionCache: SendPartitionCache: maxEdgesPerTransfer = 80000
>>>> 2014-07-02 17:01:45,372 INFO org.apache.giraph.worker.InputSplitsHandler: reserveInputSplit: Reserved input split path /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDir/0, overall roughly 0.0% input splits reserved
>>>> 2014-07-02 17:01:45,373 INFO org.apache.giraph.worker.InputSplitsCallable: getInputSplit: Reserved /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDir/0 from ZooKeeper and got input split 'hdfs://ec2-54-186-5-159.us-west-2.compute.amazonaws.com:54310/user/ubuntu/input/CA.txt:0+67108864'
>>>> 2014-07-02 17:01:46,615 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: Loaded 250000 vertices at 200412.01085983627 vertices/sec 697447 edges at 560085.1663971642 edges/sec Memory (free/total/max) = 112.94M / 182.50M / 182.50M
>>>> 2014-07-02 17:01:47,440 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: Loaded 500000 vertices at 241131.36490688394 vertices/sec 1419367 edges at 685221.3493060106 edges/sec Memory (free/total/max) = 45.07M / 187.50M / 187.50M
>>>> 2014-07-02 17:01:51,322 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: Loaded 750000 vertices at 125921.72750649283 vertices/sec 2149814 edges at 361077.323111284 edges/sec Memory (free/total/max) = 16.73M / 189.50M / 189.50M
>>>> 2014-07-02 17:01:55,205 ERROR org.apache.giraph.worker.BspServiceWorker: unregisterHealth: Got failure, unregistering health on /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1 on superstep -1
>>>> 2014-07-02 17:01:55,213 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
>>>> 2014-07-02 17:01:55,716 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
>>>> 2014-07-02 17:01:55,717 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName ubuntu for UID 1000 from the native implementation
>>>> 2014-07-02 17:01:55,718 WARN org.apache.hadoop.mapred.Child: Error running child
>>>> java.lang.IllegalStateException: run: Caught an unrecoverable exception waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@36df8f5
>>>> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:102)
>>>> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>>>> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>>> 	at java.security.AccessController.doPrivileged(Native Method)
>>>> 	at javax.security.auth.Subject.doAs(Subject.java:415)
>>>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>>> 	at org.apache.hadoop.mapred.Child.main(Child.java:253)
>>>> Caused by: java.lang.IllegalStateException: waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@36df8f5
>>>> 	at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:151)
>>>> 	at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:111)
>>>> 	at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:73)
>>>> 	at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:192)
>>>> 	at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:276)
>>>> 	at org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:323)
>>>> 	at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:506)
>>>> 	at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:230)
>>>> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92)
>>>> 	... 7 more
>>>> Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
>>>> 	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>>>> 	at java.util.concurrent.FutureTask.get(FutureTask.java:202)
>>>> 	at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:271)
>>>> 	at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:143)
>>>> 	... 15 more
>>>> Caused by: java.lang.OutOfMemoryError: Java heap space
>>>> 	at java.util.concurrent.ConcurrentHashMap$Segment.rehash(ConcurrentHashMap.java:501)
>>>> 	at java.util.concurrent.ConcurrentHashMap$Segment.put(ConcurrentHashMap.java:460)
>>>> 	at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1130)
>>>> 	at org.apache.giraph.partition.SimplePartition.addPartition(SimplePartition.java:87)
>>>> 	at org.apache.giraph.partition.SimplePartitionStore.addPartition(SimplePartitionStore.java:71)
>>>> 	at org.apache.giraph.comm.requests.SendVertexRequest.doRequest(SendVertexRequest.java:81)
>>>> 	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:470)
>>>> 	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendPartitionRequest(NettyWorkerClientRequestProcessor.java:203)
>>>> 	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendVertexRequest(NettyWorkerClientRequestProcessor.java:267)
>>>> 	at org.apache.giraph.worker.VertexInputSplitsCallable.readInputSplit(VertexInputSplitsCallable.java:140)
>>>> 	at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:220)
>>>> 	at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:161)
>>>> 	at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:58)
>>>> 	at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
>>>> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> 	at java.lang.Thread.run(Thread.java:745)
>>>> 2014-07-02 17:01:55,722 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
>>>>
>>>>
>>>> ===========================================================================
>>>>
>>>>
>>>>
>>>>
>>>> I've tried increasing Java Heap Space in hadoop/conf/mapred-site.xml by
>>>> adding this:
>>>>   <property>
>>>>     <name>mapred.child.java.opts</name>
>>>>     <value>-Xmx1024m</value>
>>>>   </property>
>>>>
>>>> But that just caused the entire job to fail from the start.
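For reference, the same property can carry a larger heap value; the numbers here are illustrative, and on Hadoop 0.20.x a mapred-site.xml change only takes effect after the daemons are restarted:

```xml
<!-- mapred-site.xml: illustrative larger heap for each child task JVM.
     Requires restarting Hadoop (stop-all.sh, then start-all.sh) to apply. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx4096m</value>
</property>
```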
>>>>
>>>> Before using this version of Giraph, I tried 1.0.0 and 1.1.0-RC0, but
>>>> those versions gave me more (and different) errors to debug, mostly
>>>> related to Hadoop itself.  So the Giraph version I'm currently using
>>>> seems the best fit for me, since its errors seem more manageable.
>>>>
>>>> What can I do to fix this error?  I thought Giraph was built for
>>>> large-scale graph processing, so I assume someone testing large graphs
>>>> has run into this problem before.  I searched through the mailing-list
>>>> archives but couldn't find anything, though.
>>>>
>>>> I can provide more information if you need it.  Thanks a lot.
>>>>
>>>> Bryan Rowe
>>>>
>>>>
>>>>
>>>> *Then Young Han replied:*
>>>>
>>>> 10 minutes seems way too long to load 91 MB from HDFS. Are you sure
>>>> your graph's format is correct? For the JSON input formats, each line of
>>>> your file should be:
>>>>
>>>> [vertex id, vertex value, [[dst id, edge weight], [dst id, edge
>>>> weight], ..., [dst id, edge weight]]]
>>>>
>>>> You can set the vertex value for every vertex to 0 (SSSP will overwrite
>>>> that value in the first superstep). If your graph doesn't have edge
>>>> weights, you can just set them to 1.
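Since each line in that format is itself valid JSON, one quick sanity check is to parse every line of the file and verify its shape. A minimal sketch in Python (the helper name is mine, not part of Giraph); a malformed line raises an error and identifies the bad record:

```python
import json

def check_line(line):
    # Expected shape: [vertex id, vertex value, [[dst id, edge weight], ...]]
    vertex_id, vertex_value, edges = json.loads(line)
    assert isinstance(vertex_id, int)
    assert isinstance(vertex_value, (int, float))
    for dst_id, weight in edges:
        assert isinstance(dst_id, int)
        assert isinstance(weight, (int, float))
    return vertex_id, vertex_value, edges

# A well-formed line parses cleanly:
check_line('[0,0,[[1,1],[4,1],[5,1]]]')
```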
>>>>
>>>> Also, have you tried a larger Xmx value? E.g., 4096m or 8192m.
>>>>
>>>> Young
>>>>
>>>>
>>>> *Then I replied:*
>>>> Hi Young,
>>>>
>>>> I believe my graph has the correct format.
>>>> Here is the first 40 lines of the graph I'm using:
>>>> [0,0,[[1,1],[4,1],[5,1]]]
>>>> [1,0,[[0,1],[2,1],[3,1]]]
>>>> [2,0,[[1,1],[6,1],[7,1]]]
>>>> [3,0,[[8,1],[1,1],[9,1]]]
>>>> [4,0,[[0,1],[10,1],[11,1]]]
>>>> [5,0,[[0,1]]]
>>>> [6,0,[[2,1],[12,1]]]
>>>> [7,0,[[2,1],[12,1],[13,1]]]
>>>> [8,0,[[11,1],[3,1],[60,1]]]
>>>> [9,0,[[3,1]]]
>>>> [10,0,[[35,1],[4,1],[38,1]]]
>>>> [11,0,[[8,1],[59,1],[4,1]]]
>>>> [12,0,[[41,1],[6,1],[7,1]]]
>>>> [13,0,[[89,1],[90,1],[91,1],[7,1]]]
>>>> [14,0,[[18,1],[19,1],[15,1]]]
>>>> [15,0,[[16,1],[17,1],[14,1]]]
>>>> [16,0,[[20,1],[21,1],[22,1],[15,1]]]
>>>> [17,0,[[24,1],[23,1],[21,1],[15,1]]]
>>>> [18,0,[[24,1],[14,1]]]
>>>> [19,0,[[25,1],[22,1],[14,1]]]
>>>> [20,0,[[16,1],[25,1],[26,1]]]
>>>> [21,0,[[16,1],[17,1],[30,1]]]
>>>> [22,0,[[16,1],[19,1]]]
>>>> [23,0,[[17,1],[105,1]]]
>>>> [24,0,[[17,1],[18,1],[58,1]]]
>>>> [25,0,[[27,1],[19,1],[20,1]]]
>>>> [26,0,[[28,1],[27,1],[20,1]]]
>>>> [27,0,[[25,1],[26,1],[29,1]]]
>>>> [28,0,[[26,1],[29,1],[30,1]]]
>>>> [29,0,[[27,1],[28,1],[31,1]]]
>>>> [30,0,[[32,1],[33,1],[28,1],[21,1]]]
>>>> [31,0,[[34,1],[29,1]]]
>>>> [32,0,[[105,1],[30,1],[39,1]]]
>>>> [33,0,[[30,1]]]
>>>> [34,0,[[38,1],[31,1]]]
>>>> [35,0,[[10,1],[36,1],[37,1]]]
>>>> [36,0,[[40,1],[35,1],[39,1]]]
>>>> [37,0,[[41,1],[35,1]]]
>>>> [38,0,[[10,1],[34,1]]]
>>>> [39,0,[[32,1],[58,1],[36,1],[119,1]]]
>>>> [40,0,[[90,1],[36,1]]]
>>>>
>>>> Also, sorry about sending the email twice.  My email client messed up.
>>>>
>>>> Thanks,
>>>> Bryan
>>>>
>>>> I've also tried a larger graph.  It ran for 2 hours and then failed,
>>>> with different error messages I believe.
>>>>
>>>> Bryan
>>>>
>>>
>>>
>>
>

Re: Errors with large graphs (continued, new email)

Posted by Young Han <yo...@uwaterloo.ca>.
What sort of error does it give? Did you try restarting Hadoop after
setting it? (stop-all.sh, start-all.sh, hadoop dfs -safemode wait).

Young


On Thu, Jul 3, 2014 at 5:08 PM, Bryan Rowe <br...@gmail.com> wrote:

> Yes, before conversion they were also sorted by vertex ID.
> 0       1       1
> 0       4       1
> 0       5       1
> 1       0       1
> 1       2       1
> 1       3       1
> 2       1       1
> 2       6       1
> 2       7       1
> 3       8       1
>
> That was the format before conversion: each line is one edge in [source
> destination weight] form.
> I'm using a medium-tier EC2 instance, c3.2xlarge.  It has 8 vCPUs and 15 GiB
> of memory according to Amazon's website.
>
> Any suggestions on tweaking Xmx?  I tried that without any luck; the job
> just failed without running the map phase.
>
>
> On Thu, Jul 3, 2014 at 1:57 PM, Young Han <yo...@uwaterloo.ca> wrote:
>
>> From the other thread...
>>
>> Yeah, your input format looks correct. Did you have the graph sorted by
>> source vertex IDs before conversion? (I'm not sure whether duplicate
>> entries with the same source ID matter, but just in case.)
>>
>> They're all out of memory errors, so I think Xmx is the culprit. What
>> type of EC2 instances are you using? You probably want to use something
>> larger than t1.micro or m1.small.
>>
>> Young
>>
>>
>> On Thu, Jul 3, 2014 at 4:53 PM, Bryan Rowe <br...@gmail.com> wrote:
>>
>>> First of all, I started this email thread with my old email from Yahoo,
>>> which was a mistake because it kept sending out duplicates.  Sorry for the
>>> inconvenience, but I'll continue it using this email thread from now on.
>>>
>>> *I originally posted this:*
>>> Hello,
>>>
>>> Giraph: release-1.0.0-RC3
>>>
>>> In short, when I use large graphs with the Shortest Paths example, it
>>> fails.  But when I use the small graph provided on the Quick Start guide,
>>> it succeeds.
>>> I converted all of my large graphs into the format shown in the Quick
>>> Start guide to simplify things.
>>> I'm using a one-node setup.
>>>
>>>
>>> Here is the command I'm using:
>>> hadoop jar
>>> $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar
>>>
>>> org.apache.giraph.GiraphRunner
>>> org.apache.giraph.examples.SimpleShortestPathsVertex
>>> -vif
>>> org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
>>>  -vip /user/ubuntu/input/CA.txt -of
>>> org.apache.giraph.io.formats.IdWithValueTextOutputFormat
>>>  -op /user/ubuntu/output/shortestpaths
>>> -w 1
>>>
>>> (all on one line)
>>>
>>> CA.txt is a large graph file: 96,026,228 bytes
>>>
>>> The job fails in 10mins, 46sec.
>>>
>>> Two Map tasks are created when run.
>>> The first one, task_201407021636_0006_m_000000, is KILLED.  syslog:
>>>
>>> ===========================================================================
>>>
>>> 2014-07-02 17:01:34,757 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
>>> 2014-07-02 17:01:34,945 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
>>> 2014-07-02 17:01:35,127 INFO org.apache.giraph.graph.GraphTaskManager: setup: Log level remains at info
>>> 2014-07-02 17:01:35,159 INFO org.apache.giraph.graph.GraphTaskManager: Distributed cache is empty. Assuming fatjar.
>>> 2014-07-02 17:01:35,159 INFO org.apache.giraph.graph.GraphTaskManager: setup: classpath @ /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/job.jar for job Giraph: org.apache.giraph.examples.SimpleShortestPathsVertex
>>> 2014-07-02 17:01:35,201 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp: Made the directory _bsp/_defaultZkManagerDir/job_201407021636_0006
>>> 2014-07-02 17:01:35,204 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp: Creating my filestamp _bsp/_defaultZkManagerDir/job_201407021636_0006/_task/ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0
>>> 2014-07-02 17:01:35,219 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: Got [ec2-54-186-5-159.us-west-2.compute.amazonaws.com] 1 hosts from 1 candidates when 1 required (polling period is 3000) on attempt 0
>>> 2014-07-02 17:01:35,220 INFO org.apache.giraph.zk.ZooKeeperManager: createZooKeeperServerList: Creating the final ZooKeeper file '_bsp/_defaultZkManagerDir/job_201407021636_0006/zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 '
>>> 2014-07-02 17:01:35,228 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: For task 0, got file 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 ' (polling period is 3000)
>>> 2014-07-02 17:01:35,228 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: Found [ec2-54-186-5-159.us-west-2.compute.amazonaws.com, 0] 2 hosts in filename 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 '
>>> 2014-07-02 17:01:35,229 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Trying to delete old directory /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper
>>> 2014-07-02 17:01:35,234 INFO org.apache.giraph.zk.ZooKeeperManager: generateZooKeeperConfigFile: Creating file /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper/zoo.cfg in /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper with base port 22181
>>> 2014-07-02 17:01:35,234 INFO org.apache.giraph.zk.ZooKeeperManager: generateZooKeeperConfigFile: Make directory of _bspZooKeeper = true
>>> 2014-07-02 17:01:35,235 INFO org.apache.giraph.zk.ZooKeeperManager: generateZooKeeperConfigFile: Delete of zoo.cfg = false
>>> 2014-07-02 17:01:35,236 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Attempting to start ZooKeeper server with command [/usr/lib/jvm/java-7-oracle/jre/bin/java, -Xmx512m, -XX:ParallelGCThreads=4, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=70, -XX:MaxGCPauseMillis=100, -cp, /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/job.jar, org.apache.zookeeper.server.quorum.QuorumPeerMain, /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper/zoo.cfg] in directory /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper
>>> 2014-07-02 17:01:35,238 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Shutdown hook added.
>>> 2014-07-02 17:01:35,238 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Connect attempt 0 of 10 max trying to connect to ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 with poll msecs = 3000
>>> 2014-07-02 17:01:35,241 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Got ConnectException
>>> java.net.ConnectException: Connection refused
>>> 	at java.net.PlainSocketImpl.socketConnect(Native Method)
>>> 	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>>> 	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
>>> 	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
>>> 	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>> 	at java.net.Socket.connect(Socket.java:579)
>>> 	at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:701)
>>> 	at org.apache.giraph.graph.GraphTaskManager.startZooKeeperManager(GraphTaskManager.java:357)
>>> 	at org.apache.giraph.graph.GraphTaskManager.setup(GraphTaskManager.java:188)
>>> 	at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:60)
>>> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:90)
>>> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>>> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>> 	at java.security.AccessController.doPrivileged(Native Method)
>>> 	at javax.security.auth.Subject.doAs(Subject.java:415)
>>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>> 	at org.apache.hadoop.mapred.Child.main(Child.java:253)
>>> 2014-07-02 17:01:38,242 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Connect attempt 1 of 10 max trying to connect to ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 with poll msecs = 3000
>>> 2014-07-02 17:01:38,243 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Connected to ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181!
>>> 2014-07-02 17:01:38,243 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Creating my filestamp _bsp/_defaultZkManagerDir/job_201407021636_0006/_zkServer/ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0
>>> 2014-07-02 17:01:40,249 INFO org.apache.giraph.graph.GraphTaskManager: setup: Chosen to run ZooKeeper...
>>> 2014-07-02 17:01:40,249 INFO org.apache.giraph.graph.GraphTaskManager: setup: Starting up BspServiceMaster (master thread)...
>>> 2014-07-02 17:01:40,260 INFO org.apache.giraph.bsp.BspService: BspService: Connecting to ZooKeeper with job job_201407021636_0006, 0 on ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181
>>> 2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27 GMT
>>> 2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:host.name=ec2-54-186-5-159.us-west-2.compute.amazonaws.com
>>> 2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.7.0_60
>>> 2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client
>>>  environment:java.class.path=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/classes:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work:/home/ubuntu/hadoop/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/ubuntu/hadoop/bin/..:/home/ubuntu/hadoop/bin/../hadoop-core-0.20.203.0.jar:/home/ubuntu/hadoop/bin/../lib/aspectjrt-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/aspectjtools-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-1.7.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-core-1.8.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-cli-1.2.jar:/home/ubuntu/hadoop/bin/../lib/commons-codec-1.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-collections-3.2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-configuration-1.6.jar:/home/ubuntu/hadoo
>>> p/bin/../lib/commons-daemon-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-digester-1.8.jar:/home/ubuntu/hadoop/bin/../lib/commons-el-1.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-httpclient-3.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-lang-2.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-1.1.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-api-1.0.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-math-2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-net-1.4.1.jar:/home/ubuntu/hadoop/bin/../lib/core-3.1.1.jar:/home/ubuntu/hadoop/bin/../lib/hsqldb-1.8.0.10.jar:/home/ubuntu/hadoop/bin/../lib/jackson-core-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jackson-mapper-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jasper-compiler-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jasper-runtime-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jets3t-0.6.1.jar:/home/ubuntu/hadoop/bin/../lib/jetty-6.1.26.jar:/home/ubuntu/hadoop/bin/../lib/jetty-util-6.1.26.jar:/home/ubu
>>> ntu/hadoop/bin/../lib/jsch-0.1.42.jar:/home/ubuntu/hadoop/bin/../lib/junit-4.5.jar:/home/ubuntu/hadoop/bin/../lib/kfs-0.2.2.jar:/home/ubuntu/hadoop/bin/../lib/log4j-1.2.15.jar:/home/ubuntu/hadoop/bin/../lib/mockito-all-1.8.5.jar:/home/ubuntu/hadoop/bin/../lib/oro-2.0.8.jar:/home/ubuntu/hadoop/bin/../lib/servlet-api-2.5-20081211.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-api-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/xmlenc-0.52.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar
>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/home/ubuntu/hadoop/bin/../lib/native/Linux-amd64-64:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work
>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work/tmp
>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.arch=amd64
>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.version=3.2.0-58-virtual
>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.name=ubuntu
>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.home=/home/ubuntu
>>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work
>>> 2014-07-02 17:01:40,272 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 sessionTimeout=60000 watcher=org.apache.giraph.master.BspServiceMaster@4deb9df0
>>> 2014-07-02 17:01:40,284 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181
>>> 2014-07-02 17:01:40,296 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, initiating session
>>> 2014-07-02 17:01:42,336 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, sessionid = 0x146f806378d0000, negotiated timeout = 600000
>>> 2014-07-02 17:01:42,337 INFO org.apache.giraph.bsp.BspService: process: Asynchronous connection complete.
>>> 2014-07-02 17:01:42,343 INFO org.apache.giraph.graph.GraphTaskManager: map: No need to do anything when not a worker
>>> 2014-07-02 17:01:42,343 INFO org.apache.giraph.graph.GraphTaskManager: cleanup: Starting for MASTER_ZOOKEEPER_ONLY
>>> 2014-07-02 17:01:45,033 INFO org.apache.giraph.bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/job_201407021636_0006/_masterJobState)
>>> 2014-07-02 17:01:45,038 INFO org.apache.giraph.master.BspServiceMaster: becomeMaster: First child is '/_hadoopBsp/job_201407021636_0006/_masterElectionDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_00000000000' and my bid is '/_hadoopBsp/job_201407021636_0006/_masterElectionDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_00000000000'
>>> 2014-07-02 17:01:45,044 INFO org.apache.giraph.bsp.BspService: getApplicationAttempt: Node /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir already exists!
>>> 2014-07-02 17:01:45,049 INFO org.apache.giraph.bsp.BspService: process: applicationAttemptChanged signaled
>>> 2014-07-02 17:01:45,089 INFO org.apache.giraph.comm.netty.NettyServer: NettyServer: Using execution handler with 8 threads after requestFrameDecoder.
>>> 2014-07-02 17:01:45,111 INFO org.apache.giraph.comm.netty.NettyServer: start: Started server communication server: ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:30000 with up to 16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288 backlog = 1
>>> 2014-07-02 17:01:45,116 INFO org.apache.giraph.comm.netty.NettyClient: NettyClient: Using execution handler with 8 threads after requestEncoder.
>>> 2014-07-02 17:01:45,119 INFO org.apache.giraph.master.BspServiceMaster: becomeMaster: I am now the master!
>>> 2014-07-02 17:01:45,126 INFO org.apache.giraph.bsp.BspService: getApplicationAttempt: Node /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir already exists!
>>> 2014-07-02 17:01:45,140 INFO org.apache.giraph.io.formats.GiraphFileInputFormat: Total input paths to process : 1
>>> 2014-07-02 17:01:45,149 INFO org.apache.giraph.master.BspServiceMaster: generateVertexInputSplits: Got 2 input splits for 1 input threads
>>> 2014-07-02 17:01:45,149 INFO org.apache.giraph.master.BspServiceMaster: createVertexInputSplits: Starting to write input split data to zookeeper with 1 threads
>>> 2014-07-02 17:01:45,163 INFO org.apache.giraph.master.BspServiceMaster: createVertexInputSplits: Done writing input split data to zookeeper
>>> 2014-07-02 17:01:45,173 INFO org.apache.giraph.comm.netty.NettyClient: Using Netty without authentication.
>>> 2014-07-02 17:01:45,187 INFO org.apache.giraph.comm.netty.NettyClient: connectAllAddresses: Successfully added 1 connections, (1 total connected) 0 failed, 0 failures total.
>>> 2014-07-02 17:01:45,188 INFO org.apache.giraph.partition.PartitionUtils: computePartitionCount: Creating 1, default would have been 1 partitions.
>>> 2014-07-02 17:01:45,350 INFO org.apache.giraph.comm.netty.NettyServer: start: Using Netty without authentication.
>>> 2014-07-02 17:06:45,347 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out of 1 workers finished on superstep -1 on path /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDoneDir
>>> 2014-07-02 17:06:45,349 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Waiting on [ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1]
>>> 2014-07-02 17:11:45,355 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out of 1 workers finished on superstep -1 on path /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDoneDir
>>> 2014-07-02 17:11:45,355 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Waiting on [ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1]
>>> 2014-07-02 17:12:07,201 INFO org.apache.giraph.zk.ZooKeeperManager: run: Shutdown hook started.
>>> 2014-07-02 17:12:07,202 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Forced a shutdown hook kill of the ZooKeeper process.
>>> 2014-07-02 17:12:07,518 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x146f806378d0000, likely server has closed socket, closing socket connection and attempting reconnect
>>> 2014-07-02 17:12:07,518 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: ZooKeeper process exited with 143 (note that 143 typically means killed).
>>> ===========================================================================
>>>
>>> The second one, task_201407021636_0006_m_000001, goes to the FAILED
>>> state.  syslog:
>>>
>>> ===========================================================================
>>>
>>> 2014-07-02 17:01:38,016 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
>>> 2014-07-02 17:01:38,203 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
>>> 2014-07-02 17:01:38,379 INFO org.apache.giraph.graph.GraphTaskManager: setup: Log level remains at info
>>> 2014-07-02 17:01:40,280 INFO org.apache.giraph.graph.GraphTaskManager: Distributed cache is empty. Assuming fatjar.
>>> 2014-07-02 17:01:40,281 INFO org.apache.giraph.graph.GraphTaskManager: setup: classpath @ /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/job.jar for job Giraph: org.apache.giraph.examples.SimpleShortestPathsVertex
>>> 2014-07-02 17:01:40,317 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp: Made the directory _bsp/_defaultZkManagerDir/job_201407021636_0006
>>> 2014-07-02 17:01:40,320 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp: Creating my filestamp _bsp/_defaultZkManagerDir/job_201407021636_0006/_task/ec2-54-186-5-159.us-west-2.compute.amazonaws.com 1
>>> 2014-07-02 17:01:42,109 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: For task 1, got file 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 ' (polling period is 3000)
>>> 2014-07-02 17:01:42,313 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: Found [ec2-54-186-5-159.us-west-2.compute.amazonaws.com, 0] 2 hosts in filename 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 '
>>> 2014-07-02 17:01:42,316 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Got [ec2-54-186-5-159.us-west-2.compute.amazonaws.com] 1 hosts from 1 ready servers when 1 required (polling period is 3000) on attempt 0
>>> 2014-07-02 17:01:42,316 INFO org.apache.giraph.graph.GraphTaskManager: setup: Starting up BspServiceWorker...
>>> 2014-07-02 17:01:42,330 INFO org.apache.giraph.bsp.BspService: BspService: Connecting to ZooKeeper with job job_201407021636_0006, 1 on ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181
>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27 GMT
>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:host.name=ec2-54-186-5-159.us-west-2.compute.amazonaws.com
>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.7.0_60
>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
>>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client
>>>  environment:java.class.path=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/classes:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work:/home/ubuntu/hadoop/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/ubuntu/hadoop/bin/..:/home/ubuntu/hadoop/bin/../hadoop-core-0.20.203.0.jar:/home/ubuntu/hadoop/bin/../lib/aspectjrt-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/aspectjtools-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-1.7.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-core-1.8.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-cli-1.2.jar:/home/ubuntu/hadoop/bin/../lib/commons-codec-1.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-collections-3.2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-configuration-1.6.jar:/home/ubuntu/hadoo
>>> p/bin/../lib/commons-daemon-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-digester-1.8.jar:/home/ubuntu/hadoop/bin/../lib/commons-el-1.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-httpclient-3.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-lang-2.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-1.1.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-api-1.0.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-math-2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-net-1.4.1.jar:/home/ubuntu/hadoop/bin/../lib/core-3.1.1.jar:/home/ubuntu/hadoop/bin/../lib/hsqldb-1.8.0.10.jar:/home/ubuntu/hadoop/bin/../lib/jackson-core-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jackson-mapper-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jasper-compiler-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jasper-runtime-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jets3t-0.6.1.jar:/home/ubuntu/hadoop/bin/../lib/jetty-6.1.26.jar:/home/ubuntu/hadoop/bin/../lib/jetty-util-6.1.26.jar:/home/ubu
>>> ntu/hadoop/bin/../lib/jsch-0.1.42.jar:/home/ubuntu/hadoop/bin/../lib/junit-4.5.jar:/home/ubuntu/hadoop/bin/../lib/kfs-0.2.2.jar:/home/ubuntu/hadoop/bin/../lib/log4j-1.2.15.jar:/home/ubuntu/hadoop/bin/../lib/mockito-all-1.8.5.jar:/home/ubuntu/hadoop/bin/../lib/oro-2.0.8.jar:/home/ubuntu/hadoop/bin/../lib/servlet-api-2.5-20081211.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-api-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/xmlenc-0.52.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar
>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/home/ubuntu/hadoop/bin/../lib/native/Linux-amd64-64:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work
>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work/tmp
>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.arch=amd64
>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.version=3.2.0-58-virtual
>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.name=ubuntu
>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.home=/home/ubuntu
>>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work
>>> 2014-07-02 17:01:42,339 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 sessionTimeout=60000 watcher=org.apache.giraph.worker.BspServiceWorker@54a5733f
>>> 2014-07-02 17:01:42,347 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181
>>> 2014-07-02 17:01:42,347 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, initiating session
>>> 2014-07-02 17:01:42,812 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, sessionid = 0x146f806378d0001, negotiated timeout = 600000
>>> 2014-07-02 17:01:42,814 INFO org.apache.giraph.bsp.BspService: process: Asynchronous connection complete.
>>> 2014-07-02 17:01:42,819 INFO org.apache.giraph.comm.netty.NettyWorkerServer: createMessageStoreFactory: Using ByteArrayMessagesPerVertexStore since there is no combiner
>>> 2014-07-02 17:01:42,881 INFO org.apache.giraph.comm.netty.NettyServer: NettyServer: Using execution handler with 8 threads after requestFrameDecoder.
>>> 2014-07-02 17:01:42,904 INFO org.apache.giraph.comm.netty.NettyServer: start: Started server communication server: ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:30001 with up to 16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288 backlog = 1
>>> 2014-07-02 17:01:42,907 INFO org.apache.giraph.comm.netty.NettyClient: NettyClient: Using execution handler with 8 threads after requestEncoder.
>>> 2014-07-02 17:01:42,914 INFO org.apache.giraph.graph.GraphTaskManager: setup: Registering health of this worker...
>>> 2014-07-02 17:01:45,049 INFO org.apache.giraph.bsp.BspService: process: applicationAttemptChanged signaled
>>> 2014-07-02 17:01:45,068 WARN org.apache.giraph.bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir, type=NodeChildrenChanged, state=SyncConnected)
>>> 2014-07-02 17:01:45,076 INFO org.apache.giraph.worker.BspServiceWorker: registerHealth: Created my health node for attempt=0, superstep=-1 with /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1 and workerInfo= Worker(hostname=ec2-54-186-5-159.us-west-2.compute.amazonaws.com, MRtaskID=1, port=30001)
>>> 2014-07-02 17:01:45,183 INFO org.apache.giraph.comm.netty.NettyServer: start: Using Netty without authentication.
>>> 2014-07-02 17:01:45,339 INFO org.apache.giraph.bsp.BspService: process: partitionAssignmentsReadyChanged (partitions are assigned)
>>> 2014-07-02 17:01:45,345 INFO org.apache.giraph.worker.BspServiceWorker: startSuperstep: Master(hostname=ec2-54-186-5-159.us-west-2.compute.amazonaws.com, MRtaskID=0, port=30000)
>>> 2014-07-02 17:01:45,345 INFO org.apache.giraph.worker.BspServiceWorker: startSuperstep: Ready for computation on superstep -1 since worker selection and vertex range assignments are done in /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir/-1/_addressesAndPartitions
>>> 2014-07-02 17:01:45,346 INFO org.apache.giraph.comm.netty.NettyClient: Using Netty without authentication.
>>> 2014-07-02 17:01:45,354 INFO org.apache.giraph.comm.netty.NettyClient: connectAllAddresses: Successfully added 1 connections, (1 total connected) 0 failed, 0 failures total.
>>> 2014-07-02 17:01:45,359 INFO org.apache.giraph.worker.BspServiceWorker: loadInputSplits: Using 1 thread(s), originally 1 threads(s) for 2 total splits.
>>> 2014-07-02 17:01:45,362 INFO org.apache.giraph.comm.SendPartitionCache: SendPartitionCache: maxVerticesPerTransfer = 10000
>>> 2014-07-02 17:01:45,362 INFO org.apache.giraph.comm.SendPartitionCache: SendPartitionCache: maxEdgesPerTransfer = 80000
>>> 2014-07-02 17:01:45,372 INFO org.apache.giraph.worker.InputSplitsHandler: reserveInputSplit: Reserved input split path /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDir/0, overall roughly 0.0% input splits reserved
>>> 2014-07-02 17:01:45,373 INFO org.apache.giraph.worker.InputSplitsCallable: getInputSplit: Reserved /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDir/0 from ZooKeeper and got input split 'hdfs://ec2-54-186-5-159.us-west-2.compute.amazonaws.com:54310/user/ubuntu/input/CA.txt:0+67108864'
>>> 2014-07-02 17:01:46,615 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: Loaded 250000 vertices at 200412.01085983627 vertices/sec 697447 edges at 560085.1663971642 edges/sec Memory (free/total/max) = 112.94M / 182.50M / 182.50M
>>> 2014-07-02 17:01:47,440 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: Loaded 500000 vertices at 241131.36490688394 vertices/sec 1419367 edges at 685221.3493060106 edges/sec Memory (free/total/max) = 45.07M / 187.50M / 187.50M
>>> 2014-07-02 17:01:51,322 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: Loaded 750000 vertices at 125921.72750649283 vertices/sec 2149814 edges at 361077.323111284 edges/sec Memory (free/total/max) = 16.73M / 189.50M / 189.50M
>>> 2014-07-02 17:01:55,205 ERROR org.apache.giraph.worker.BspServiceWorker: unregisterHealth: Got failure, unregistering health on /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1 on superstep -1
>>> 2014-07-02 17:01:55,213 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
>>> 2014-07-02 17:01:55,716 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
>>> 2014-07-02 17:01:55,717 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName ubuntu for UID 1000 from the native implementation
>>> 2014-07-02 17:01:55,718 WARN org.apache.hadoop.mapred.Child: Error running child
>>> java.lang.IllegalStateException: run: Caught an unrecoverable exception waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@36df8f5
>>> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:102)
>>> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>>> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>> 	at java.security.AccessController.doPrivileged(Native Method)
>>> 	at javax.security.auth.Subject.doAs(Subject.java:415)
>>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>> 	at org.apache.hadoop.mapred.Child.main(Child.java:253)
>>> Caused by: java.lang.IllegalStateException: waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@36df8f5
>>> 	at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:151)
>>> 	at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:111)
>>> 	at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:73)
>>> 	at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:192)
>>> 	at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:276)
>>> 	at org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:323)
>>> 	at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:506)
>>> 	at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:230)
>>> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92)
>>> 	... 7 more
>>> Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
>>> 	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>>> 	at java.util.concurrent.FutureTask.get(FutureTask.java:202)
>>> 	at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:271)
>>> 	at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:143)
>>> 	... 15 more
>>> Caused by: java.lang.OutOfMemoryError: Java heap space
>>> 	at java.util.concurrent.ConcurrentHashMap$Segment.rehash(ConcurrentHashMap.java:501)
>>> 	at java.util.concurrent.ConcurrentHashMap$Segment.put(ConcurrentHashMap.java:460)
>>> 	at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1130)
>>> 	at org.apache.giraph.partition.SimplePartition.addPartition(SimplePartition.java:87)
>>> 	at org.apache.giraph.partition.SimplePartitionStore.addPartition(SimplePartitionStore.java:71)
>>> 	at org.apache.giraph.comm.requests.SendVertexRequest.doRequest(SendVertexRequest.java:81)
>>> 	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:470)
>>> 	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendPartitionRequest(NettyWorkerClientRequestProcessor.java:203)
>>> 	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendVertexRequest(NettyWorkerClientRequestProcessor.java:267)
>>> 	at org.apache.giraph.worker.VertexInputSplitsCallable.readInputSplit(VertexInputSplitsCallable.java:140)
>>> 	at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:220)
>>> 	at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:161)
>>> 	at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:58)
>>> 	at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
>>> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> 	at java.lang.Thread.run(Thread.java:745)
>>> 2014-07-02 17:01:55,722 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
>>>
>>>
>>> ===========================================================================
>>>
>>> I've tried increasing the Java heap size in hadoop/conf/mapred-site.xml by
>>> adding this:
>>>   <property>
>>>     <name>mapred.child.java.opts</name>
>>>     <value>-Xmx1024m</value>
>>>   </property>
>>>
>>> But that just caused the entire job to fail from the start.
>>>
>>> Before using this version of Giraph, I tried 1.0.0 and 1.1.0-RC0, but
>>> those versions produced more (and different) errors to debug, related to
>>> problems with Hadoop itself.  So the Giraph version I'm currently using
>>> seems to be the best fit for me, because its errors seem more manageable.
>>>
>>> What can I do to fix this error?  I thought Giraph was built for
>>> large-scale graph processing, so I assume someone testing large graphs
>>> has run into this problem before.  I searched the mailing list archives
>>> but couldn't find anything, though.
>>>
>>> I can provide more information if you need it.  Thanks a lot.
>>>
>>> Bryan Rowe
>>>
>>>
>>>
>>> *Then Young Han replied:*
>>>
>>> 10 minutes seems way too long to load 91 MB from HDFS.  Are you sure
>>> your graph's format is correct?  For the Json input formats, each line of
>>> your file should be:
>>>
>>> [vertex id, vertex value, [[dst id, edge weight], [dst id, edge weight],
>>> ..., [dst id, edge weight]]]
>>>
>>> You can set the vertex value for every vertex to 0 (SSSP will overwrite
>>> that value in the first superstep). If your graph doesn't have edge
>>> weights, you can just set them to 1.
>>>
>>> Also, have you tried a larger Xmx value? E.g., 4096m or 8192m.
>>>
>>> Young
>>>
>>>
>>> *Then I replied:*
>>> Hi Young,
>>>
>>> I believe my graph has the correct format.
>>> Here is the first 40 lines of the graph I'm using:
>>> [0,0,[[1,1],[4,1],[5,1]]]
>>> [1,0,[[0,1],[2,1],[3,1]]]
>>> [2,0,[[1,1],[6,1],[7,1]]]
>>> [3,0,[[8,1],[1,1],[9,1]]]
>>> [4,0,[[0,1],[10,1],[11,1]]]
>>> [5,0,[[0,1]]]
>>> [6,0,[[2,1],[12,1]]]
>>> [7,0,[[2,1],[12,1],[13,1]]]
>>> [8,0,[[11,1],[3,1],[60,1]]]
>>> [9,0,[[3,1]]]
>>> [10,0,[[35,1],[4,1],[38,1]]]
>>> [11,0,[[8,1],[59,1],[4,1]]]
>>> [12,0,[[41,1],[6,1],[7,1]]]
>>> [13,0,[[89,1],[90,1],[91,1],[7,1]]]
>>> [14,0,[[18,1],[19,1],[15,1]]]
>>> [15,0,[[16,1],[17,1],[14,1]]]
>>> [16,0,[[20,1],[21,1],[22,1],[15,1]]]
>>> [17,0,[[24,1],[23,1],[21,1],[15,1]]]
>>> [18,0,[[24,1],[14,1]]]
>>> [19,0,[[25,1],[22,1],[14,1]]]
>>> [20,0,[[16,1],[25,1],[26,1]]]
>>> [21,0,[[16,1],[17,1],[30,1]]]
>>> [22,0,[[16,1],[19,1]]]
>>> [23,0,[[17,1],[105,1]]]
>>> [24,0,[[17,1],[18,1],[58,1]]]
>>> [25,0,[[27,1],[19,1],[20,1]]]
>>> [26,0,[[28,1],[27,1],[20,1]]]
>>> [27,0,[[25,1],[26,1],[29,1]]]
>>> [28,0,[[26,1],[29,1],[30,1]]]
>>> [29,0,[[27,1],[28,1],[31,1]]]
>>> [30,0,[[32,1],[33,1],[28,1],[21,1]]]
>>> [31,0,[[34,1],[29,1]]]
>>> [32,0,[[105,1],[30,1],[39,1]]]
>>> [33,0,[[30,1]]]
>>> [34,0,[[38,1],[31,1]]]
>>> [35,0,[[10,1],[36,1],[37,1]]]
>>> [36,0,[[40,1],[35,1],[39,1]]]
>>> [37,0,[[41,1],[35,1]]]
>>> [38,0,[[10,1],[34,1]]]
>>> [39,0,[[32,1],[58,1],[36,1],[119,1]]]
>>> [40,0,[[90,1],[36,1]]]
>>>
>>> Also, sorry about sending the email twice.  My email client messed up.
>>>
>>> Thanks,
>>> Bryan
>>>
>>> I've also tried a larger graph.  It ran for 2 hours and then failed,
>>> I believe with different error messages.
>>>
>>> Bryan
>>>
>>
>>
>

Re: Errors with large graphs (continued, new email)

Posted by Bryan Rowe <br...@gmail.com>.
Yes, before conversion they were also sorted by vertex ID.
0       1       1
0       4       1
0       5       1
1       0       1
1       2       1
1       3       1
2       1       1
2       6       1
2       7       1
3       8       1

That was the edge format before conversion: [source destination weight],
one edge per line.
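For what it's worth, the conversion described above can be sketched in a few
lines of Python.  This is only a sketch under the thread's assumptions
(whitespace-separated "source destination weight" lines, sorted by source
vertex ID), and the helper name `edge_list_to_json_lines` is hypothetical,
not part of Giraph:

```python
# Hedged sketch: convert a sorted edge list into the one-JSON-array-per-line
# shape that JsonLongDoubleFloatDoubleVertexInputFormat expects:
# [vertex id, vertex value, [[dst id, edge weight], ...]]
# Note: vertices with no outgoing edges never appear in an edge list, so
# they would be missing from the output of this sketch.
import json
from itertools import groupby

def edge_list_to_json_lines(lines):
    """Yield one JSON line per source vertex from sorted edge-list lines."""
    edges = (line.split() for line in lines if line.strip())
    for src, group in groupby(edges, key=lambda e: e[0]):
        out_edges = [[int(dst), int(w)] for _, dst, w in group]
        # Vertex value 0 is fine for SSSP; the algorithm overwrites it.
        # separators=(",", ":") matches the compact Quick Start style.
        yield json.dumps([int(src), 0, out_edges], separators=(",", ":"))

if __name__ == "__main__":
    sample = ["0 1 1", "0 4 1", "0 5 1", "1 0 1"]
    for line in edge_list_to_json_lines(sample):
        print(line)
```

Running the sample prints lines like [0,0,[[1,1],[4,1],[5,1]]], which
matches the format of the 40 lines you pasted.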
I'm using a mid-tier EC2 instance, c3.2xlarge.  It has 8 vCPUs and 15 GiB
of memory according to Amazon's website.

Any suggestions on tweaking Xmx?  I tried that without any luck; the job
just failed without ever running the map phase.
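In case it helps anyone reading along: instead of editing mapred-site.xml,
the heap can usually be raised per job on the command line.  This is a
hedged sketch, reusing the exact jar and class names from the original
command; the assumption is that on this Hadoop 0.20-era setup the generic
-D option is honored when placed immediately after GiraphRunner, before the
computation class:

```shell
hadoop jar \
  $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner \
  -Dmapred.child.java.opts=-Xmx4096m \
  org.apache.giraph.examples.SimpleShortestPathsVertex \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
  -vip /user/ubuntu/input/CA.txt \
  -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /user/ubuntu/output/shortestpaths \
  -w 1
```

With 15 GiB on the instance, values like -Xmx4096m or -Xmx8192m should be
safe to try for a single worker.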


On Thu, Jul 3, 2014 at 1:57 PM, Young Han <yo...@uwaterloo.ca> wrote:

> From the other thread...
>
> Yeah, your input format looks correct. Did you have the graph sorted by
> source vertex IDs before conversion? (I'm not sure if duplicate entries
> with the same source ID matters, but just in case.)
>
> They're all out of memory errors, so I think Xmx is the culprit. What type
> of EC2 instances are you using? You probably want to use something larger
> than t1.micro or m1.small.
>
> Young
>
>
> On Thu, Jul 3, 2014 at 4:53 PM, Bryan Rowe <br...@gmail.com> wrote:
>
>> [snip: original message quoted in full; it appears at the top of this thread]
>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client
>>  environment:java.class.path=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/classes:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work:/home/ubuntu/hadoop/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/ubuntu/hadoop/bin/..:/home/ubuntu/hadoop/bin/../hadoop-core-0.20.203.0.jar:/home/ubuntu/hadoop/bin/../lib/aspectjrt-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/aspectjtools-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-1.7.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-core-1.8.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-cli-1.2.jar:/home/ubuntu/hadoop/bin/../lib/commons-codec-1.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-collections-3.2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-configuration-1.6.jar:/home/ubuntu/hadoo
>> p/bin/../lib/commons-daemon-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-digester-1.8.jar:/home/ubuntu/hadoop/bin/../lib/commons-el-1.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-httpclient-3.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-lang-2.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-1.1.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-api-1.0.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-math-2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-net-1.4.1.jar:/home/ubuntu/hadoop/bin/../lib/core-3.1.1.jar:/home/ubuntu/hadoop/bin/../lib/hsqldb-1.8.0.10.jar:/home/ubuntu/hadoop/bin/../lib/jackson-core-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jackson-mapper-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jasper-compiler-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jasper-runtime-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jets3t-0.6.1.jar:/home/ubuntu/hadoop/bin/../lib/jetty-6.1.26.jar:/home/ubuntu/hadoop/bin/../lib/jetty-util-6.1.26.jar:/home/ubu
>> ntu/hadoop/bin/../lib/jsch-0.1.42.jar:/home/ubuntu/hadoop/bin/../lib/junit-4.5.jar:/home/ubuntu/hadoop/bin/../lib/kfs-0.2.2.jar:/home/ubuntu/hadoop/bin/../lib/log4j-1.2.15.jar:/home/ubuntu/hadoop/bin/../lib/mockito-all-1.8.5.jar:/home/ubuntu/hadoop/bin/../lib/oro-2.0.8.jar:/home/ubuntu/hadoop/bin/../lib/servlet-api-2.5-20081211.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-api-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/xmlenc-0.52.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar
>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/home/ubuntu/hadoop/bin/../lib/native/Linux-amd64-64:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work
>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work/tmp
>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.arch=amd64
>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.version=3.2.0-58-virtual
>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.name=ubuntu
>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.home=/home/ubuntu
>> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work
>> 2014-07-02 17:01:40,272 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 sessionTimeout=60000 watcher=org.apache.giraph.master.BspServiceMaster@4deb9df0
>> 2014-07-02 17:01:40,284 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181
>> 2014-07-02 17:01:40,296 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, initiating session
>> 2014-07-02 17:01:42,336 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, sessionid = 0x146f806378d0000, negotiated timeout = 600000
>> 2014-07-02 17:01:42,337 INFO org.apache.giraph.bsp.BspService: process: Asynchronous connection complete.
>> 2014-07-02 17:01:42,343 INFO org.apache.giraph.graph.GraphTaskManager: map: No need to do anything when not a worker
>> 2014-07-02 17:01:42,343 INFO org.apache.giraph.graph.GraphTaskManager: cleanup: Starting for MASTER_ZOOKEEPER_ONLY
>> 2014-07-02 17:01:45,033 INFO org.apache.giraph.bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/job_201407021636_0006/_masterJobState)
>> 2014-07-02 17:01:45,038 INFO org.apache.giraph.master.BspServiceMaster: becomeMaster: First child is '/_hadoopBsp/job_201407021636_0006/_masterElectionDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_00000000000' and my bid is '/_hadoopBsp/job_201407021636_0006/_masterElectionDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_00000000000'
>> 2014-07-02 17:01:45,044 INFO org.apache.giraph.bsp.BspService: getApplicationAttempt: Node /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir already exists!
>> 2014-07-02 17:01:45,049 INFO org.apache.giraph.bsp.BspService: process: applicationAttemptChanged signaled
>> 2014-07-02 17:01:45,089 INFO org.apache.giraph.comm.netty.NettyServer: NettyServer: Using execution handler with 8 threads after requestFrameDecoder.
>> 2014-07-02 17:01:45,111 INFO org.apache.giraph.comm.netty.NettyServer: start: Started server communication server: ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:30000 with up to 16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288 backlog = 1
>> 2014-07-02 17:01:45,116 INFO org.apache.giraph.comm.netty.NettyClient: NettyClient: Using execution handler with 8 threads after requestEncoder.
>> 2014-07-02 17:01:45,119 INFO org.apache.giraph.master.BspServiceMaster: becomeMaster: I am now the master!
>> 2014-07-02 17:01:45,126 INFO org.apache.giraph.bsp.BspService: getApplicationAttempt: Node /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir already exists!
>> 2014-07-02 17:01:45,140 INFO org.apache.giraph.io.formats.GiraphFileInputFormat: Total input paths to process : 1
>> 2014-07-02 17:01:45,149 INFO org.apache.giraph.master.BspServiceMaster: generateVertexInputSplits: Got 2 input splits for 1 input threads
>> 2014-07-02 17:01:45,149 INFO org.apache.giraph.master.BspServiceMaster: createVertexInputSplits: Starting to write input split data to zookeeper with 1 threads
>> 2014-07-02 17:01:45,163 INFO org.apache.giraph.master.BspServiceMaster: createVertexInputSplits: Done writing input split data to zookeeper
>> 2014-07-02 17:01:45,173 INFO org.apache.giraph.comm.netty.NettyClient: Using Netty without authentication.
>> 2014-07-02 17:01:45,187 INFO org.apache.giraph.comm.netty.NettyClient: connectAllAddresses: Successfully added 1 connections, (1 total connected) 0 failed, 0 failures total.
>> 2014-07-02 17:01:45,188 INFO org.apache.giraph.partition.PartitionUtils: computePartitionCount: Creating 1, default would have been 1 partitions.
>> 2014-07-02 17:01:45,350 INFO org.apache.giraph.comm.netty.NettyServer: start: Using Netty without authentication.
>> 2014-07-02 17:06:45,347 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out of 1 workers finished on superstep -1 on path /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDoneDir
>> 2014-07-02 17:06:45,349 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Waiting on [ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1]
>> 2014-07-02 17:11:45,355 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out of 1 workers finished on superstep -1 on path /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDoneDir
>> 2014-07-02 17:11:45,355 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Waiting on [ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1]
>> 2014-07-02 17:12:07,201 INFO org.apache.giraph.zk.ZooKeeperManager: run: Shutdown hook started.
>> 2014-07-02 17:12:07,202 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Forced a shutdown hook kill of the ZooKeeper process.
>> 2014-07-02 17:12:07,518 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x146f806378d0000, likely server has closed socket, closing socket connection and attempting reconnect
>> 2014-07-02 17:12:07,518 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: ZooKeeper process exited with 143 (note that 143 typically means killed).
>> ===========================================================================
>>
>> The second one, task_201407021636_0006_m_000001, goes to the FAILED
>> state.  syslog:
>>
>> ===========================================================================
>>
>> 2014-07-02 17:01:38,016 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
>> 2014-07-02 17:01:38,203 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
>> 2014-07-02 17:01:38,379 INFO org.apache.giraph.graph.GraphTaskManager: setup: Log level remains at info
>> 2014-07-02 17:01:40,280 INFO org.apache.giraph.graph.GraphTaskManager: Distributed cache is empty. Assuming fatjar.
>> 2014-07-02 17:01:40,281 INFO org.apache.giraph.graph.GraphTaskManager: setup: classpath @ /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/job.jar for job Giraph: org.apache.giraph.examples.SimpleShortestPathsVertex
>> 2014-07-02 17:01:40,317 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp: Made the directory _bsp/_defaultZkManagerDir/job_201407021636_0006
>> 2014-07-02 17:01:40,320 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp: Creating my filestamp _bsp/_defaultZkManagerDir/job_201407021636_0006/_task/ec2-54-186-5-159.us-west-2.compute.amazonaws.com 1
>> 2014-07-02 17:01:42,109 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: For task 1, got file 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 ' (polling period is 3000)
>> 2014-07-02 17:01:42,313 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: Found [ec2-54-186-5-159.us-west-2.compute.amazonaws.com, 0] 2 hosts in filename 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 '
>> 2014-07-02 17:01:42,316 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Got [ec2-54-186-5-159.us-west-2.compute.amazonaws.com] 1 hosts from 1 ready servers when 1 required (polling period is 3000) on attempt 0
>> 2014-07-02 17:01:42,316 INFO org.apache.giraph.graph.GraphTaskManager: setup: Starting up BspServiceWorker...
>> 2014-07-02 17:01:42,330 INFO org.apache.giraph.bsp.BspService: BspService: Connecting to ZooKeeper with job job_201407021636_0006, 1 on ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181
>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27 GMT
>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:host.name=ec2-54-186-5-159.us-west-2.compute.amazonaws.com
>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.7.0_60
>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
>> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client
>>  environment:java.class.path=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/classes:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work:/home/ubuntu/hadoop/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/ubuntu/hadoop/bin/..:/home/ubuntu/hadoop/bin/../hadoop-core-0.20.203.0.jar:/home/ubuntu/hadoop/bin/../lib/aspectjrt-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/aspectjtools-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-1.7.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-core-1.8.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-cli-1.2.jar:/home/ubuntu/hadoop/bin/../lib/commons-codec-1.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-collections-3.2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-configuration-1.6.jar:/home/ubuntu/hadoo
>> p/bin/../lib/commons-daemon-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-digester-1.8.jar:/home/ubuntu/hadoop/bin/../lib/commons-el-1.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-httpclient-3.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-lang-2.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-1.1.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-api-1.0.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-math-2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-net-1.4.1.jar:/home/ubuntu/hadoop/bin/../lib/core-3.1.1.jar:/home/ubuntu/hadoop/bin/../lib/hsqldb-1.8.0.10.jar:/home/ubuntu/hadoop/bin/../lib/jackson-core-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jackson-mapper-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jasper-compiler-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jasper-runtime-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jets3t-0.6.1.jar:/home/ubuntu/hadoop/bin/../lib/jetty-6.1.26.jar:/home/ubuntu/hadoop/bin/../lib/jetty-util-6.1.26.jar:/home/ubu
>> ntu/hadoop/bin/../lib/jsch-0.1.42.jar:/home/ubuntu/hadoop/bin/../lib/junit-4.5.jar:/home/ubuntu/hadoop/bin/../lib/kfs-0.2.2.jar:/home/ubuntu/hadoop/bin/../lib/log4j-1.2.15.jar:/home/ubuntu/hadoop/bin/../lib/mockito-all-1.8.5.jar:/home/ubuntu/hadoop/bin/../lib/oro-2.0.8.jar:/home/ubuntu/hadoop/bin/../lib/servlet-api-2.5-20081211.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-api-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/xmlenc-0.52.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar
>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/home/ubuntu/hadoop/bin/../lib/native/Linux-amd64-64:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work
>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work/tmp
>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.arch=amd64
>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.version=3.2.0-58-virtual
>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.name=ubuntu
>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.home=/home/ubuntu
>> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work
>> 2014-07-02 17:01:42,339 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 sessionTimeout=60000 watcher=org.apache.giraph.worker.BspServiceWorker@54a5733f
>> 2014-07-02 17:01:42,347 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181
>> 2014-07-02 17:01:42,347 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, initiating session
>> 2014-07-02 17:01:42,812 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, sessionid = 0x146f806378d0001, negotiated timeout = 600000
>> 2014-07-02 17:01:42,814 INFO org.apache.giraph.bsp.BspService: process: Asynchronous connection complete.
>> 2014-07-02 17:01:42,819 INFO org.apache.giraph.comm.netty.NettyWorkerServer: createMessageStoreFactory: Using ByteArrayMessagesPerVertexStore since there is no combiner
>> 2014-07-02 17:01:42,881 INFO org.apache.giraph.comm.netty.NettyServer: NettyServer: Using execution handler with 8 threads after requestFrameDecoder.
>> 2014-07-02 17:01:42,904 INFO org.apache.giraph.comm.netty.NettyServer: start: Started server communication server: ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:30001 with up to 16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288 backlog = 1
>> 2014-07-02 17:01:42,907 INFO org.apache.giraph.comm.netty.NettyClient: NettyClient: Using execution handler with 8 threads after requestEncoder.
>> 2014-07-02 17:01:42,914 INFO org.apache.giraph.graph.GraphTaskManager: setup: Registering health of this worker...
>> 2014-07-02 17:01:45,049 INFO org.apache.giraph.bsp.BspService: process: applicationAttemptChanged signaled
>> 2014-07-02 17:01:45,068 WARN org.apache.giraph.bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir, type=NodeChildrenChanged, state=SyncConnected)
>> 2014-07-02 17:01:45,076 INFO org.apache.giraph.worker.BspServiceWorker: registerHealth: Created my health node for attempt=0, superstep=-1 with /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1 and workerInfo= Worker(hostname=ec2-54-186-5-159.us-west-2.compute.amazonaws.com, MRtaskID=1, port=30001)
>> 2014-07-02 17:01:45,183 INFO org.apache.giraph.comm.netty.NettyServer: start: Using Netty without authentication.
>> 2014-07-02 17:01:45,339 INFO org.apache.giraph.bsp.BspService: process: partitionAssignmentsReadyChanged (partitions are assigned)
>> 2014-07-02 17:01:45,345 INFO org.apache.giraph.worker.BspServiceWorker: startSuperstep: Master(hostname=ec2-54-186-5-159.us-west-2.compute.amazonaws.com, MRtaskID=0, port=30000)
>> 2014-07-02 17:01:45,345 INFO org.apache.giraph.worker.BspServiceWorker: startSuperstep: Ready for computation on superstep -1 since worker selection and vertex range assignments are done in /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir/-1/_addressesAndPartitions
>> 2014-07-02 17:01:45,346 INFO org.apache.giraph.comm.netty.NettyClient: Using Netty without authentication.
>> 2014-07-02 17:01:45,354 INFO org.apache.giraph.comm.netty.NettyClient: connectAllAddresses: Successfully added 1 connections, (1 total connected) 0 failed, 0 failures total.
>> 2014-07-02 17:01:45,359 INFO org.apache.giraph.worker.BspServiceWorker: loadInputSplits: Using 1 thread(s), originally 1 threads(s) for 2 total splits.
>> 2014-07-02 17:01:45,362 INFO org.apache.giraph.comm.SendPartitionCache: SendPartitionCache: maxVerticesPerTransfer = 10000
>> 2014-07-02 17:01:45,362 INFO org.apache.giraph.comm.SendPartitionCache: SendPartitionCache: maxEdgesPerTransfer = 80000
>> 2014-07-02 17:01:45,372 INFO org.apache.giraph.worker.InputSplitsHandler: reserveInputSplit: Reserved input split path /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDir/0, overall roughly 0.0% input splits reserved
>> 2014-07-02 17:01:45,373 INFO org.apache.giraph.worker.InputSplitsCallable: getInputSplit: Reserved /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDir/0 from ZooKeeper and got input split 'hdfs://ec2-54-186-5-159.us-west-2.compute.amazonaws.com:54310/user/ubuntu/input/CA.txt:0+67108864'
>> 2014-07-02 17:01:46,615 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: Loaded 250000 vertices at 200412.01085983627 vertices/sec 697447 edges at 560085.1663971642 edges/sec Memory (free/total/max) = 112.94M / 182.50M / 182.50M
>> 2014-07-02 17:01:47,440 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: Loaded 500000 vertices at 241131.36490688394 vertices/sec 1419367 edges at 685221.3493060106 edges/sec Memory (free/total/max) = 45.07M / 187.50M / 187.50M
>> 2014-07-02 17:01:51,322 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: Loaded 750000 vertices at 125921.72750649283 vertices/sec 2149814 edges at 361077.323111284 edges/sec Memory (free/total/max) = 16.73M / 189.50M / 189.50M
>> 2014-07-02 17:01:55,205 ERROR org.apache.giraph.worker.BspServiceWorker: unregisterHealth: Got failure, unregistering health on /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1 on superstep -1
>> 2014-07-02 17:01:55,213 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
>> 2014-07-02 17:01:55,716 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
>> 2014-07-02 17:01:55,717 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName ubuntu for UID 1000 from the native implementation
>> 2014-07-02 17:01:55,718 WARN org.apache.hadoop.mapred.Child: Error running child
>> java.lang.IllegalStateException: run: Caught an unrecoverable exception waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@36df8f5
>> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:102)
>> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>> 	at java.security.AccessController.doPrivileged(Native Method)
>> 	at javax.security.auth.Subject.doAs(Subject.java:415)
>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>> 	at org.apache.hadoop.mapred.Child.main(Child.java:253)
>> Caused by: java.lang.IllegalStateException: waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@36df8f5
>> 	at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:151)
>> 	at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:111)
>> 	at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:73)
>> 	at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:192)
>> 	at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:276)
>> 	at org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:323)
>> 	at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:506)
>> 	at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:230)
>> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92)
>> 	... 7 more
>> Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
>> 	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>> 	at java.util.concurrent.FutureTask.get(FutureTask.java:202)
>> 	at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:271)
>> 	at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:143)
>> 	... 15 more
>> Caused by: java.lang.OutOfMemoryError: Java heap space
>> 	at java.util.concurrent.ConcurrentHashMap$Segment.rehash(ConcurrentHashMap.java:501)
>> 	at java.util.concurrent.ConcurrentHashMap$Segment.put(ConcurrentHashMap.java:460)
>> 	at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1130)
>> 	at org.apache.giraph.partition.SimplePartition.addPartition(SimplePartition.java:87)
>> 	at org.apache.giraph.partition.SimplePartitionStore.addPartition(SimplePartitionStore.java:71)
>> 	at org.apache.giraph.comm.requests.SendVertexRequest.doRequest(SendVertexRequest.java:81)
>> 	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:470)
>> 	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendPartitionRequest(NettyWorkerClientRequestProcessor.java:203)
>> 	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendVertexRequest(NettyWorkerClientRequestProcessor.java:267)
>> 	at org.apache.giraph.worker.VertexInputSplitsCallable.readInputSplit(VertexInputSplitsCallable.java:140)
>> 	at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:220)
>> 	at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:161)
>> 	at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:58)
>> 	at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
>> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> 	at java.lang.Thread.run(Thread.java:745)
>> 2014-07-02 17:01:55,722 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
>>
>>
>> ===========================================================================
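The heap figures in the worker syslog above allow a rough back-of-envelope estimate of how much memory loading this graph needs. This is only a sketch: it assumes the logged numbers (750,000 vertices loaded with 16.73M of a 189.50M heap free) and that per-vertex overhead scales roughly linearly; the full-graph vertex count is a guess from the file and split sizes, not from the thread.

```python
# Rough heap estimate from the worker log above (figures from the syslog:
# 750,000 vertices / 2,149,814 edges loaded with 16.73M of 189.50M free).
MB = 1024 * 1024

heap_max = 189.50 * MB
heap_free = 16.73 * MB
vertices_loaded = 750_000

used = heap_max - heap_free
bytes_per_vertex = used / vertices_loaded   # includes edge and object overhead

# The input file is 96,026,228 bytes and the split being read covers only the
# first 67,108,864 bytes, so the full graph plausibly has on the order of a
# few million vertices (assumption, not a figure from the thread).
est_vertices = 2_500_000
est_heap = est_vertices * bytes_per_vertex

print(f"~{bytes_per_vertex:.0f} bytes/vertex, "
      f"~{est_heap / MB:.0f}M estimated heap for the full graph")
```

Under these assumptions the default ~200M child heap is far too small for this input, which is consistent with the OutOfMemoryError during input loading.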
>>
>>
>>
>>
>> I've tried increasing Java Heap Space in hadoop/conf/mapred-site.xml by
>> adding this:
>>   <property>
>>     <name>mapred.child.java.opts</name>
>>     <value>-Xmx1024m</value>
>>   </property>
>>
>> But that just caused the entire job to fail from the start.
>>
>> Before using this version of Giraph, I tried 1.0.0 and 1.1.0-RC0, and those
>> versions gave me more (and different) errors to debug, mostly related to
>> problems with Hadoop itself.  So the Giraph version I'm currently using
>> seems to be the best fit for me because these errors seem more manageable.
>>
>> What can I do to fix this error?  Since Giraph was built for large-scale
>> graph processing, I assume someone testing large graphs has run into this
>> problem before.  I searched through the mailing archives but couldn't find
>> anything, though.
>>
>> I can provide more information if you need it.  Thanks a lot.
>>
>> Bryan Rowe
>>
>>
>>
>> *Then Young Han replied:*
>>
>> 10 minutes seems way too long to load 91 MB from HDFS. Are you sure
>> your graph's format is correct? For the Json input formats, each line of
>> your file should be:
>>
>> [vertex id, vertex value, [[dst id, edge weight], [dst id, edge weight],
>> ..., [dst id, edge weight]]]
>>
>> You can set the vertex value for every vertex to 0 (SSSP will overwrite
>> that value in the first superstep). If your graph doesn't have edge
>> weights, you can just set them to 1.
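The format above can be produced from a plain edge list with a short script. This is an illustrative sketch, not something from the thread: the unit edge weights and zero vertex values follow the advice above, and the helper name and sample edges are mine.

```python
import json
from collections import defaultdict

def edge_list_to_giraph_json(edges):
    """Turn (src, dst) pairs into JsonLongDoubleFloatDouble lines:
    [vertex id, vertex value, [[dst id, edge weight], ...]].
    Vertex values are set to 0 and edge weights to 1, as suggested above."""
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
        adj.setdefault(dst, [])  # emit vertices that only appear as targets
    for v in sorted(adj):
        # compact separators match the Quick Start sample (no spaces)
        yield json.dumps([v, 0, [[d, 1] for d in adj[v]]],
                         separators=(",", ":"))

for line in edge_list_to_giraph_json([(0, 1), (0, 4), (1, 0)]):
    print(line)
```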
>>
>> Also, have you tried a larger Xmx value? E.g., 4096m or 8192m.
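Applying that suggestion uses the same property Bryan already tried, just with a larger value. A sketch only: whether 4096m is viable depends on how much physical RAM the instance can give each concurrent task, and an -Xmx the machine cannot back can itself keep child JVMs from starting.

```xml
<!-- hadoop/conf/mapred-site.xml: heap for each task's child JVM.
     Pick a value the instance can actually back; with -w 1 there are
     two map tasks (master/ZooKeeper plus one worker) running at once. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx4096m</value>
</property>
```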
>>
>> Young
>>
>>
>> *Then I replied:*
>> Hi Young,
>>
>> I believe my graph has the correct format.
>> Here are the first 40 lines of the graph I'm using:
>> [0,0,[[1,1],[4,1],[5,1]]]
>> [1,0,[[0,1],[2,1],[3,1]]]
>> [2,0,[[1,1],[6,1],[7,1]]]
>> [3,0,[[8,1],[1,1],[9,1]]]
>> [4,0,[[0,1],[10,1],[11,1]]]
>> [5,0,[[0,1]]]
>> [6,0,[[2,1],[12,1]]]
>> [7,0,[[2,1],[12,1],[13,1]]]
>> [8,0,[[11,1],[3,1],[60,1]]]
>> [9,0,[[3,1]]]
>> [10,0,[[35,1],[4,1],[38,1]]]
>> [11,0,[[8,1],[59,1],[4,1]]]
>> [12,0,[[41,1],[6,1],[7,1]]]
>> [13,0,[[89,1],[90,1],[91,1],[7,1]]]
>> [14,0,[[18,1],[19,1],[15,1]]]
>> [15,0,[[16,1],[17,1],[14,1]]]
>> [16,0,[[20,1],[21,1],[22,1],[15,1]]]
>> [17,0,[[24,1],[23,1],[21,1],[15,1]]]
>> [18,0,[[24,1],[14,1]]]
>> [19,0,[[25,1],[22,1],[14,1]]]
>> [20,0,[[16,1],[25,1],[26,1]]]
>> [21,0,[[16,1],[17,1],[30,1]]]
>> [22,0,[[16,1],[19,1]]]
>> [23,0,[[17,1],[105,1]]]
>> [24,0,[[17,1],[18,1],[58,1]]]
>> [25,0,[[27,1],[19,1],[20,1]]]
>> [26,0,[[28,1],[27,1],[20,1]]]
>> [27,0,[[25,1],[26,1],[29,1]]]
>> [28,0,[[26,1],[29,1],[30,1]]]
>> [29,0,[[27,1],[28,1],[31,1]]]
>> [30,0,[[32,1],[33,1],[28,1],[21,1]]]
>> [31,0,[[34,1],[29,1]]]
>> [32,0,[[105,1],[30,1],[39,1]]]
>> [33,0,[[30,1]]]
>> [34,0,[[38,1],[31,1]]]
>> [35,0,[[10,1],[36,1],[37,1]]]
>> [36,0,[[40,1],[35,1],[39,1]]]
>> [37,0,[[41,1],[35,1]]]
>> [38,0,[[10,1],[34,1]]]
>> [39,0,[[32,1],[58,1],[36,1],[119,1]]]
>> [40,0,[[90,1],[36,1]]]
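A quick way to rule out malformed input lines is a validation pass like the sketch below. The expected shape is taken from the format description earlier in the thread; the helper name and the type checks (integer IDs, numeric values and weights) are my assumptions.

```python
import json

def check_giraph_json(lines):
    """Validate each line against the shape
    [vertex id, vertex value, [[dst id, edge weight], ...]].
    Returns the 1-based numbers of lines that do not parse or conform."""
    bad = []
    for n, line in enumerate(lines, 1):
        try:
            vid, value, edges = json.loads(line)
            assert isinstance(vid, int)
            assert isinstance(value, (int, float))
            assert all(isinstance(d, int) and isinstance(w, (int, float))
                       for d, w in edges)
        except (ValueError, AssertionError):
            bad.append(n)
    return bad

sample = ['[0,0,[[1,1],[4,1],[5,1]]]', '[5,0,[[0,1]]]', '[9,0,[[3,1]]']
print(check_giraph_json(sample))  # the last sample line is missing a bracket
```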
>>
>> Also, sorry about sending the email twice.  My email client messed up.
>>
>> Thanks,
>> Bryan
>>
>> I've also tried a larger graph.  It ran for 2 hours and then failed, with
>> different error messages I believe.
>>
>> Bryan
>>
>
>

Re: Errors with large graphs (continued, new email)

Posted by Young Han <yo...@uwaterloo.ca>.
From the other thread...

Yeah, your input format looks correct. Did you have the graph sorted by
source vertex IDs before conversion? (I'm not sure whether duplicate entries
with the same source ID matter, but just in case.)
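A quick way to rule both of those out, sketched in Python (the function name is made up; it assumes one JSON vertex record per line):

```python
import json

def sort_and_dedupe(lines):
    """Hypothetical pre-processing step: drop exact duplicate vertex
    lines (keeping the first occurrence) and sort the rest by source
    vertex ID, which is the first element of each JSON array."""
    seen = set()
    unique = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            unique.append(line)
    return sorted(unique, key=lambda l: json.loads(l)[0])

result = sort_and_dedupe(['[2,0,[]]', '[0,0,[]]', '[2,0,[]]'])
```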

They're all out of memory errors, so I think Xmx is the culprit. What type
of EC2 instances are you using? You probably want to use something larger
than t1.micro or m1.small.
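For reference, on Hadoop 0.20.x the map-task heap is usually controlled through mapred.child.java.opts, which GiraphRunner can accept as a -D generic option. The command below is a sketch only: the property name matches Hadoop 0.20.x conventions and the -Xmx value is an arbitrary example, not a recommendation for this job:

```shell
# Hypothetical invocation: raise the map-task heap via
# mapred.child.java.opts (adjust the value to your instance's RAM).
hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner \
  -Dmapred.child.java.opts=-Xmx4096m \
  org.apache.giraph.examples.SimpleShortestPathsVertex \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
  -vip /user/ubuntu/input/CA.txt \
  -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /user/ubuntu/output/shortestpaths \
  -w 1
```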

Young


On Thu, Jul 3, 2014 at 4:53 PM, Bryan Rowe <br...@gmail.com> wrote:

> First of all, I started this email thread with my old email from Yahoo,
> which was a mistake because it kept sending out duplicates.  Sorry for the
> inconvenience, but I'll continue it using this email thread from now on.
>
> *I originally posted this:*
> Hello,
>
> Giraph: release-1.0.0-RC3
>
> In short, when I use large graphs with the Shortest Paths example, it
> fails.  But when I use the small graph provided on the Quick Start guide,
> it succeeds.
> I converted all of my large graphs into the format shown in the Quick
> Start guide to simplify things.
> I'm using a one-node setup.
>
>
> Here is the command I'm using:
> hadoop jar
> $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar
>
> org.apache.giraph.GiraphRunner
> org.apache.giraph.examples.SimpleShortestPathsVertex
> -vif
> org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
>  -vip /user/ubuntu/input/CA.txt -of
> org.apache.giraph.io.formats.IdWithValueTextOutputFormat
>  -op /user/ubuntu/output/shortestpaths
> -w 1
>
> (all on one line)
>
> CA.txt is a large graph file: 96,026,228 bytes
>
> The job fails in 10mins, 46sec.
>
> Two Map tasks are created when run.
> The first one, task_201407021636_0006_m_000000, is KILLED.  syslog:
> ===========================================================================
>
> 2014-07-02 17:01:34,757 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
> 2014-07-02 17:01:34,945 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
> 2014-07-02 17:01:35,127 INFO org.apache.giraph.graph.GraphTaskManager: setup: Log level remains at info
> 2014-07-02 17:01:35,159 INFO org.apache.giraph.graph.GraphTaskManager: Distributed cache is empty. Assuming fatjar.
> 2014-07-02 17:01:35,159 INFO org.apache.giraph.graph.GraphTaskManager: setup: classpath @ /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/job.jar for job Giraph: org.apache.giraph.examples.SimpleShortestPathsVertex
> 2014-07-02 17:01:35,201 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp: Made the directory _bsp/_defaultZkManagerDir/job_201407021636_0006
> 2014-07-02 17:01:35,204 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp: Creating my filestamp _bsp/_defaultZkManagerDir/job_201407021636_0006/_task/ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0
> 2014-07-02 17:01:35,219 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: Got [ec2-54-186-5-159.us-west-2.compute.amazonaws.com] 1 hosts from 1 candidates when 1 required (polling period is 3000) on attempt 0
> 2014-07-02 17:01:35,220 INFO org.apache.giraph.zk.ZooKeeperManager: createZooKeeperServerList: Creating the final ZooKeeper file '_bsp/_defaultZkManagerDir/job_201407021636_0006/zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 '
> 2014-07-02 17:01:35,228 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: For task 0, got file 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 ' (polling period is 3000)
> 2014-07-02 17:01:35,228 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: Found [ec2-54-186-5-159.us-west-2.compute.amazonaws.com, 0] 2 hosts in filename 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 '
> 2014-07-02 17:01:35,229 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Trying to delete old directory /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper
> 2014-07-02 17:01:35,234 INFO org.apache.giraph.zk.ZooKeeperManager: generateZooKeeperConfigFile: Creating file /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper/zoo.cfg in /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper with base port 22181
> 2014-07-02 17:01:35,234 INFO org.apache.giraph.zk.ZooKeeperManager: generateZooKeeperConfigFile: Make directory of _bspZooKeeper = true
> 2014-07-02 17:01:35,235 INFO org.apache.giraph.zk.ZooKeeperManager: generateZooKeeperConfigFile: Delete of zoo.cfg = false
> 2014-07-02 17:01:35,236 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Attempting to start ZooKeeper server with command [/usr/lib/jvm/java-7-oracle/jre/bin/java, -Xmx512m, -XX:ParallelGCThreads=4, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=70, -XX:MaxGCPauseMillis=100, -cp, /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/job.jar, org.apache.zookeeper.server.quorum.QuorumPeerMain, /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper/zoo.cfg] in directory /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/work/_bspZooKeeper
> 2014-07-02 17:01:35,238 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Shutdown hook added.
> 2014-07-02 17:01:35,238 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Connect attempt 0 of 10 max trying to connect to ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 with poll msecs = 3000
> 2014-07-02 17:01:35,241 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Got ConnectException
> java.net.ConnectException: Connection refused
> 	at java.net.PlainSocketImpl.socketConnect(Native Method)
> 	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
> 	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
> 	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
> 	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
> 	at java.net.Socket.connect(Socket.java:579)
> 	at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:701)
> 	at org.apache.giraph.graph.GraphTaskManager.startZooKeeperManager(GraphTaskManager.java:357)
> 	at org.apache.giraph.graph.GraphTaskManager.setup(GraphTaskManager.java:188)
> 	at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:60)
> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:90)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:253)
> 2014-07-02 17:01:38,242 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Connect attempt 1 of 10 max trying to connect to ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 with poll msecs = 3000
> 2014-07-02 17:01:38,243 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Connected to ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181!
> 2014-07-02 17:01:38,243 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Creating my filestamp _bsp/_defaultZkManagerDir/job_201407021636_0006/_zkServer/ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0
> 2014-07-02 17:01:40,249 INFO org.apache.giraph.graph.GraphTaskManager: setup: Chosen to run ZooKeeper...
> 2014-07-02 17:01:40,249 INFO org.apache.giraph.graph.GraphTaskManager: setup: Starting up BspServiceMaster (master thread)...
> 2014-07-02 17:01:40,260 INFO org.apache.giraph.bsp.BspService: BspService: Connecting to ZooKeeper with job job_201407021636_0006, 0 on ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181
> 2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27 GMT
> 2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:host.name=ec2-54-186-5-159.us-west-2.compute.amazonaws.com
> 2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.7.0_60
> 2014-07-02 17:01:40,270 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client
>  environment:java.class.path=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/classes:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work:/home/ubuntu/hadoop/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/ubuntu/hadoop/bin/..:/home/ubuntu/hadoop/bin/../hadoop-core-0.20.203.0.jar:/home/ubuntu/hadoop/bin/../lib/aspectjrt-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/aspectjtools-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-1.7.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-core-1.8.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-cli-1.2.jar:/home/ubuntu/hadoop/bin/../lib/commons-codec-1.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-collections-3.2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-configuration-1.6.jar:/home/ubuntu/hadoo
> p/bin/../lib/commons-daemon-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-digester-1.8.jar:/home/ubuntu/hadoop/bin/../lib/commons-el-1.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-httpclient-3.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-lang-2.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-1.1.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-api-1.0.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-math-2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-net-1.4.1.jar:/home/ubuntu/hadoop/bin/../lib/core-3.1.1.jar:/home/ubuntu/hadoop/bin/../lib/hsqldb-1.8.0.10.jar:/home/ubuntu/hadoop/bin/../lib/jackson-core-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jackson-mapper-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jasper-compiler-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jasper-runtime-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jets3t-0.6.1.jar:/home/ubuntu/hadoop/bin/../lib/jetty-6.1.26.jar:/home/ubuntu/hadoop/bin/../lib/jetty-util-6.1.26.jar:/home/ubu
> ntu/hadoop/bin/../lib/jsch-0.1.42.jar:/home/ubuntu/hadoop/bin/../lib/junit-4.5.jar:/home/ubuntu/hadoop/bin/../lib/kfs-0.2.2.jar:/home/ubuntu/hadoop/bin/../lib/log4j-1.2.15.jar:/home/ubuntu/hadoop/bin/../lib/mockito-all-1.8.5.jar:/home/ubuntu/hadoop/bin/../lib/oro-2.0.8.jar:/home/ubuntu/hadoop/bin/../lib/servlet-api-2.5-20081211.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-api-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/xmlenc-0.52.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar
> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/home/ubuntu/hadoop/bin/../lib/native/Linux-amd64-64:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work
> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work/tmp
> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.arch=amd64
> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.version=3.2.0-58-virtual
> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.name=ubuntu
> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.home=/home/ubuntu
> 2014-07-02 17:01:40,271 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000000_0/work
> 2014-07-02 17:01:40,272 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 sessionTimeout=60000 watcher=org.apache.giraph.master.BspServiceMaster@4deb9df0
> 2014-07-02 17:01:40,284 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181
> 2014-07-02 17:01:40,296 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, initiating session
> 2014-07-02 17:01:42,336 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, sessionid = 0x146f806378d0000, negotiated timeout = 600000
> 2014-07-02 17:01:42,337 INFO org.apache.giraph.bsp.BspService: process: Asynchronous connection complete.
> 2014-07-02 17:01:42,343 INFO org.apache.giraph.graph.GraphTaskManager: map: No need to do anything when not a worker
> 2014-07-02 17:01:42,343 INFO org.apache.giraph.graph.GraphTaskManager: cleanup: Starting for MASTER_ZOOKEEPER_ONLY
> 2014-07-02 17:01:45,033 INFO org.apache.giraph.bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/job_201407021636_0006/_masterJobState)
> 2014-07-02 17:01:45,038 INFO org.apache.giraph.master.BspServiceMaster: becomeMaster: First child is '/_hadoopBsp/job_201407021636_0006/_masterElectionDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_00000000000' and my bid is '/_hadoopBsp/job_201407021636_0006/_masterElectionDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_00000000000'
> 2014-07-02 17:01:45,044 INFO org.apache.giraph.bsp.BspService: getApplicationAttempt: Node /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir already exists!
> 2014-07-02 17:01:45,049 INFO org.apache.giraph.bsp.BspService: process: applicationAttemptChanged signaled
> 2014-07-02 17:01:45,089 INFO org.apache.giraph.comm.netty.NettyServer: NettyServer: Using execution handler with 8 threads after requestFrameDecoder.
> 2014-07-02 17:01:45,111 INFO org.apache.giraph.comm.netty.NettyServer: start: Started server communication server: ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:30000 with up to 16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288 backlog = 1
> 2014-07-02 17:01:45,116 INFO org.apache.giraph.comm.netty.NettyClient: NettyClient: Using execution handler with 8 threads after requestEncoder.
> 2014-07-02 17:01:45,119 INFO org.apache.giraph.master.BspServiceMaster: becomeMaster: I am now the master!
> 2014-07-02 17:01:45,126 INFO org.apache.giraph.bsp.BspService: getApplicationAttempt: Node /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir already exists!
> 2014-07-02 17:01:45,140 INFO org.apache.giraph.io.formats.GiraphFileInputFormat: Total input paths to process : 1
> 2014-07-02 17:01:45,149 INFO org.apache.giraph.master.BspServiceMaster: generateVertexInputSplits: Got 2 input splits for 1 input threads
> 2014-07-02 17:01:45,149 INFO org.apache.giraph.master.BspServiceMaster: createVertexInputSplits: Starting to write input split data to zookeeper with 1 threads
> 2014-07-02 17:01:45,163 INFO org.apache.giraph.master.BspServiceMaster: createVertexInputSplits: Done writing input split data to zookeeper
> 2014-07-02 17:01:45,173 INFO org.apache.giraph.comm.netty.NettyClient: Using Netty without authentication.
> 2014-07-02 17:01:45,187 INFO org.apache.giraph.comm.netty.NettyClient: connectAllAddresses: Successfully added 1 connections, (1 total connected) 0 failed, 0 failures total.
> 2014-07-02 17:01:45,188 INFO org.apache.giraph.partition.PartitionUtils: computePartitionCount: Creating 1, default would have been 1 partitions.
> 2014-07-02 17:01:45,350 INFO org.apache.giraph.comm.netty.NettyServer: start: Using Netty without authentication.
> 2014-07-02 17:06:45,347 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out of 1 workers finished on superstep -1 on path /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDoneDir
> 2014-07-02 17:06:45,349 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Waiting on [ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1]
> 2014-07-02 17:11:45,355 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out of 1 workers finished on superstep -1 on path /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDoneDir
> 2014-07-02 17:11:45,355 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Waiting on [ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1]
> 2014-07-02 17:12:07,201 INFO org.apache.giraph.zk.ZooKeeperManager: run: Shutdown hook started.
> 2014-07-02 17:12:07,202 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Forced a shutdown hook kill of the ZooKeeper process.
> 2014-07-02 17:12:07,518 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x146f806378d0000, likely server has closed socket, closing socket connection and attempting reconnect
> 2014-07-02 17:12:07,518 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: ZooKeeper process exited with 143 (note that 143 typically means killed).
> ===========================================================================
>
>
>
>
>
>
>
>
> The second one, task_201407021636_0006_m_000001, goes to the FAILED
> state.  syslog:
> ===========================================================================
>
> 2014-07-02 17:01:38,016 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
> 2014-07-02 17:01:38,203 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
> 2014-07-02 17:01:38,379 INFO org.apache.giraph.graph.GraphTaskManager: setup: Log level remains at info
> 2014-07-02 17:01:40,280 INFO org.apache.giraph.graph.GraphTaskManager: Distributed cache is empty. Assuming fatjar.
> 2014-07-02 17:01:40,281 INFO org.apache.giraph.graph.GraphTaskManager: setup: classpath @ /home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/job.jar for job Giraph: org.apache.giraph.examples.SimpleShortestPathsVertex
> 2014-07-02 17:01:40,317 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp: Made the directory _bsp/_defaultZkManagerDir/job_201407021636_0006
> 2014-07-02 17:01:40,320 INFO org.apache.giraph.zk.ZooKeeperManager: createCandidateStamp: Creating my filestamp _bsp/_defaultZkManagerDir/job_201407021636_0006/_task/ec2-54-186-5-159.us-west-2.compute.amazonaws.com 1
> 2014-07-02 17:01:42,109 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: For task 1, got file 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 ' (polling period is 3000)
> 2014-07-02 17:01:42,313 INFO org.apache.giraph.zk.ZooKeeperManager: getZooKeeperServerList: Found [ec2-54-186-5-159.us-west-2.compute.amazonaws.com, 0] 2 hosts in filename 'zkServerList_ec2-54-186-5-159.us-west-2.compute.amazonaws.com 0 '
> 2014-07-02 17:01:42,316 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Got [ec2-54-186-5-159.us-west-2.compute.amazonaws.com] 1 hosts from 1 ready servers when 1 required (polling period is 3000) on attempt 0
> 2014-07-02 17:01:42,316 INFO org.apache.giraph.graph.GraphTaskManager: setup: Starting up BspServiceWorker...
> 2014-07-02 17:01:42,330 INFO org.apache.giraph.bsp.BspService: BspService: Connecting to ZooKeeper with job job_201407021636_0006, 1 on ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181
> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27 GMT
> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:host.name=ec2-54-186-5-159.us-west-2.compute.amazonaws.com
> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.7.0_60
> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
> 2014-07-02 17:01:42,337 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client
>  environment:java.class.path=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars/classes:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/jars:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work:/home/ubuntu/hadoop/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/ubuntu/hadoop/bin/..:/home/ubuntu/hadoop/bin/../hadoop-core-0.20.203.0.jar:/home/ubuntu/hadoop/bin/../lib/aspectjrt-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/aspectjtools-1.6.5.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-1.7.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-beanutils-core-1.8.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-cli-1.2.jar:/home/ubuntu/hadoop/bin/../lib/commons-codec-1.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-collections-3.2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-configuration-1.6.jar:/home/ubuntu/hadoo
> p/bin/../lib/commons-daemon-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-digester-1.8.jar:/home/ubuntu/hadoop/bin/../lib/commons-el-1.0.jar:/home/ubuntu/hadoop/bin/../lib/commons-httpclient-3.0.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-lang-2.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-1.1.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-logging-api-1.0.4.jar:/home/ubuntu/hadoop/bin/../lib/commons-math-2.1.jar:/home/ubuntu/hadoop/bin/../lib/commons-net-1.4.1.jar:/home/ubuntu/hadoop/bin/../lib/core-3.1.1.jar:/home/ubuntu/hadoop/bin/../lib/hsqldb-1.8.0.10.jar:/home/ubuntu/hadoop/bin/../lib/jackson-core-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jackson-mapper-asl-1.0.1.jar:/home/ubuntu/hadoop/bin/../lib/jasper-compiler-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jasper-runtime-5.5.12.jar:/home/ubuntu/hadoop/bin/../lib/jets3t-0.6.1.jar:/home/ubuntu/hadoop/bin/../lib/jetty-6.1.26.jar:/home/ubuntu/hadoop/bin/../lib/jetty-util-6.1.26.jar:/home/ubu
> ntu/hadoop/bin/../lib/jsch-0.1.42.jar:/home/ubuntu/hadoop/bin/../lib/junit-4.5.jar:/home/ubuntu/hadoop/bin/../lib/kfs-0.2.2.jar:/home/ubuntu/hadoop/bin/../lib/log4j-1.2.15.jar:/home/ubuntu/hadoop/bin/../lib/mockito-all-1.8.5.jar:/home/ubuntu/hadoop/bin/../lib/oro-2.0.8.jar:/home/ubuntu/hadoop/bin/../lib/servlet-api-2.5-20081211.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-api-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/ubuntu/hadoop/bin/../lib/xmlenc-0.52.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/ubuntu/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar
> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/home/ubuntu/hadoop/bin/../lib/native/Linux-amd64-64:/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work
> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work/tmp
> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.arch=amd64
> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.version=3.2.0-58-virtual
> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.name=ubuntu
> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.home=/home/ubuntu
> 2014-07-02 17:01:42,338 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/home/ubuntu/hdfstmp/mapred/local/taskTracker/ubuntu/jobcache/job_201407021636_0006/attempt_201407021636_0006_m_000001_0/work
> 2014-07-02 17:01:42,339 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ec2-54-186-5-159.us-west-2.compute.amazonaws.com:22181 sessionTimeout=60000 watcher=org.apache.giraph.worker.BspServiceWorker@54a5733f
> 2014-07-02 17:01:42,347 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181
> 2014-07-02 17:01:42,347 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, initiating session
> 2014-07-02 17:01:42,812 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:22181, sessionid = 0x146f806378d0001, negotiated timeout = 600000
> 2014-07-02 17:01:42,814 INFO org.apache.giraph.bsp.BspService: process: Asynchronous connection complete.
> 2014-07-02 17:01:42,819 INFO org.apache.giraph.comm.netty.NettyWorkerServer: createMessageStoreFactory: Using ByteArrayMessagesPerVertexStore since there is no combiner
> 2014-07-02 17:01:42,881 INFO org.apache.giraph.comm.netty.NettyServer: NettyServer: Using execution handler with 8 threads after requestFrameDecoder.
> 2014-07-02 17:01:42,904 INFO org.apache.giraph.comm.netty.NettyServer: start: Started server communication server: ec2-54-186-5-159.us-west-2.compute.amazonaws.com/172.31.45.24:30001 with up to 16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288 backlog = 1
> 2014-07-02 17:01:42,907 INFO org.apache.giraph.comm.netty.NettyClient: NettyClient: Using execution handler with 8 threads after requestEncoder.
> 2014-07-02 17:01:42,914 INFO org.apache.giraph.graph.GraphTaskManager: setup: Registering health of this worker...
> 2014-07-02 17:01:45,049 INFO org.apache.giraph.bsp.BspService: process: applicationAttemptChanged signaled
> 2014-07-02 17:01:45,068 WARN org.apache.giraph.bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir, type=NodeChildrenChanged, state=SyncConnected)
> 2014-07-02 17:01:45,076 INFO org.apache.giraph.worker.BspServiceWorker: registerHealth: Created my health node for attempt=0, superstep=-1 with /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1 and workerInfo= Worker(hostname=ec2-54-186-5-159.us-west-2.compute.amazonaws.com, MRtaskID=1, port=30001)
> 2014-07-02 17:01:45,183 INFO org.apache.giraph.comm.netty.NettyServer: start: Using Netty without authentication.
> 2014-07-02 17:01:45,339 INFO org.apache.giraph.bsp.BspService: process: partitionAssignmentsReadyChanged (partitions are assigned)
> 2014-07-02 17:01:45,345 INFO org.apache.giraph.worker.BspServiceWorker: startSuperstep: Master(hostname=ec2-54-186-5-159.us-west-2.compute.amazonaws.com, MRtaskID=0, port=30000)
> 2014-07-02 17:01:45,345 INFO org.apache.giraph.worker.BspServiceWorker: startSuperstep: Ready for computation on superstep -1 since worker selection and vertex range assignments are done in /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir/-1/_addressesAndPartitions
> 2014-07-02 17:01:45,346 INFO org.apache.giraph.comm.netty.NettyClient: Using Netty without authentication.
> 2014-07-02 17:01:45,354 INFO org.apache.giraph.comm.netty.NettyClient: connectAllAddresses: Successfully added 1 connections, (1 total connected) 0 failed, 0 failures total.
> 2014-07-02 17:01:45,359 INFO org.apache.giraph.worker.BspServiceWorker: loadInputSplits: Using 1 thread(s), originally 1 threads(s) for 2 total splits.
> 2014-07-02 17:01:45,362 INFO org.apache.giraph.comm.SendPartitionCache: SendPartitionCache: maxVerticesPerTransfer = 10000
> 2014-07-02 17:01:45,362 INFO org.apache.giraph.comm.SendPartitionCache: SendPartitionCache: maxEdgesPerTransfer = 80000
> 2014-07-02 17:01:45,372 INFO org.apache.giraph.worker.InputSplitsHandler: reserveInputSplit: Reserved input split path /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDir/0, overall roughly 0.0% input splits reserved
> 2014-07-02 17:01:45,373 INFO org.apache.giraph.worker.InputSplitsCallable: getInputSplit: Reserved /_hadoopBsp/job_201407021636_0006/_vertexInputSplitDir/0 from ZooKeeper and got input split 'hdfs://ec2-54-186-5-159.us-west-2.compute.amazonaws.com:54310/user/ubuntu/input/CA.txt:0+67108864'
> 2014-07-02 17:01:46,615 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: Loaded 250000 vertices at 200412.01085983627 vertices/sec 697447 edges at 560085.1663971642 edges/sec Memory (free/total/max) = 112.94M / 182.50M / 182.50M
> 2014-07-02 17:01:47,440 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: Loaded 500000 vertices at 241131.36490688394 vertices/sec 1419367 edges at 685221.3493060106 edges/sec Memory (free/total/max) = 45.07M / 187.50M / 187.50M
> 2014-07-02 17:01:51,322 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: Loaded 750000 vertices at 125921.72750649283 vertices/sec 2149814 edges at 361077.323111284 edges/sec Memory (free/total/max) = 16.73M / 189.50M / 189.50M
> 2014-07-02 17:01:55,205 ERROR org.apache.giraph.worker.BspServiceWorker: unregisterHealth: Got failure, unregistering health on /_hadoopBsp/job_201407021636_0006/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/ec2-54-186-5-159.us-west-2.compute.amazonaws.com_1 on superstep -1
> 2014-07-02 17:01:55,213 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> 2014-07-02 17:01:55,716 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
> 2014-07-02 17:01:55,717 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName ubuntu for UID 1000 from the native implementation
> 2014-07-02 17:01:55,718 WARN org.apache.hadoop.mapred.Child: Error running child
> java.lang.IllegalStateException: run: Caught an unrecoverable exception waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@36df8f5
> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:102)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:253)
> Caused by: java.lang.IllegalStateException: waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@36df8f5
> 	at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:151)
> 	at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:111)
> 	at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:73)
> 	at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:192)
> 	at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:276)
> 	at org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:323)
> 	at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:506)
> 	at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:230)
> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92)
> 	... 7 more
> Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
> 	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> 	at java.util.concurrent.FutureTask.get(FutureTask.java:202)
> 	at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:271)
> 	at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:143)
> 	... 15 more
> Caused by: java.lang.OutOfMemoryError: Java heap space
> 	at java.util.concurrent.ConcurrentHashMap$Segment.rehash(ConcurrentHashMap.java:501)
> 	at java.util.concurrent.ConcurrentHashMap$Segment.put(ConcurrentHashMap.java:460)
> 	at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1130)
> 	at org.apache.giraph.partition.SimplePartition.addPartition(SimplePartition.java:87)
> 	at org.apache.giraph.partition.SimplePartitionStore.addPartition(SimplePartitionStore.java:71)
> 	at org.apache.giraph.comm.requests.SendVertexRequest.doRequest(SendVertexRequest.java:81)
> 	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:470)
> 	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendPartitionRequest(NettyWorkerClientRequestProcessor.java:203)
> 	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendVertexRequest(NettyWorkerClientRequestProcessor.java:267)
> 	at org.apache.giraph.worker.VertexInputSplitsCallable.readInputSplit(VertexInputSplitsCallable.java:140)
> 	at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:220)
> 	at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:161)
> 	at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:58)
> 	at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> 2014-07-02 17:01:55,722 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
>
> ===========================================================================
>
>
>
>
> I've tried increasing Java Heap Space in hadoop/conf/mapred-site.xml by
> adding this:
>   <property>
>     <name>mapred.child.java.opts</name>
>     <value>-Xmx1024m</value>
>   </property>
>
> But that just caused the entire job to fail from the start.
>
> Before using this version of Giraph, I tried 1.0.0 and 1.1.0-RC0, but those
> versions gave me other, different errors to debug that related to
> problems with Hadoop itself.  So the Giraph version I'm currently using
> seems to be the best for me, because these errors seem more manageable.
>
> What can I do to fix this error?  I thought Giraph was built for large-scale
> graph processing, so I assume someone testing large graphs has run into
> this problem before.  I searched through the mailing archives but
> couldn't find anything, though.
>
> I can provide more information if you need it.  Thanks a lot.
>
> Bryan Rowe
>
>
>
> *Then Young Han replied:*
>
> 10 minutes seems way too long to load 91 MB from HDFS. Are you sure your
> graph's format is correct? For the Json input formats, each line of your
> file should be:
>
> [vertex id, vertex value, [[dst id, edge weight], [dst id, edge weight],
> ..., [dst id, edge weight]]]
>
> You can set the vertex value for every vertex to 0 (SSSP will overwrite
> that value in the first superstep). If your graph doesn't have edge
> weights, you can just set them to 1.
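
[Editor's note: a minimal sketch, not part of the original thread, of producing one line in the format described above from an adjacency list. The function name and defaults are illustrative; it assumes long vertex ids, vertex value 0, and edge weight 1, as Young suggests.]

```python
import json

def to_giraph_json(vertex_id, neighbors, value=0, weight=1):
    """Format one vertex as a JsonLongDoubleFloatDouble-style line:
    [vertex id, vertex value, [[dst id, edge weight], ...]]"""
    return json.dumps([vertex_id, value, [[dst, weight] for dst in neighbors]],
                      separators=(",", ":"))

# Example: vertex 0 with neighbors 1, 4, and 5
line = to_giraph_json(0, [1, 4, 5])
```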
>
> Also, have you tried a larger Xmx value? E.g., 4096m or 8192m.
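
[Editor's note: a hedged illustration, not from the thread, of the larger heap Young suggests, using the same mapred.child.java.opts property the original post already set in hadoop/conf/mapred-site.xml (property name per Hadoop 0.20.x; verify against your cluster's configuration docs):]

```xml
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx4096m</value>
  </property>
```

Note that the value must fit within the machine's physical memory, or the task JVMs will fail to start at all, which may explain the job failing from the start after the earlier change.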
>
> Young
>
>
> *Then I replied:*
> Hi Young,
>
> I believe my graph has the correct format.
> Here are the first lines of the graph I'm using:
> [0,0,[[1,1],[4,1],[5,1]]]
> [1,0,[[0,1],[2,1],[3,1]]]
> [2,0,[[1,1],[6,1],[7,1]]]
> [3,0,[[8,1],[1,1],[9,1]]]
> [4,0,[[0,1],[10,1],[11,1]]]
> [5,0,[[0,1]]]
> [6,0,[[2,1],[12,1]]]
> [7,0,[[2,1],[12,1],[13,1]]]
> [8,0,[[11,1],[3,1],[60,1]]]
> [9,0,[[3,1]]]
> [10,0,[[35,1],[4,1],[38,1]]]
> [11,0,[[8,1],[59,1],[4,1]]]
> [12,0,[[41,1],[6,1],[7,1]]]
> [13,0,[[89,1],[90,1],[91,1],[7,1]]]
> [14,0,[[18,1],[19,1],[15,1]]]
> [15,0,[[16,1],[17,1],[14,1]]]
> [16,0,[[20,1],[21,1],[22,1],[15,1]]]
> [17,0,[[24,1],[23,1],[21,1],[15,1]]]
> [18,0,[[24,1],[14,1]]]
> [19,0,[[25,1],[22,1],[14,1]]]
> [20,0,[[16,1],[25,1],[26,1]]]
> [21,0,[[16,1],[17,1],[30,1]]]
> [22,0,[[16,1],[19,1]]]
> [23,0,[[17,1],[105,1]]]
> [24,0,[[17,1],[18,1],[58,1]]]
> [25,0,[[27,1],[19,1],[20,1]]]
> [26,0,[[28,1],[27,1],[20,1]]]
> [27,0,[[25,1],[26,1],[29,1]]]
> [28,0,[[26,1],[29,1],[30,1]]]
> [29,0,[[27,1],[28,1],[31,1]]]
> [30,0,[[32,1],[33,1],[28,1],[21,1]]]
> [31,0,[[34,1],[29,1]]]
> [32,0,[[105,1],[30,1],[39,1]]]
> [33,0,[[30,1]]]
> [34,0,[[38,1],[31,1]]]
> [35,0,[[10,1],[36,1],[37,1]]]
> [36,0,[[40,1],[35,1],[39,1]]]
> [37,0,[[41,1],[35,1]]]
> [38,0,[[10,1],[34,1]]]
> [39,0,[[32,1],[58,1],[36,1],[119,1]]]
> [40,0,[[90,1],[36,1]]]
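
[Editor's note: a sketch, not part of the original thread, of one way to sanity-check that every line of such a file matches the expected shape [long, number, [[long, number], ...]] before blaming memory. The function name is illustrative.]

```python
import json

def check_line(line):
    """Return True if a line parses as [long, number, [[long, number], ...]]."""
    try:
        rec = json.loads(line)
    except ValueError:
        return False
    if not (isinstance(rec, list) and len(rec) == 3):
        return False
    vid, value, edges = rec
    if not isinstance(vid, int) or not isinstance(value, (int, float)):
        return False
    if not isinstance(edges, list):
        return False
    return all(isinstance(e, list) and len(e) == 2 and isinstance(e[0], int)
               and isinstance(e[1], (int, float)) for e in edges)
```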
>
> Also, sorry about sending the email twice.  My email client messed up.
>
> Thanks,
> Bryan
>
> I've also tried a larger graph.  It ran for 2 hours and then failed, with
> what I believe were different error messages.
>
> Bryan
>