Posted to user@giraph.apache.org by Yingyi Bu <bu...@gmail.com> on 2011/11/18 06:38:40 UTC

PageRank OOM Exception

Hi,

    I'm running a Giraph PageRank job. I tried it with 8GB of input text data
over 10 nodes (each with 4 cores, 4 disks, and 12GB of physical memory), that
is, 800MB of input data per machine. However, the Giraph job fails because of
high GC costs and an Out-of-Memory exception.
    Do I need to set anything special in the Hadoop configuration, for
example, the maximum heap size for the map task JVM?
    Thanks!!

Best regards,
Yingyi
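A rough way to pick a per-task heap ceiling is to divide the memory left after the OS and Hadoop daemons by the number of map tasks that run concurrently on one node. The sketch below does that arithmetic for the 12GB nodes described above. The 2GB reserve and the 4 map slots per node are assumptions (the post does not state them), and `HeapBudget` is a hypothetical helper, not part of Hadoop or Giraph.

```java
/** Hypothetical helper: largest -Xmx each map task JVM can get without oversubscribing RAM. */
public final class HeapBudget {

    private HeapBudget() {}

    /**
     * @param physicalMb physical memory on the node, in MB
     * @param reservedMb memory kept back for the OS, DataNode, TaskTracker, etc. (assumed)
     * @param mapSlots   number of map tasks running at the same time on the node (assumed)
     * @return per-task heap ceiling in MB
     */
    public static long maxHeapPerTaskMb(long physicalMb, long reservedMb, int mapSlots) {
        if (mapSlots <= 0) {
            throw new IllegalArgumentException("mapSlots must be positive");
        }
        if (reservedMb < 0 || reservedMb >= physicalMb) {
            throw new IllegalArgumentException("reservedMb must be in [0, physicalMb)");
        }
        return (physicalMb - reservedMb) / mapSlots;
    }

    public static void main(String[] args) {
        long physicalMb = 12 * 1024; // 12GB per node, from the post
        long reservedMb = 2 * 1024;  // assumption: 2GB for the OS and Hadoop daemons
        int mapSlots = 4;            // assumption: one map slot per core
        System.out.println("-Xmx" + maxHeapPerTaskMb(physicalMb, reservedMb, mapSlots) + "m");
    }
}
```

Under those assumptions it prints `-Xmx2560m`. On Hadoop 0.20 the per-task JVM options are normally passed through the `mapred.child.java.opts` property. The in-memory graph may be considerably larger than the 800MB of input text, so treat the result as an upper bound rather than a recommendation.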

Re: PageRank OOM Exception

Posted by Claudio Martella <cl...@gmail.com>.
Thanks, we'll fix that.

Meanwhile, use this patch to get trunk to build.

On Fri, Nov 18, 2011 at 9:28 AM, Yingyi Bu <bu...@gmail.com> wrote:
> Could anyone fix trunk? Two files are missing license headers, so the build fails...
>
> Attached is the target/rat.txt from the failed build.
> I fixed them locally anyway...
> Thanks!
> Yingyi
> On Thu, Nov 17, 2011 at 11:53 PM, Yingyi Bu <bu...@gmail.com> wrote:
>>
>> Avery,
>>      Thanks a lot for the help!!
>>      I'll sync the trunk and try with your suggested settings.
>> Best regards,
>> Yingyi
>> On Thu, Nov 17, 2011 at 11:47 PM, Avery Ching <ac...@apache.org> wrote:
>>>
>>> Yingyi,
>>>
>>> It looks like you lost the connection to ZooKeeper. You might want to sync
>>> with trunk: GIRAPH-11 changed the settings to allow longer ZooKeeper
>>> timeouts. Also, the vertices no longer need to be ordered, and the load
>>> balancing should be better. You might also want to add some better GC
>>> options to reduce stop-the-world pauses (which are likely causing the
>>> timeouts).
>>>
>>> Here are some example settings you can try fiddling with as well; just
>>> add them to the other JVM settings you tried out earlier. Let us know how
>>> it goes.
>>>
>>>  -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:ParallelGCThreads=8
>>> -XX:+CMSIncrementalPacing -XX:+PrintGCDetails
>>> -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseCMSInitiatingOccupancyOnly
>>> -XX:+PrintTenuringDistribution
>>>
>>> Avery
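The reply suggests appending the GC flags to the JVM settings already in use. A minimal sketch of that merge step follows: `ChildJvmOpts` and `merge` are hypothetical names, and because the earlier settings are not shown in the thread, `-Xmx2560m` is only a placeholder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper that appends extra JVM flags to an existing option string. */
public final class ChildJvmOpts {

    /** The GC flags suggested in the reply, one per entry. */
    public static final List<String> SUGGESTED_GC_FLAGS = Collections.unmodifiableList(Arrays.asList(
            "-XX:+UseConcMarkSweepGC",
            "-XX:+CMSIncrementalMode",
            "-XX:ParallelGCThreads=8",
            "-XX:+CMSIncrementalPacing",
            "-XX:+PrintGCDetails",
            "-XX:CMSInitiatingOccupancyFraction=60",
            "-XX:+UseCMSInitiatingOccupancyOnly",
            "-XX:+PrintTenuringDistribution"));

    private ChildJvmOpts() {}

    /**
     * Splits the existing options on whitespace, then appends each extra flag
     * that is not already present verbatim. Different values of the same flag
     * (e.g. two -XX:ParallelGCThreads settings) are NOT reconciled.
     */
    public static String merge(String existingOpts, List<String> extraFlags) {
        List<String> tokens = new ArrayList<>();
        for (String tok : existingOpts.trim().split("\\s+")) {
            if (!tok.isEmpty()) {
                tokens.add(tok);
            }
        }
        for (String flag : extraFlags) {
            if (!tokens.contains(flag)) {
                tokens.add(flag);
            }
        }
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        String earlier = "-Xmx2560m"; // placeholder for the earlier, unshown JVM settings
        System.out.println(merge(earlier, SUGGESTED_GC_FLAGS));
    }
}
```

The merged string is what would go into the child JVM options (for example `mapred.child.java.opts` on Hadoop 0.20). Conflicting values are left for you to resolve by hand.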
>>>
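The reply attributes the ZooKeeper timeouts to stop-the-world GC pauses. One way to check that hypothesis is to scan the GC log that `-XX:+PrintGCDetails` produces and compare the longest pause with the session timeout (the mapper logs in this thread show `sessionTimeout=60000` and `negotiated timeout = 60000`). This is a hypothetical sketch: it assumes each collection record carries a `real=<seconds> secs` wall-clock field, as HotSpot's detailed GC output typically does, so check the pattern against your own logs first.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical check: are any GC pauses long enough to outlast a ZooKeeper session? */
public final class GcPauseCheck {

    // Assumption: each GC record includes a "real=<seconds> secs" wall-clock field.
    private static final Pattern REAL_TIME = Pattern.compile("real=(\\d+(?:\\.\\d+)?) secs");

    private GcPauseCheck() {}

    /** Returns the longest wall-clock GC time found in the given lines, in milliseconds. */
    public static long maxPauseMillis(List<String> gcLogLines) {
        long max = 0;
        for (String line : gcLogLines) {
            Matcher m = REAL_TIME.matcher(line);
            while (m.find()) {
                long millis = Math.round(Double.parseDouble(m.group(1)) * 1000.0);
                max = Math.max(max, millis);
            }
        }
        return max;
    }

    /** True if a pause of this length is at least the session timeout. */
    public static boolean mayOutlastSession(long pauseMillis, long sessionTimeoutMillis) {
        return pauseMillis >= sessionTimeoutMillis;
    }
}
```

Pauses somewhat shorter than the timeout can still make the client report a lost connection, so this is only a coarse test; a pause at or above 60000 ms would be a strong hint that GC is what drops the session.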
>>> On 11/17/11 11:24 PM, Yingyi Bu wrote:
>>>
>>> Hi Avery,
>>>     Thanks a lot for your help!!
>>>     I used your settings and got rid of the OOM! However, after the job
>>> ran for 10 minutes, one worker failed, and after a while all mappers
>>> failed. Below are the mapper logs from two nodes. It seems they cannot
>>> connect to ZooKeeper. The workers run well until the highlighted
>>> exception. Am I missing something in the job settings?
>>>     Thanks, again!!
>>> Best regards,
>>> Yingyi
>>>
>>>
>>> Mapper log on Node-1:
>>> 2011-11-17 22:56:39,044 INFO org.apache.giraph.zk.ZooKeeperManager:
>>> getZooKeeperServerList: For task 0, got file 'zkServerList_asterix-010 0 '
>>> (polling period is 3000)
>>> 2011-11-17 22:56:39,044 INFO org.apache.giraph.zk.ZooKeeperManager:
>>> getZooKeeperServerList: Found [asterix-010, 0] 2 hosts in filename
>>> 'zkServerList_asterix-010 0 '
>>> 2011-11-17 22:56:39,046 INFO org.apache.giraph.zk.ZooKeeperManager:
>>> onlineZooKeeperServers: Trying to delete old directory
>>> /mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/work/_bspZooKeeper
>>> 2011-11-17 22:56:39,049 INFO org.apache.giraph.zk.ZooKeeperManager:
>>> generateZooKeeperConfigFile: Creating file
>>> /mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/work/_bspZooKeeper/zoo.cfg
>>> in
>>> /mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/work/_bspZooKeeper
>>> with base port 22181
>>> 2011-11-17 22:56:39,049 INFO org.apache.giraph.zk.ZooKeeperManager:
>>> generateZooKeeperConfigFile: Make directory of _bspZooKeeper = true
>>> 2011-11-17 22:56:39,049 INFO org.apache.giraph.zk.ZooKeeperManager:
>>> generateZooKeeperConfigFile: Delete of zoo.cfg = false
>>> 2011-11-17 22:56:39,050 INFO org.apache.giraph.zk.ZooKeeperManager:
>>> onlineZooKeeperServers: Attempting to start ZooKeeper server with command
>>> [/mnt/data/sda/space/yingyi/tools/java/jre/bin/java, -Xmx256m,
>>> -XX:ParallelGCThreads=4, -XX:+UseConcMarkSweepGC,
>>> -XX:CMSInitiatingOccupancyFraction=70, -XX:MaxGCPauseMillis=100, -cp,
>>> /mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/jars/job.jar,
>>> org.apache.zookeeper.server.quorum.QuorumPeerMain,
>>> /mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/work/_bspZooKeeper/zoo.cfg]
>>> in directory
>>> /mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/work/_bspZooKeeper
>>> 2011-11-17 22:56:39,056 INFO org.apache.giraph.zk.ZooKeeperManager:
>>> onlineZooKeeperServers: Connect attempt 0 of 10 max trying to connect to
>>> asterix-010:22181 with poll msecs = 3000
>>> 2011-11-17 22:56:39,058 WARN org.apache.giraph.zk.ZooKeeperManager:
>>> onlineZooKeeperServers: Got ConnectException
>>> java.net.ConnectException: Connection refused
>>>         at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>         at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
>>>         at
>>> java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
>>>         at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
>>>         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
>>>         at java.net.Socket.connect(Socket.java:529)
>>>         at
>>> org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:612)
>>>         at
>>> org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:401)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at
>>> org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>>         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>         at javax.security.auth.Subject.doAs(Subject.java:396)
>>>         at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
>>> 2011-11-17 22:56:42,062 INFO org.apache.giraph.zk.ZooKeeperManager:
>>> onlineZooKeeperServers: Connect attempt 1 of 10 max trying to connect to
>>> asterix-010:22181 with poll msecs = 3000
>>> 2011-11-17 22:56:42,063 INFO org.apache.giraph.zk.ZooKeeperManager:
>>> onlineZooKeeperServers: Connected!
>>> 2011-11-17 22:56:42,064 INFO org.apache.giraph.zk.ZooKeeperManager:
>>> onlineZooKeeperServers: Creating my filestamp
>>> _bsp/_defaultZkManagerDir/job_201111172247_0003/_zkServer/asterix-010 0
>>> 2011-11-17 22:56:42,070 INFO org.apache.giraph.graph.GraphMapper: setup:
>>> Starting up BspServiceMaster (master thread)...
>>> 2011-11-17 22:56:42,080 INFO org.apache.giraph.graph.BspService:
>>> BspService: Connecting to ZooKeeper with job job_201111172247_0003, 0 on
>>> asterix-010:22181
>>> 2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:zookeeper.version=3.3.1-942149, built on 05/07/2010 17:14 GMT
>>> 2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:host.name=asterix-010
>>> 2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:java.version=1.6.0_21
>>> 2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:java.vendor=Sun Microsystems Inc.
>>> 2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:java.home=/mnt/data/sda/space/yingyi/tools/java/jre
>>> 2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:java.class.path=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/jars/classes:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/jars:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000000_0/work:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../conf:/mnt/data/sda/space/yingyi/tools/java/lib/tools.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/classes:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/test/classes:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/tools:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/hadoop-core-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-
>>> 0.20.205.0/libexec/../share/hadoop/lib/asm-3.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/aspectjrt-1.6.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/aspectjtools-1.6.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-beanutils-1.7.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-beanutils-core-1.8.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-cli-1.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-codec-1.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-collections-3.2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-configuration-1.6.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-daemon-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexe
>>> c/../share/hadoop/lib/commons-digester-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-el-1.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-httpclient-3.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-lang-2.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-logging-1.1.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-logging-api-1.0.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-math-2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-net-1.4.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/core-3.1.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hadoop-capacity-scheduler-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/
>>> hadoop/lib/hadoop-fairscheduler-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hadoop-thriftfs-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hsqldb-1.8.0.10.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jackson-core-asl-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jackson-mapper-asl-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jasper-compiler-5.5.12.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jasper-runtime-5.5.12.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jdeb-0.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jersey-core-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jersey-json-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/
>>> jersey-server-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jets3t-0.6.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jetty-6.1.26.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jetty-util-6.1.26.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsch-0.1.42.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/junit-4.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/kfs-0.2.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/log4j-1.2.15.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/mockito-all-1.8.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/oro-2.0.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/servlet-api-2.5-20081211.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../
>>> share/hadoop/lib/slf4j-api-1.4.3.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/slf4j-log4j12-1.4.3.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/xmlenc-0.52.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsp-2.1/jsp-2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsp-2.1/jsp-api-2.1.jar
>>> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:java.library.path=/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../lib:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000000_0/work
>>> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:java.io.tmpdir=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000000_0/work/tmp
>>> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:java.compiler=<NA>
>>> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:os.name=Linux
>>> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:os.arch=amd64
>>> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:os.version=2.6.18-194.26.1.el5
>>> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:user.name=yingyib
>>> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:user.home=/home/yingyib
>>> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:user.dir=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000000_0/work
>>> 2011-11-17 22:56:42,088 INFO org.apache.zookeeper.ZooKeeper: Initiating
>>> client connection, connectString=asterix-010:22181 sessionTimeout=60000
>>> watcher=org.apache.giraph.graph.BspServiceMaster@13a78071
>>> 2011-11-17 22:56:42,098 INFO org.apache.zookeeper.ClientCnxn: Opening
>>> socket connection to server asterix-010/10.0.0.10:22181
>>> 2011-11-17 22:56:42,099 INFO org.apache.zookeeper.ClientCnxn: Socket
>>> connection established to asterix-010/10.0.0.10:22181, initiating session
>>> 2011-11-17 22:56:42,123 INFO org.apache.zookeeper.ClientCnxn: Session
>>> establishment complete on server asterix-010/10.0.0.10:22181, sessionid =
>>> 0x133b57675b60000, negotiated timeout = 60000
>>> 2011-11-17 22:56:42,125 INFO org.apache.giraph.graph.BspService: process:
>>> Asynchronous connection complete.
>>> 2011-11-17 22:56:42,126 INFO org.apache.giraph.graph.GraphMapper: map: No
>>> need to do anything when not a worker
>>> 2011-11-17 22:56:42,126 INFO org.apache.giraph.graph.GraphMapper:
>>> cleanup: Starting for MASTER_ZOOKEEPER_ONLY
>>> 2011-11-17 22:56:42,197 INFO org.apache.giraph.graph.BspServiceMaster:
>>> becomeMaster: First child is
>>> '/_hadoopBsp/job_201111172247_0003/_masterElectionDir/asterix-010_00000000000'
>>> and my bid is
>>> '/_hadoopBsp/job_201111172247_0003/_masterElectionDir/asterix-010_00000000000'
>>> 2011-11-17 22:56:42,197 INFO org.apache.giraph.graph.BspServiceMaster:
>>> becomeMaster: I am now the master!
>>> 2011-11-17 22:56:42,208 INFO org.apache.giraph.graph.BspService: process:
>>> applicationAttemptChanged signaled
>>> 2011-11-17 22:56:42,216 WARN org.apache.giraph.graph.BspService: process:
>>> Unknown and unprocessed event
>>> (path=/_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir/0/_superstepDir,
>>> type=NodeChildrenChanged, state=SyncConnected)
>>> 2011-11-17 22:56:45,130 INFO
>>> org.apache.hadoop.mapreduce.lib.input.FileInputFormat: Total input paths to
>>> process : 10
>>> 2011-11-17 22:56:45,227 INFO org.apache.giraph.graph.BspServiceMaster:
>>> coordinateSuperstep: 0 out of 10 chosen workers finished on superstep -1
>>> 2011-11-17 23:01:20,045 ERROR org.apache.zookeeper.ClientCnxn: Error
>>> while calling watcher
>>> java.lang.RuntimeException:
>>> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode =
>>> NoNode for
>>> /_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir/0/_superstepDir/-1/_vertexRangeAssignments
>>>         at
>>> org.apache.giraph.graph.BspService.getVertexRangeMap(BspService.java:885)
>>>         at
>>> org.apache.giraph.graph.BspServiceMaster.checkHealthyWorkerFailure(BspServiceMaster.java:1946)
>>>         at
>>> org.apache.giraph.graph.BspServiceMaster.processEvent(BspServiceMaster.java:1976)
>>>         at
>>> org.apache.giraph.graph.BspService.process(BspService.java:1095)
>>>         at
>>> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:488)
>>> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
>>> KeeperErrorCode = NoNode for
>>> /_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir/0/_superstepDir/-1/_vertexRangeAssignments
>>>         at
>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>>>         at
>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:921)
>>>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:950)
>>>         at
>>> org.apache.giraph.graph.BspService.getVertexRangeMap(BspService.java:858)
>>>         ... 4 more
>>> 2011-11-17 23:01:22,009 INFO org.apache.giraph.graph.BspServiceMaster:
>>> coordinateSuperstep: 0 out of 10 chosen workers finished on superstep -1
>>> 2011-11-17 23:11:27,357 WARN org.apache.giraph.zk.ZooKeeperManager:
>>> onlineZooKeeperServers: Forced a shutdown hook kill of the ZooKeeper process.
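The `becomeMaster` lines in the Node-1 log compare the "first child" of `_masterElectionDir` with the task's own bid. That matches the usual ZooKeeper election recipe: each candidate creates a sequential znode, and the one with the lowest sequence number becomes master. A hypothetical sketch of that comparison is below. Reading `asterix-010_00000000000` as a bid prefix followed by ZooKeeper's 10-digit sequence counter is an assumption, and Giraph's actual implementation may differ.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of "lowest sequential znode wins" master election. */
public final class MasterElection {

    // ZooKeeper appends a zero-padded 10-digit counter to sequential znode names.
    private static final int SEQUENCE_DIGITS = 10;

    private MasterElection() {}

    /** Parses the trailing sequence counter from a child znode name. */
    public static long sequenceOf(String childName) {
        if (childName.length() < SEQUENCE_DIGITS) {
            throw new IllegalArgumentException("no sequence suffix: " + childName);
        }
        return Long.parseLong(childName.substring(childName.length() - SEQUENCE_DIGITS));
    }

    /**
     * @param children  child names of the election directory (no parent path)
     * @param myBidPath full path of the znode this task created
     * @return true if this task's znode has the lowest sequence number
     */
    public static boolean isMaster(List<String> children, String myBidPath) {
        if (children.isEmpty()) {
            return false;
        }
        List<String> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingLong(MasterElection::sequenceOf));
        String myName = myBidPath.substring(myBidPath.lastIndexOf('/') + 1);
        return sorted.get(0).equals(myName);
    }
}
```

In the log, the first child and the bid are the same path, which is why the task reports "I am now the master!".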
>>>
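The start of the Node-1 log shows the task launching a ZooKeeper server and then polling it: attempt 0 gets `Connection refused` (presumably because the server process is not listening yet), and attempt 1, three seconds later, connects. The loop below is a hypothetical sketch of that bounded connect-and-poll pattern using plain `java.net.Socket`. It is not Giraph's `ZooKeeperManager` code, and `waitForServer` and its parameters are made-up names.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical sketch of a bounded connect-and-poll loop (not Giraph's implementation). */
public final class ConnectRetry {

    private ConnectRetry() {}

    /**
     * Tries to open a TCP connection up to maxAttempts times, sleeping pollMillis
     * between failed attempts. Returns the 0-based attempt that succeeded, or -1.
     */
    public static int waitForServer(String host, int port, int maxAttempts,
                                    long pollMillis, int connectTimeoutMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), connectTimeoutMillis);
                return attempt; // the server is accepting connections
            } catch (IOException e) {
                // e.g. java.net.ConnectException: Connection refused while the server starts
                if (attempt + 1 < maxAttempts) {
                    try {
                        Thread.sleep(pollMillis);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        return -1;
                    }
                }
            }
        }
        return -1;
    }
}
```

With the values from the log (10 attempts, 3000 ms poll), a server that never comes up would be given up on after roughly 27 seconds of sleeping plus connect time.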
>>> Mapper log on Node-2:
>>> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:zookeeper.version=3.3.1-942149, built on 05/07/2010 17:14 GMT
>>> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:host.name=asterix-001
>>> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:java.version=1.6.0_21
>>> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:java.vendor=Sun Microsystems Inc.
>>> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:java.home=/mnt/data/sda/space/yingyi/tools/java/jre
>>> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:java.class.path=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/jars/classes:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/jars:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000008_0/work:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../conf:/mnt/data/sda/space/yingyi/tools/java/lib/tools.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/classes:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/test/classes:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/tools:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/hadoop-core-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-
>>> 0.20.205.0/libexec/../share/hadoop/lib/asm-3.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/aspectjrt-1.6.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/aspectjtools-1.6.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-beanutils-1.7.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-beanutils-core-1.8.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-cli-1.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-codec-1.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-collections-3.2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-configuration-1.6.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-daemon-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexe
>>> c/../share/hadoop/lib/commons-digester-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-el-1.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-httpclient-3.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-lang-2.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-logging-1.1.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-logging-api-1.0.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-math-2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-net-1.4.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/core-3.1.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hadoop-capacity-scheduler-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/
>>> hadoop/lib/hadoop-fairscheduler-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hadoop-thriftfs-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hsqldb-1.8.0.10.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jackson-core-asl-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jackson-mapper-asl-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jasper-compiler-5.5.12.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jasper-runtime-5.5.12.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jdeb-0.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jersey-core-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jersey-json-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/
>>> jersey-server-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jets3t-0.6.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jetty-6.1.26.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jetty-util-6.1.26.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsch-0.1.42.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/junit-4.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/kfs-0.2.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/log4j-1.2.15.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/mockito-all-1.8.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/oro-2.0.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/servlet-api-2.5-20081211.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../
>>> share/hadoop/lib/slf4j-api-1.4.3.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/slf4j-log4j12-1.4.3.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/xmlenc-0.52.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsp-2.1/jsp-2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsp-2.1/jsp-api-2.1.jar
>>> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:java.library.path=/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../lib:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000008_0/work
>>> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:java.io.tmpdir=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000008_0/work/tmp
>>> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:java.compiler=<NA>
>>> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:os.name=Linux
>>> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:os.arch=amd64
>>> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:os.version=2.6.18-194.26.1.el5
>>> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:user.name=yingyib
>>> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:user.home=/home/yingyib
>>> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
>>> environment:user.dir=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000008_0/work
>>> 2011-11-17 22:56:44,159 INFO org.apache.zookeeper.ZooKeeper: Initiating
>>> client connection, connectString=asterix-010:22181 sessionTimeout=60000
>>> watcher=org.apache.giraph.graph.BspServiceWorker@60ded0f0
>>> 2011-11-17 22:56:44,171 INFO org.apache.zookeeper.ClientCnxn: Opening
>>> socket connection to server asterix-010/10.0.0.10:22181
>>> 2011-11-17 22:56:44,173 INFO org.apache.zookeeper.ClientCnxn: Socket
>>> connection established to asterix-010/10.0.0.10:22181, initiating session
>>> 2011-11-17 22:56:44,178 INFO org.apache.zookeeper.ClientCnxn: Session
>>> establishment complete on server asterix-010/10.0.0.10:22181, sessionid =
>>> 0x133b57675b60007, negotiated timeout = 60000
>>> 2011-11-17 22:56:44,180 INFO org.apache.giraph.graph.BspService: process:
>>> Asynchronous connection complete.
>>> 2011-11-17 22:56:44,180 INFO org.apache.giraph.graph.GraphMapper: setup:
>>> Registering health of this worker...
>>> 2011-11-17 22:56:44,191 INFO org.apache.giraph.graph.BspService:
>>> getJobState: Job state already exists
>>> (/_hadoopBsp/job_201111172247_0003/_masterJobState)
>>> 2011-11-17 22:56:44,195 INFO org.apache.giraph.graph.BspService:
>>> getApplicationAttempt: Node
>>> /_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir already exists!
>>> 2011-11-17 22:56:44,198 INFO org.apache.giraph.graph.BspService:
>>> getApplicationAttempt: Node
>>> /_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir already exists!
>>> 2011-11-17 22:56:44,204 INFO org.apache.giraph.graph.BspServiceWorker:
>>> registerHealth: Created my health node for attempt=0, superstep=-1 with
>>> /_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/asterix-001_8
>>> and hostnamePort = ["asterix-001",30008]
>>> 2011-11-17 22:56:45,177 INFO org.apache.giraph.graph.BspService: process:
>>> inputSplitsReadyChanged (input splits ready)
>>> 2011-11-17 22:56:45,192 WARN org.apache.giraph.graph.BspService: process:
>>> Unknown and unprocessed event
>>> (path=/_hadoopBsp/job_201111172247_0003/_inputSplitsDir/2/_inputSplitReserved,
>>> type=NodeCreated, state=SyncConnected)
>>> 2011-11-17 22:56:45,192 INFO org.apache.giraph.graph.BspServiceWorker:
>>> reserveInputSplit: Reserved input split path
>>> /_hadoopBsp/job_201111172247_0003/_inputSplitsDir/2
>>> 2011-11-17 22:56:45,196 INFO org.apache.giraph.graph.BspServiceWorker:
>>> loadVertices: Reserved /_hadoopBsp/job_201111172247_0003/_inputSplitsDir/2
>>> from ZooKeeper and got input split
>>> 'hdfs://asterix-master:31888/webmap-tiny-sorted/part-00002:0+834285620'
>>> 2011-11-17 23:01:20,608 INFO org.apache.zookeeper.ClientCnxn: Client
>>> session timed out, have not heard from server in 59117ms for sessionid
>>> 0x133b57675b60007, closing socket connection and attempting reconnect
>>> 2011-11-17 23:02:06,630 ERROR org.apache.zookeeper.ClientCnxn: Error
>>> while calling watcher
>>> java.lang.RuntimeException: process: Disconnected from ZooKeeper, cannot
>>> recover.
>>>         at
>>> org.apache.giraph.graph.BspService.process(BspService.java:990)
>>>         at
>>> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:488)
>>> 2011-11-17 23:02:35,793 INFO org.apache.zookeeper.ClientCnxn: Opening
>>> socket connection to server asterix-010/10.0.0.10:22181
>>> 2011-11-17 23:02:35,794 INFO org.apache.zookeeper.ClientCnxn: Socket
>>> connection established to asterix-010/10.0.0.10:22181, initiating session
>>> 2011-11-17 23:02:35,806 INFO org.apache.zookeeper.ClientCnxn: Unable to
>>> reconnect to ZooKeeper service, session 0x133b57675b60007 has expired,
>>> closing socket connection
>>> On Thu, Nov 17, 2011 at 9:46 PM, Avery Ching <ac...@apache.org> wrote:
>>>>
>>>> Hi Yingyi,
>>>>
>>>> Here are some ideas you might want to try:
>>>>
>>>> 1)  Limit the thread stack size.
>>>>
>>>> 2)  Set the heap available to the mapper JVM.
>>>>
>>>> For example, here's a setting that gives 10 GB of heap and uses a smaller
>>>> stack (64k) for the threads.
>>>>
>>>> -Dmapred.child.java.opts="-Xms10g -Xmx10g -Xss64k"
>>>>
>>>> Also, you might want to try using EdgeListVertex instead of Vertex
>>>> (i.e. GiraphJob.setVertexClass(EdgeListVertex.class)); it is quite a bit
>>>> smaller.
>>>>
>>>> Let us know if that helps.  You should also check whether your Hadoop
>>>> installation is using a 32-bit or 64-bit JVM.  If it's 32-bit, you will
>>>> be limited in how much heap you can use.
>>>>
>>>> Avery
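
One way to act on the 32-bit vs 64-bit check is a tiny Java program launched with the same java binary and options as the map tasks. This is only a sketch: the class name JvmHeapCheck is hypothetical, it relies on the HotSpot-specific sun.arch.data.model system property, and it falls back to a rough guess from os.arch when that property is absent. Runtime.maxMemory() approximates the effective -Xmx value.

```java
import java.util.Locale;

// Hypothetical helper class: reports the JVM data model and the effective maximum heap.
class JvmHeapCheck {

    // Returns "32", "64", or "unknown". Prefers the HotSpot-specific
    // sun.arch.data.model property; otherwise guesses from os.arch.
    static String dataModel() {
        String model = System.getProperty("sun.arch.data.model");
        if (model != null && !model.isEmpty()) {
            return model;
        }
        String arch = System.getProperty("os.arch", "").toLowerCase(Locale.ROOT);
        if (arch.contains("64")) {
            return "64";
        }
        if (arch.equals("x86") || arch.equals("i386") || arch.equals("i686")) {
            return "32";
        }
        return "unknown";
    }

    // Maximum heap the JVM will attempt to use, in MiB (roughly the -Xmx value).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("JVM data model: " + dataModel() + "-bit");
        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```

For example, run it with the JVM seen in the logs and the suggested options: /mnt/data/sda/space/yingyi/tools/java/jre/bin/java -Xmx10g -Xss64k JvmHeapCheck. A 32-bit JVM will typically refuse to start with a 10 GB heap at all.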
>>>>
>>>> On 11/17/11 9:38 PM, Yingyi Bu wrote:
>>>>
>>>> Hi,
>>>>     I'm running a Giraph PageRank job.  I tried with 8GB of input text data
>>>> over 10 nodes (each with 4 cores, 4 disks, and 12GB of physical memory),
>>>> that is, 800MB of input data per machine.  However, the Giraph job fails
>>>> because of high GC costs and an Out-of-Memory exception.
>>>>     Do I need to set anything special in the Hadoop configuration, for
>>>> example, the maximum heap size for the map task VM?
>>>>     Thanks!!
>>>> Best regards,
>>>> Yingyi
>>>
>>>
>>
>
>



-- 
   Claudio Martella
   claudio.martella@gmail.com

Re: PageRank OOM Exception

Posted by Yingyi Bu <bu...@gmail.com>.
Could anyone fix trunk? Two files are missing license headers, so the build fails.

Attached is the target/rat.txt from the failed build.
I fixed them locally anyway...

Thanks!
Yingyi

On Thu, Nov 17, 2011 at 11:53 PM, Yingyi Bu <bu...@gmail.com> wrote:

> Avery,
>
>      Thanks a lot for the help!!
>      I'll sync with trunk and try your suggested settings.
>
> Best regards,
> Yingyi
>
> On Thu, Nov 17, 2011 at 11:47 PM, Avery Ching <ac...@apache.org> wrote:
>
>>  Yingyi,
>>
>> Looks like you lost the connection to ZooKeeper.  You might want to sync
>> with trunk.  GIRAPH-11 changed the settings to allow longer ZooKeeper
>> timeouts.  Also, ordering of the vertices is no longer required and the
>> load balancing should be better.  You might also want to add some better
>> GC options to reduce stop-the-world pauses (which are likely causing the
>> timeouts).
>>
>> Here are some example settings you can try fiddling with as well; just
>> add them to the other JVM settings you tried out earlier.  Let us know how
>> it goes.
>>
>>  -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:ParallelGCThreads=8
>> -XX:+CMSIncrementalPacing -XX:+PrintGCDetails
>> -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseCMSInitiatingOccupancyOnly
>> -XX:+PrintTenuringDistribution
>>
>> Avery
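
The session-expiry log earlier in the thread shows the client had not heard from the server in 59117ms against a negotiated timeout of 60000ms, which fits the stop-the-world explanation. Besides -XX:+PrintGCDetails, a worker could sample GC time through the standard java.lang.management API. The sketch below is a hypothetical helper (GcPauseMonitor is not part of Giraph or Hadoop), and the 50% warning threshold is an arbitrary assumption. Note that getCollectionTime() reports approximate accumulated collection time, which for a concurrent collector such as CMS may include concurrent phases, so treat the result as an upper bound on pause time rather than an exact measurement.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: tracks how much GC time elapsed between two samples.
class GcPauseMonitor {
    private long lastTotalMillis = totalGcMillis();

    // Sum of accumulated collection time over all collectors, in ms.
    static long totalGcMillis() {
        long total = 0;
        List<GarbageCollectorMXBean> beans = ManagementFactory.getGarbageCollectorMXBeans();
        for (GarbageCollectorMXBean bean : beans) {
            long t = bean.getCollectionTime(); // -1 if undefined for this collector
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // GC time (ms) accumulated since the previous call (or since construction).
    long gcMillisSinceLastSample() {
        long now = totalGcMillis();
        long delta = now - lastTotalMillis;
        lastTotalMillis = now;
        return delta;
    }

    // True if GC time in an interval reached the given fraction of the session timeout.
    static boolean risksSessionExpiry(long gcMillis, long sessionTimeoutMillis, double fraction) {
        return gcMillis >= (long) (sessionTimeoutMillis * fraction);
    }

    public static void main(String[] args) throws InterruptedException {
        GcPauseMonitor monitor = new GcPauseMonitor();
        long sessionTimeoutMillis = 60000; // negotiated timeout seen in the logs
        for (int i = 0; i < 3; i++) {
            Thread.sleep(1000);
            long gc = monitor.gcMillisSinceLastSample();
            System.out.println("GC ms in last interval: " + gc
                + (risksSessionExpiry(gc, sessionTimeoutMillis, 0.5)
                    ? "  (WARNING: close to ZooKeeper session timeout)" : ""));
        }
    }
}
```

Logging this per interval next to the Giraph worker output would show whether long collections line up with the "Client session timed out" messages.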
>>
>>
>> On 11/17/11 11:24 PM, Yingyi Bu wrote:
>>
>> Hi Avery,
>>
>>      Thanks a lot for your help!!
>>     I used your settings and got rid of the OOM now!  However, after running
>> the job for 10 minutes, one worker failed, and then after a while, all
>> mappers failed.  Attached below are mapper logs from two nodes.  It seems
>> they cannot connect to ZooKeeper.  The workers run well until the
>> highlighted exception.  Am I missing something in the job settings?
>>     Thanks, again!!
>>
>>  Best regards,
>> Yingyi
>>
>>
>>
>>  Mapper log on Node-1:
>>  2011-11-17 22:56:39,044 INFO org.apache.giraph.zk.ZooKeeperManager:
>> getZooKeeperServerList: For task 0, got file 'zkServerList_asterix-010 0 '
>> (polling period is 3000)
>> 2011-11-17 22:56:39,044 INFO org.apache.giraph.zk.ZooKeeperManager:
>> getZooKeeperServerList: Found [asterix-010, 0] 2 hosts in filename
>> 'zkServerList_asterix-010 0 '
>> 2011-11-17 22:56:39,046 INFO org.apache.giraph.zk.ZooKeeperManager:
>> onlineZooKeeperServers: Trying to delete old directory
>> /mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/work/_bspZooKeeper
>> 2011-11-17 22:56:39,049 INFO org.apache.giraph.zk.ZooKeeperManager:
>> generateZooKeeperConfigFile: Creating file
>> /mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/work/_bspZooKeeper/zoo.cfg
>> in
>> /mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/work/_bspZooKeeper
>> with base port 22181
>> 2011-11-17 22:56:39,049 INFO org.apache.giraph.zk.ZooKeeperManager:
>> generateZooKeeperConfigFile: Make directory of _bspZooKeeper = true
>> 2011-11-17 22:56:39,049 INFO org.apache.giraph.zk.ZooKeeperManager:
>> generateZooKeeperConfigFile: Delete of zoo.cfg = false
>> 2011-11-17 22:56:39,050 INFO org.apache.giraph.zk.ZooKeeperManager:
>> onlineZooKeeperServers: Attempting to start ZooKeeper server with command
>> [/mnt/data/sda/space/yingyi/tools/java/jre/bin/java, -Xmx256m,
>> -XX:ParallelGCThreads=4, -XX:+UseConcMarkSweepGC,
>> -XX:CMSInitiatingOccupancyFraction=70, -XX:MaxGCPauseMillis=100, -cp,
>> /mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/jars/job.jar,
>> org.apache.zookeeper.server.quorum.QuorumPeerMain,
>> /mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/work/_bspZooKeeper/zoo.cfg]
>> in directory
>> /mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/work/_bspZooKeeper
>> 2011-11-17 22:56:39,056 INFO org.apache.giraph.zk.ZooKeeperManager:
>> onlineZooKeeperServers: Connect attempt 0 of 10 max trying to connect to
>> asterix-010:22181 with poll msecs = 3000
>> 2011-11-17 22:56:39,058 WARN org.apache.giraph.zk.ZooKeeperManager:
>> onlineZooKeeperServers: Got ConnectException
>> java.net.ConnectException: Connection refused
>>         at java.net.PlainSocketImpl.socketConnect(Native Method)
>>         at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
>>         at
>> java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
>>         at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
>>         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
>>         at java.net.Socket.connect(Socket.java:529)
>>         at
>> org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:612)
>>         at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:401)
>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:396)
>>         at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
>> 2011-11-17 22:56:42,062 INFO org.apache.giraph.zk.ZooKeeperManager:
>> onlineZooKeeperServers: Connect attempt 1 of 10 max trying to connect to
>> asterix-010:22181 with poll msecs = 3000
>> 2011-11-17 22:56:42,063 INFO org.apache.giraph.zk.ZooKeeperManager:
>> onlineZooKeeperServers: Connected!
>> 2011-11-17 22:56:42,064 INFO org.apache.giraph.zk.ZooKeeperManager:
>> onlineZooKeeperServers: Creating my filestamp
>> _bsp/_defaultZkManagerDir/job_201111172247_0003/_zkServer/asterix-010 0
>> 2011-11-17 22:56:42,070 INFO org.apache.giraph.graph.GraphMapper: setup:
>> Starting up BspServiceMaster (master thread)...
>> 2011-11-17 22:56:42,080 INFO org.apache.giraph.graph.BspService:
>> BspService: Connecting to ZooKeeper with job job_201111172247_0003, 0 on
>> asterix-010:22181
>> 2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:zookeeper.version=3.3.1-942149, built on 05/07/2010 17:14 GMT
>> 2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:host.name=asterix-010
>> 2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:java.version=1.6.0_21
>> 2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:java.vendor=Sun Microsystems Inc.
>> 2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:java.home=/mnt/data/sda/space/yingyi/tools/java/jre
>> 2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:java.class.path=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/jars/classes:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/jars:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000000_0/work:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../conf:/mnt/data/sda/space/yingyi/tools/java/lib/tools.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/classes:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/test/classes:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/tools:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/hadoop-core-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-
>> 0.20.205.0/libexec/../share/hadoop/lib/asm-3.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/aspectjrt-1.6.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/aspectjtools-1.6.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-beanutils-1.7.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-beanutils-core-1.8.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-cli-1.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-codec-1.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-collections-3.2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-configuration-1.6.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-daemon-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexe
>> c/../share/hadoop/lib/commons-digester-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-el-1.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-httpclient-3.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-lang-2.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-logging-1.1.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-logging-api-1.0.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-math-2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-net-1.4.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/core-3.1.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hadoop-capacity-scheduler-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/
>> hadoop/lib/hadoop-fairscheduler-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hadoop-thriftfs-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hsqldb-1.8.0.10.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jackson-core-asl-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jackson-mapper-asl-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jasper-compiler-5.5.12.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jasper-runtime-5.5.12.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jdeb-0.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jersey-core-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jersey-json-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/
>> jersey-server-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jets3t-0.6.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jetty-6.1.26.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jetty-util-6.1.26.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsch-0.1.42.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/junit-4.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/kfs-0.2.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/log4j-1.2.15.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/mockito-all-1.8.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/oro-2.0.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/servlet-api-2.5-20081211.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../
>> share/hadoop/lib/slf4j-api-1.4.3.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/slf4j-log4j12-1.4.3.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/xmlenc-0.52.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsp-2.1/jsp-2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsp-2.1/jsp-api-2.1.jar
>> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:java.library.path=/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../lib:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000000_0/work
>> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:java.io.tmpdir=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000000_0/work/tmp
>> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:java.compiler=<NA>
>> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:os.name=Linux
>> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:os.arch=amd64
>> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:os.version=2.6.18-194.26.1.el5
>> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:user.name=yingyib
>> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:user.home=/home/yingyib
>> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:user.dir=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000000_0/work
>> 2011-11-17 22:56:42,088 INFO org.apache.zookeeper.ZooKeeper: Initiating
>> client connection, connectString=asterix-010:22181 sessionTimeout=60000
>> watcher=org.apache.giraph.graph.BspServiceMaster@13a78071
>> 2011-11-17 22:56:42,098 INFO org.apache.zookeeper.ClientCnxn: Opening
>> socket connection to server asterix-010/10.0.0.10:22181
>> 2011-11-17 22:56:42,099 INFO org.apache.zookeeper.ClientCnxn: Socket
>> connection established to asterix-010/10.0.0.10:22181, initiating session
>> 2011-11-17 22:56:42,123 INFO org.apache.zookeeper.ClientCnxn: Session
>> establishment complete on server asterix-010/10.0.0.10:22181, sessionid
>> = 0x133b57675b60000, negotiated timeout = 60000
>> 2011-11-17 22:56:42,125 INFO org.apache.giraph.graph.BspService: process:
>> Asynchronous connection complete.
>> 2011-11-17 22:56:42,126 INFO org.apache.giraph.graph.GraphMapper: map: No
>> need to do anything when not a worker
>> 2011-11-17 22:56:42,126 INFO org.apache.giraph.graph.GraphMapper:
>> cleanup: Starting for MASTER_ZOOKEEPER_ONLY
>> 2011-11-17 22:56:42,197 INFO org.apache.giraph.graph.BspServiceMaster:
>> becomeMaster: First child is
>> '/_hadoopBsp/job_201111172247_0003/_masterElectionDir/asterix-010_00000000000'
>> and my bid is
>> '/_hadoopBsp/job_201111172247_0003/_masterElectionDir/asterix-010_00000000000'
>> 2011-11-17 22:56:42,197 INFO org.apache.giraph.graph.BspServiceMaster:
>> becomeMaster: I am now the master!
>> 2011-11-17 22:56:42,208 INFO org.apache.giraph.graph.BspService: process:
>> applicationAttemptChanged signaled
>> 2011-11-17 22:56:42,216 WARN org.apache.giraph.graph.BspService: process:
>> Unknown and unprocessed event
>> (path=/_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir/0/_superstepDir,
>> type=NodeChildrenChanged, state=SyncConnected)
>> 2011-11-17 22:56:45,130 INFO
>> org.apache.hadoop.mapreduce.lib.input.FileInputFormat: Total input paths to
>> process : 10
>> 2011-11-17 22:56:45,227 INFO org.apache.giraph.graph.BspServiceMaster:
>> coordinateSuperstep: 0 out of 10 chosen workers finished on superstep -1
>> 2011-11-17 23:01:20,045 ERROR org.apache.zookeeper.ClientCnxn: Error
>> while calling watcher
>> java.lang.RuntimeException:
>> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode =
>> NoNode for
>> /_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir/0/_superstepDir/-1/_vertexRangeAssignments
>>         at
>> org.apache.giraph.graph.BspService.getVertexRangeMap(BspService.java:885)
>>         at
>> org.apache.giraph.graph.BspServiceMaster.checkHealthyWorkerFailure(BspServiceMaster.java:1946)
>>         at
>> org.apache.giraph.graph.BspServiceMaster.processEvent(BspServiceMaster.java:1976)
>>         at
>> org.apache.giraph.graph.BspService.process(BspService.java:1095)
>>         at
>> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:488)
>> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
>> KeeperErrorCode = NoNode for
>> /_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir/0/_superstepDir/-1/_vertexRangeAssignments
>>         at
>> org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>>         at
>> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:921)
>>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:950)
>>         at
>> org.apache.giraph.graph.BspService.getVertexRangeMap(BspService.java:858)
>>         ... 4 more
>> 2011-11-17 23:01:22,009 INFO org.apache.giraph.graph.BspServiceMaster:
>> coordinateSuperstep: 0 out of 10 chosen workers finished on superstep -1
>> 2011-11-17 23:11:27,357 WARN org.apache.giraph.zk.ZooKeeperManager:
>> onlineZooKeeperServers: Forced a shutdown hook kill of the ZooKeeper process.
>>
>>
>>  Mapper log on Node-2:
>> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:zookeeper.version=3.3.1-942149, built on 05/07/2010 17:14 GMT
>> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:host.name=asterix-001
>> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:java.version=1.6.0_21
>> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:java.vendor=Sun Microsystems Inc.
>> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:java.home=/mnt/data/sda/space/yingyi/tools/java/jre
>> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:java.class.path=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/jars/classes:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/jars:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000008_0/work:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../conf:/mnt/data/sda/space/yingyi/tools/java/lib/tools.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/classes:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/test/classes:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/tools:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/hadoop-core-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-
>> 0.20.205.0/libexec/../share/hadoop/lib/asm-3.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/aspectjrt-1.6.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/aspectjtools-1.6.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-beanutils-1.7.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-beanutils-core-1.8.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-cli-1.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-codec-1.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-collections-3.2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-configuration-1.6.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-daemon-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexe
>> c/../share/hadoop/lib/commons-digester-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-el-1.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-httpclient-3.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-lang-2.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-logging-1.1.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-logging-api-1.0.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-math-2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-net-1.4.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/core-3.1.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hadoop-capacity-scheduler-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/
>> hadoop/lib/hadoop-fairscheduler-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hadoop-thriftfs-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hsqldb-1.8.0.10.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jackson-core-asl-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jackson-mapper-asl-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jasper-compiler-5.5.12.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jasper-runtime-5.5.12.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jdeb-0.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jersey-core-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jersey-json-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/
>> jersey-server-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jets3t-0.6.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jetty-6.1.26.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jetty-util-6.1.26.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsch-0.1.42.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/junit-4.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/kfs-0.2.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/log4j-1.2.15.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/mockito-all-1.8.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/oro-2.0.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/servlet-api-2.5-20081211.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../
>> share/hadoop/lib/slf4j-api-1.4.3.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/slf4j-log4j12-1.4.3.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/xmlenc-0.52.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsp-2.1/jsp-2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsp-2.1/jsp-api-2.1.jar
>>  2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:java.library.path=/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../lib:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000008_0/work
>> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:java.io.tmpdir=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000008_0/work/tmp
>> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:java.compiler=<NA>
>> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:os.name=Linux
>> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:os.arch=amd64
>> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:os.version=2.6.18-194.26.1.el5
>> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:user.name=yingyib
>> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:user.home=/home/yingyib
>> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:user.dir=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000008_0/work
>> 2011-11-17 22:56:44,159 INFO org.apache.zookeeper.ZooKeeper: Initiating
>> client connection, connectString=asterix-010:22181 sessionTimeout=60000
>> watcher=org.apache.giraph.graph.BspServiceWorker@60ded0f0
>> 2011-11-17 22:56:44,171 INFO org.apache.zookeeper.ClientCnxn: Opening
>> socket connection to server asterix-010/10.0.0.10:22181
>> 2011-11-17 22:56:44,173 INFO org.apache.zookeeper.ClientCnxn: Socket
>> connection established to asterix-010/10.0.0.10:22181, initiating session
>> 2011-11-17 22:56:44,178 INFO org.apache.zookeeper.ClientCnxn: Session
>> establishment complete on server asterix-010/10.0.0.10:22181, sessionid
>> = 0x133b57675b60007, negotiated timeout = 60000
>> 2011-11-17 22:56:44,180 INFO org.apache.giraph.graph.BspService: process:
>> Asynchronous connection complete.
>> 2011-11-17 22:56:44,180 INFO org.apache.giraph.graph.GraphMapper: setup:
>> Registering health of this worker...
>> 2011-11-17 22:56:44,191 INFO org.apache.giraph.graph.BspService:
>> getJobState: Job state already exists
>> (/_hadoopBsp/job_201111172247_0003/_masterJobState)
>> 2011-11-17 22:56:44,195 INFO org.apache.giraph.graph.BspService:
>> getApplicationAttempt: Node
>> /_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir already exists!
>> 2011-11-17 22:56:44,198 INFO org.apache.giraph.graph.BspService:
>> getApplicationAttempt: Node
>> /_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir already exists!
>> 2011-11-17 22:56:44,204 INFO org.apache.giraph.graph.BspServiceWorker:
>> registerHealth: Created my health node for attempt=0, superstep=-1 with
>> /_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/asterix-001_8
>> and hostnamePort = ["asterix-001",30008]
>> 2011-11-17 22:56:45,177 INFO org.apache.giraph.graph.BspService: process:
>> inputSplitsReadyChanged (input splits ready)
>> 2011-11-17 22:56:45,192 WARN org.apache.giraph.graph.BspService: process:
>> Unknown and unprocessed event
>> (path=/_hadoopBsp/job_201111172247_0003/_inputSplitsDir/2/_inputSplitReserved,
>> type=NodeCreated, state=SyncConnected)
>> 2011-11-17 22:56:45,192 INFO org.apache.giraph.graph.BspServiceWorker:
>> reserveInputSplit: Reserved input split path
>> /_hadoopBsp/job_201111172247_0003/_inputSplitsDir/2
>> 2011-11-17 22:56:45,196 INFO org.apache.giraph.graph.BspServiceWorker:
>> loadVertices: Reserved /_hadoopBsp/job_201111172247_0003/_inputSplitsDir/2
>> from ZooKeeper and got input split
>> 'hdfs://asterix-master:31888/webmap-tiny-sorted/part-00002:0+834285620'
>> 2011-11-17 23:01:20,608 INFO org.apache.zookeeper.ClientCnxn: Client
>> session timed out, have not heard from server in 59117ms for sessionid
>> 0x133b57675b60007, closing socket connection and attempting reconnect
>>  2011-11-17 23:02:06,630 ERROR org.apache.zookeeper.ClientCnxn: Error
>> while calling watcher
>> java.lang.RuntimeException: process: Disconnected from ZooKeeper, cannot
>> recover.
>>         at org.apache.giraph.graph.BspService.process(BspService.java:990)
>>         at
>> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:488)
>> 2011-11-17 23:02:35,793 INFO org.apache.zookeeper.ClientCnxn: Opening
>> socket connection to server asterix-010/10.0.0.10:22181
>> 2011-11-17 23:02:35,794 INFO org.apache.zookeeper.ClientCnxn: Socket
>> connection established to asterix-010/10.0.0.10:22181, initiating session
>> 2011-11-17 23:02:35,806 INFO org.apache.zookeeper.ClientCnxn: Unable to
>> reconnect to ZooKeeper service, session 0x133b57675b60007 has expired,
>> closing socket connection
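A sketch for testing the stop-the-world hypothesis, assuming you can start an extra background thread inside the task JVM. The Node-2 log shows the client had not heard from the server in 59117 ms, against a negotiated session timeout of 60000 ms. The hypothetical `PauseDetector` below is not part of Giraph or ZooKeeper; Hadoop later shipped a similar utility, `JvmPauseMonitor`. It sleeps for a fixed interval and reports how much longer the sleep actually took. An overshoot of tens of seconds means every thread in the JVM was frozen, ZooKeeper's client threads included.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical pause detector (not a Giraph/ZooKeeper class). A daemon thread
 * sleeps for intervalMs and measures the real elapsed time; a large overshoot
 * means the whole JVM was stalled, e.g. by a stop-the-world GC.
 */
class PauseDetector implements Runnable {
    private final long intervalMs;
    private final long warnThresholdMs;
    private volatile boolean running = true;

    PauseDetector(long intervalMs, long warnThresholdMs) {
        this.intervalMs = intervalMs;
        this.warnThresholdMs = warnThresholdMs;
    }

    /** Time spent beyond the requested sleep; never negative. */
    static long overshootMs(long requestedMs, long measuredMs) {
        return Math.max(0L, measuredMs - requestedMs);
    }

    @Override
    public void run() {
        while (running) {
            long start = System.nanoTime();
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            long measured = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            long extra = overshootMs(intervalMs, measured);
            if (extra > warnThresholdMs) {
                // Compare this against the negotiated ZooKeeper session timeout (60000 ms above).
                System.err.println("JVM appears to have paused for ~" + extra + " ms");
            }
        }
    }

    void stop() {
        running = false;
    }

    public static void main(String[] args) throws InterruptedException {
        PauseDetector detector = new PauseDetector(500, 2000);
        Thread t = new Thread(detector, "pause-detector");
        t.setDaemon(true); // must not keep the task JVM alive on exit
        t.start();
        Thread.sleep(1000); // stand-in for the real workload
        detector.stop();
    }
}
```

If the reported pauses approach the session timeout, the GC flags suggested in this thread (or a smaller heap per task) are the first thing to try.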
>>
>>  On Thu, Nov 17, 2011 at 9:46 PM, Avery Ching <ac...@apache.org> wrote:
>>
>>>  Hi Yingyi,
>>>
>>> Here are some ideas you might want to try:
>>>
>>> 1)  Limit the thread stack size.
>>>
>>> 2) You can set the heap available to the mapper JVM.
>>>
>>> E.g., here's a setting to get 10 GB of heap and use a smaller stack (64k)
>>> for the threads.
>>>
>>> -Dmapred.child.java.opts="-Xms10g -Xmx10g -Xss64k"
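One reasoning step worth making explicit: in Hadoop 0.20, `mapred.child.java.opts` applies to every task JVM, and each TaskTracker runs as many map tasks at once as it has map slots (`mapred.tasktracker.map.tasks.maximum`). A 10 GB heap therefore only fits on a 12 GB node if a single map task runs there at a time. The slot count is not stated in the thread, so the hypothetical `HeapBudget` helper below takes it as a parameter, along with an assumed 1 GB reserve for the OS and Hadoop daemons. Heap is also not a JVM's whole footprint (thread stacks and other native memory come on top), so treat the check as a lower bound.

```java
/**
 * Hypothetical helper: parse JVM size flags (-Xmx10g, -Xss64k, ...) and check
 * whether all concurrent map-task heaps on one node fit in its RAM.
 */
class HeapBudget {
    /** Parses "10g", "512m", "64k" or a plain byte count into bytes. */
    static long parseSize(String s) {
        String v = s.trim().toLowerCase();
        char unit = v.charAt(v.length() - 1);
        long mult;
        switch (unit) {
            case 'k': mult = 1L << 10; break;
            case 'm': mult = 1L << 20; break;
            case 'g': mult = 1L << 30; break;
            default:  return Long.parseLong(v); // no suffix: bytes
        }
        return Long.parseLong(v.substring(0, v.length() - 1)) * mult;
    }

    /** Returns the -Xmx value in bytes from a java.opts string, or -1 if absent. */
    static long maxHeapBytes(String javaOpts) {
        for (String tok : javaOpts.trim().split("\\s+")) {
            if (tok.startsWith("-Xmx")) {
                return parseSize(tok.substring(4));
            }
        }
        return -1;
    }

    /** True if mapSlots heaps of size -Xmx fit into node RAM minus a reserve. */
    static boolean fitsOnNode(String javaOpts, int mapSlots, long nodeRamBytes, long reserveBytes) {
        long xmx = maxHeapBytes(javaOpts);
        return xmx > 0 && (long) mapSlots * xmx <= nodeRamBytes - reserveBytes;
    }

    public static void main(String[] args) {
        String opts = "-Xms10g -Xmx10g -Xss64k"; // the setting suggested above
        long ram = parseSize("12g");             // per-node RAM from the original post
        long reserve = parseSize("1g");          // assumed headroom for OS and daemons
        for (int slots = 1; slots <= 4; slots++) {
            System.out.println(slots + " map slot(s): fits = " + fitsOnNode(opts, slots, ram, reserve));
        }
    }
}
```

With these numbers only the single-slot case fits; with two or more slots per node, the heap would need to shrink accordingly.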
>>>
>>> Also, you might want to try using the EdgeListVertex instead of Vertex
>>> (i.e. GiraphJob.setVertexClass(EdgeListVertex.class)), it is quite a bit
>>> smaller.
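Why a list-based vertex can be smaller: as an assumption not confirmed in this thread, the saving comes from keeping each vertex's edges in sorted parallel lists rather than a per-vertex sorted map, which allocates an extra entry object (with key, value, and tree pointers) for every edge. The hypothetical `SortedEdgeList` below sketches that layout with binary-search lookup. It is not Giraph's `EdgeListVertex`; check that class's source for its actual representation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical list-based edge store: target ids and edge values live in two
 * parallel lists sorted by target id, so there is no per-edge map entry.
 * Lookups are O(log d); inserts are O(d) because elements shift.
 */
class SortedEdgeList {
    private final List<Long> targets = new ArrayList<Long>();
    private final List<Float> values = new ArrayList<Float>();

    /** Adds the edge, or overwrites its value; returns true if the edge was new. */
    boolean addEdge(long target, float value) {
        int idx = Collections.binarySearch(targets, Long.valueOf(target));
        if (idx >= 0) {
            values.set(idx, value); // existing edge: replace its value
            return false;
        }
        int insertAt = -idx - 1;    // binarySearch encodes the insertion point
        targets.add(insertAt, target);
        values.add(insertAt, value);
        return true;
    }

    /** Returns the edge value for target, or null if there is no such edge. */
    Float getEdgeValue(long target) {
        int idx = Collections.binarySearch(targets, Long.valueOf(target));
        return idx >= 0 ? values.get(idx) : null;
    }

    int numEdges() {
        return targets.size();
    }

    public static void main(String[] args) {
        SortedEdgeList edges = new SortedEdgeList();
        edges.addEdge(42L, 1.0f);
        edges.addEdge(7L, 0.5f);
        edges.addEdge(42L, 2.0f); // overwrites the first 42 edge
        System.out.println(edges.numEdges() + " edges, 42 -> " + edges.getEdgeValue(42L));
    }
}
```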
>>>
>>> Let us know if that helps you.  You should also check to see if your
>>> Hadoop installation is using a 32-bit or 64-bit JVM.  If it's 32-bit you
>>> will be limited in how much heap you can use.
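A quick way to do the 32-bit/64-bit check from inside a task. The ZooKeeper client banners in the logs above already print `os.arch=amd64`, which on HotSpot usually indicates a 64-bit JVM, since `os.arch` describes the JVM's architecture (a 32-bit JVM on 64-bit Linux typically reports `i386`). The hypothetical `JvmBitness` helper below also consults `sun.arch.data.model`, a HotSpot-specific property, and prints the maximum heap the JVM actually received, which is roughly the `-Xmx` value.

```java
/**
 * Hypothetical helper: report whether this JVM is 32- or 64-bit and how much
 * heap it was given.
 */
class JvmBitness {
    /** Returns 32 or 64, or -1 if neither property is conclusive. */
    static int dataModel(String dataModelProp, String osArch) {
        if ("64".equals(dataModelProp)) return 64;  // HotSpot-specific property
        if ("32".equals(dataModelProp)) return 32;
        String arch = osArch == null ? "" : osArch.toLowerCase();
        if (arch.contains("64")) return 64;         // amd64, x86_64, ...
        if (arch.equals("x86") || arch.equals("i386") || arch.equals("i686")) return 32;
        return -1;
    }

    static int dataModel() {
        return dataModel(System.getProperty("sun.arch.data.model"), System.getProperty("os.arch"));
    }

    public static void main(String[] args) {
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024L * 1024L);
        System.out.println("JVM data model: " + dataModel() + "-bit");
        System.out.println("Max heap (roughly -Xmx): " + maxHeapMb + " MB");
    }
}
```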
>>>
>>> Avery
>>>
>>>
>>> On 11/17/11 9:38 PM, Yingyi Bu wrote:
>>>
>>> Hi,
>>>
>>>     I'm running a Giraph PageRank job.  I tried 8 GB of input text data
>>> over 10 nodes (each with 4 cores, 4 disks, and 12 GB of physical memory),
>>> i.e., 800 MB of input data per machine.  However, the Giraph job fails
>>> because of high GC costs and an Out-of-Memory exception.
>>>      Do I need to set anything special in the Hadoop configuration, for
>>> example, the maximum heap size for the map task JVM?
>>>     Thanks!!
>>>
>>>  Best regards,
>>> Yingyi
>>>
>>>
>>>
>>
>>
>

Re: PageRank OOM Exception

Posted by Yingyi Bu <bu...@gmail.com>.
Avery,

     Thanks a lot for the help!!
     I'll sync the trunk and try with your suggested settings.

Best regards,
Yingyi

On Thu, Nov 17, 2011 at 11:47 PM, Avery Ching <ac...@apache.org> wrote:

>  Yingyi,
>
> Looks like you lost the connection to ZooKeeper.  You might want to sync
> with trunk.  GIRAPH-11 changed the settings to allow longer ZooKeeper
> timeouts.  Also, ordering of the vertices is no longer required and the
> load balancing should be better.  Looks like you might want to try to add
> some better GC options to reduce stop-the-world pauses (likely causing the
> timeouts).
>
> Here are some example settings you can try fiddling with as well; just add
> them to the other JVM settings you tried out earlier.  Let us know how it
> goes.
>
>  -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:ParallelGCThreads=8
> -XX:+CMSIncrementalPacing -XX:+PrintGCDetails
> -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+PrintTenuringDistribution
>
> Avery
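After adding these flags to `mapred.child.java.opts`, it can help to confirm from inside the task JVM that they actually arrived, and to log how much time has gone to GC so far. The hypothetical `GcReport` helper below uses only the standard `java.lang.management` API. With `-XX:+UseConcMarkSweepGC` on a HotSpot JVM of this era, the collectors typically report as `ParNew` and `ConcurrentMarkSweep`.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical helper: show which GC flags reached this JVM and the GC cost so far. */
class GcReport {
    /** True if the exact flag string was passed on the JVM command line. */
    static boolean hasFlag(List<String> jvmArgs, String flag) {
        return jvmArgs.contains(flag);
    }

    /** Accumulated collection time over all collectors, in ms (unreported -1 values skipped). */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if this collector does not report time
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM args: " + jvmArgs);
        System.out.println("CMS requested: " + hasFlag(jvmArgs, "-XX:+UseConcMarkSweepGC"));
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        System.out.println("Total GC time: " + totalGcMillis() + " ms");
    }
}
```

These are Java 6-era HotSpot flags: incremental CMS was removed in JDK 9 and CMS itself in JDK 14, so newer JVMs will ignore or reject them.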
>
>
> On 11/17/11 11:24 PM, Yingyi Bu wrote:
>
> Hi Avery,
>
>      Thanks a lot for your help!!
>     I used your settings and got rid of the OOM now!   However, after running
> the job for 10 minutes, one worker failed, and after a while all the
> mappers failed.  Attached below are mapper logs from two nodes.  It seems
> they cannot connect to ZooKeeper.  The workers run well until the
> highlighted exception.  Am I missing something in the job setting?
>     Thanks, again!!
>
>  Best regards,
> Yingyi
>
>
>
>  Mapper log on Node-1:
>  2011-11-17 22:56:39,044 INFO org.apache.giraph.zk.ZooKeeperManager:
> getZooKeeperServerList: For task 0, got file 'zkServerList_asterix-010 0 '
> (polling period is 3000)
> 2011-11-17 22:56:39,044 INFO org.apache.giraph.zk.ZooKeeperManager:
> getZooKeeperServerList: Found [asterix-010, 0] 2 hosts in filename
> 'zkServerList_asterix-010 0 '
> 2011-11-17 22:56:39,046 INFO org.apache.giraph.zk.ZooKeeperManager:
> onlineZooKeeperServers: Trying to delete old directory
> /mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/work/_bspZooKeeper
> 2011-11-17 22:56:39,049 INFO org.apache.giraph.zk.ZooKeeperManager:
> generateZooKeeperConfigFile: Creating file
> /mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/work/_bspZooKeeper/zoo.cfg
> in
> /mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/work/_bspZooKeeper
> with base port 22181
> 2011-11-17 22:56:39,049 INFO org.apache.giraph.zk.ZooKeeperManager:
> generateZooKeeperConfigFile: Make directory of _bspZooKeeper = true
> 2011-11-17 22:56:39,049 INFO org.apache.giraph.zk.ZooKeeperManager:
> generateZooKeeperConfigFile: Delete of zoo.cfg = false
> 2011-11-17 22:56:39,050 INFO org.apache.giraph.zk.ZooKeeperManager:
> onlineZooKeeperServers: Attempting to start ZooKeeper server with command
> [/mnt/data/sda/space/yingyi/tools/java/jre/bin/java, -Xmx256m,
> -XX:ParallelGCThreads=4, -XX:+UseConcMarkSweepGC,
> -XX:CMSInitiatingOccupancyFraction=70, -XX:MaxGCPauseMillis=100, -cp,
> /mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/jars/job.jar,
> org.apache.zookeeper.server.quorum.QuorumPeerMain,
> /mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/work/_bspZooKeeper/zoo.cfg]
> in directory
> /mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/work/_bspZooKeeper
> 2011-11-17 22:56:39,056 INFO org.apache.giraph.zk.ZooKeeperManager:
> onlineZooKeeperServers: Connect attempt 0 of 10 max trying to connect to
> asterix-010:22181 with poll msecs = 3000
> 2011-11-17 22:56:39,058 WARN org.apache.giraph.zk.ZooKeeperManager:
> onlineZooKeeperServers: Got ConnectException
> java.net.ConnectException: Connection refused
>         at java.net.PlainSocketImpl.socketConnect(Native Method)
>         at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
>         at
> java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
>         at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
>         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
>         at java.net.Socket.connect(Socket.java:529)
>         at
> org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:612)
>         at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:401)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
> 2011-11-17 22:56:42,062 INFO org.apache.giraph.zk.ZooKeeperManager:
> onlineZooKeeperServers: Connect attempt 1 of 10 max trying to connect to
> asterix-010:22181 with poll msecs = 3000
> 2011-11-17 22:56:42,063 INFO org.apache.giraph.zk.ZooKeeperManager:
> onlineZooKeeperServers: Connected!
> 2011-11-17 22:56:42,064 INFO org.apache.giraph.zk.ZooKeeperManager:
> onlineZooKeeperServers: Creating my filestamp
> _bsp/_defaultZkManagerDir/job_201111172247_0003/_zkServer/asterix-010 0
> 2011-11-17 22:56:42,070 INFO org.apache.giraph.graph.GraphMapper: setup:
> Starting up BspServiceMaster (master thread)...
> 2011-11-17 22:56:42,080 INFO org.apache.giraph.graph.BspService:
> BspService: Connecting to ZooKeeper with job job_201111172247_0003, 0 on
> asterix-010:22181
> 2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:zookeeper.version=3.3.1-942149, built on 05/07/2010 17:14 GMT
> 2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:host.name=asterix-010
> 2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:java.version=1.6.0_21
> 2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:java.vendor=Sun Microsystems Inc.
> 2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:java.home=/mnt/data/sda/space/yingyi/tools/java/jre
> 2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:java.class.path=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/jars/classes:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/jars:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000000_0/work:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../conf:/mnt/data/sda/space/yingyi/tools/java/lib/tools.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/classes:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/test/classes:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/tools:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/hadoop-core-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-
> 0.20.205.0/libexec/../share/hadoop/lib/asm-3.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/aspectjrt-1.6.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/aspectjtools-1.6.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-beanutils-1.7.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-beanutils-core-1.8.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-cli-1.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-codec-1.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-collections-3.2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-configuration-1.6.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-daemon-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexe
> c/../share/hadoop/lib/commons-digester-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-el-1.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-httpclient-3.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-lang-2.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-logging-1.1.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-logging-api-1.0.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-math-2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-net-1.4.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/core-3.1.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hadoop-capacity-scheduler-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/
> hadoop/lib/hadoop-fairscheduler-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hadoop-thriftfs-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hsqldb-1.8.0.10.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jackson-core-asl-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jackson-mapper-asl-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jasper-compiler-5.5.12.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jasper-runtime-5.5.12.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jdeb-0.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jersey-core-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jersey-json-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/
> jersey-server-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jets3t-0.6.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jetty-6.1.26.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jetty-util-6.1.26.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsch-0.1.42.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/junit-4.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/kfs-0.2.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/log4j-1.2.15.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/mockito-all-1.8.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/oro-2.0.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/servlet-api-2.5-20081211.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../
> share/hadoop/lib/slf4j-api-1.4.3.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/slf4j-log4j12-1.4.3.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/xmlenc-0.52.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsp-2.1/jsp-2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsp-2.1/jsp-api-2.1.jar
> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:java.library.path=/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../lib:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000000_0/work
> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:java.io.tmpdir=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000000_0/work/tmp
> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:java.compiler=<NA>
> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:os.name=Linux
> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:os.arch=amd64
> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:os.version=2.6.18-194.26.1.el5
> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:user.name=yingyib
> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:user.home=/home/yingyib
> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:user.dir=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000000_0/work
> 2011-11-17 22:56:42,088 INFO org.apache.zookeeper.ZooKeeper: Initiating
> client connection, connectString=asterix-010:22181 sessionTimeout=60000
> watcher=org.apache.giraph.graph.BspServiceMaster@13a78071
> 2011-11-17 22:56:42,098 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server asterix-010/10.0.0.10:22181
> 2011-11-17 22:56:42,099 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to asterix-010/10.0.0.10:22181, initiating session
> 2011-11-17 22:56:42,123 INFO org.apache.zookeeper.ClientCnxn: Session
> establishment complete on server asterix-010/10.0.0.10:22181, sessionid =
> 0x133b57675b60000, negotiated timeout = 60000
> 2011-11-17 22:56:42,125 INFO org.apache.giraph.graph.BspService: process:
> Asynchronous connection complete.
> 2011-11-17 22:56:42,126 INFO org.apache.giraph.graph.GraphMapper: map: No
> need to do anything when not a worker
> 2011-11-17 22:56:42,126 INFO org.apache.giraph.graph.GraphMapper: cleanup:
> Starting for MASTER_ZOOKEEPER_ONLY
> 2011-11-17 22:56:42,197 INFO org.apache.giraph.graph.BspServiceMaster:
> becomeMaster: First child is
> '/_hadoopBsp/job_201111172247_0003/_masterElectionDir/asterix-010_00000000000'
> and my bid is
> '/_hadoopBsp/job_201111172247_0003/_masterElectionDir/asterix-010_00000000000'
> 2011-11-17 22:56:42,197 INFO org.apache.giraph.graph.BspServiceMaster:
> becomeMaster: I am now the master!
> 2011-11-17 22:56:42,208 INFO org.apache.giraph.graph.BspService: process:
> applicationAttemptChanged signaled
> 2011-11-17 22:56:42,216 WARN org.apache.giraph.graph.BspService: process:
> Unknown and unprocessed event
> (path=/_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir/0/_superstepDir,
> type=NodeChildrenChanged, state=SyncConnected)
> 2011-11-17 22:56:45,130 INFO
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat: Total input paths to
> process : 10
> 2011-11-17 22:56:45,227 INFO org.apache.giraph.graph.BspServiceMaster:
> coordinateSuperstep: 0 out of 10 chosen workers finished on superstep -1
> 2011-11-17 23:01:20,045 ERROR org.apache.zookeeper.ClientCnxn: Error while
> calling watcher
> java.lang.RuntimeException:
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode =
> NoNode for
> /_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir/0/_superstepDir/-1/_vertexRangeAssignments
>         at
> org.apache.giraph.graph.BspService.getVertexRangeMap(BspService.java:885)
>         at
> org.apache.giraph.graph.BspServiceMaster.checkHealthyWorkerFailure(BspServiceMaster.java:1946)
>         at
> org.apache.giraph.graph.BspServiceMaster.processEvent(BspServiceMaster.java:1976)
>         at org.apache.giraph.graph.BspService.process(BspService.java:1095)
>         at
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:488)
> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
> KeeperErrorCode = NoNode for
> /_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir/0/_superstepDir/-1/_vertexRangeAssignments
>         at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>         at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:921)
>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:950)
>         at
> org.apache.giraph.graph.BspService.getVertexRangeMap(BspService.java:858)
>         ... 4 more
> 2011-11-17 23:01:22,009 INFO org.apache.giraph.graph.BspServiceMaster:
> coordinateSuperstep: 0 out of 10 chosen workers finished on superstep -1
> 2011-11-17 23:11:27,357 WARN org.apache.giraph.zk.ZooKeeperManager:
> onlineZooKeeperServers: Forced a shutdown hook kill of the ZooKeeper process.
>
>
>  Mapper log on Node-2:
> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:zookeeper.version=3.3.1-942149, built on 05/07/2010 17:14 GMT
> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:host.name=asterix-001
> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:java.version=1.6.0_21
> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:java.vendor=Sun Microsystems Inc.
> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:java.home=/mnt/data/sda/space/yingyi/tools/java/jre
> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:java.class.path=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/jars/classes:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/jars:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000008_0/work:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../conf:/mnt/data/sda/space/yingyi/tools/java/lib/tools.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/classes:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/test/classes:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/tools:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/hadoop-core-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-
> 0.20.205.0/libexec/../share/hadoop/lib/asm-3.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/aspectjrt-1.6.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/aspectjtools-1.6.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-beanutils-1.7.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-beanutils-core-1.8.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-cli-1.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-codec-1.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-collections-3.2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-configuration-1.6.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-daemon-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexe
> c/../share/hadoop/lib/commons-digester-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-el-1.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-httpclient-3.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-lang-2.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-logging-1.1.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-logging-api-1.0.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-math-2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-net-1.4.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/core-3.1.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hadoop-capacity-scheduler-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/
> hadoop/lib/hadoop-fairscheduler-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hadoop-thriftfs-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hsqldb-1.8.0.10.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jackson-core-asl-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jackson-mapper-asl-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jasper-compiler-5.5.12.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jasper-runtime-5.5.12.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jdeb-0.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jersey-core-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jersey-json-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/
> jersey-server-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jets3t-0.6.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jetty-6.1.26.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jetty-util-6.1.26.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsch-0.1.42.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/junit-4.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/kfs-0.2.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/log4j-1.2.15.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/mockito-all-1.8.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/oro-2.0.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/servlet-api-2.5-20081211.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../
> share/hadoop/lib/slf4j-api-1.4.3.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/slf4j-log4j12-1.4.3.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/xmlenc-0.52.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsp-2.1/jsp-2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsp-2.1/jsp-api-2.1.jar
> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:java.library.path=/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../lib:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000008_0/work
> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:java.io.tmpdir=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000008_0/work/tmp
> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:java.compiler=<NA>
> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:os.name=Linux
> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:os.arch=amd64
> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:os.version=2.6.18-194.26.1.el5
> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:user.name=yingyib
> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:user.home=/home/yingyib
> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:user.dir=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000008_0/work
> 2011-11-17 22:56:44,159 INFO org.apache.zookeeper.ZooKeeper: Initiating
> client connection, connectString=asterix-010:22181 sessionTimeout=60000
> watcher=org.apache.giraph.graph.BspServiceWorker@60ded0f0
> 2011-11-17 22:56:44,171 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server asterix-010/10.0.0.10:22181
> 2011-11-17 22:56:44,173 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to asterix-010/10.0.0.10:22181, initiating session
> 2011-11-17 22:56:44,178 INFO org.apache.zookeeper.ClientCnxn: Session
> establishment complete on server asterix-010/10.0.0.10:22181, sessionid =
> 0x133b57675b60007, negotiated timeout = 60000
> 2011-11-17 22:56:44,180 INFO org.apache.giraph.graph.BspService: process:
> Asynchronous connection complete.
> 2011-11-17 22:56:44,180 INFO org.apache.giraph.graph.GraphMapper: setup:
> Registering health of this worker...
> 2011-11-17 22:56:44,191 INFO org.apache.giraph.graph.BspService:
> getJobState: Job state already exists
> (/_hadoopBsp/job_201111172247_0003/_masterJobState)
> 2011-11-17 22:56:44,195 INFO org.apache.giraph.graph.BspService:
> getApplicationAttempt: Node
> /_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir already exists!
> 2011-11-17 22:56:44,198 INFO org.apache.giraph.graph.BspService:
> getApplicationAttempt: Node
> /_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir already exists!
> 2011-11-17 22:56:44,204 INFO org.apache.giraph.graph.BspServiceWorker:
> registerHealth: Created my health node for attempt=0, superstep=-1 with
> /_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/asterix-001_8
> and hostnamePort = ["asterix-001",30008]
> 2011-11-17 22:56:45,177 INFO org.apache.giraph.graph.BspService: process:
> inputSplitsReadyChanged (input splits ready)
> 2011-11-17 22:56:45,192 WARN org.apache.giraph.graph.BspService: process:
> Unknown and unprocessed event
> (path=/_hadoopBsp/job_201111172247_0003/_inputSplitsDir/2/_inputSplitReserved,
> type=NodeCreated, state=SyncConnected)
> 2011-11-17 22:56:45,192 INFO org.apache.giraph.graph.BspServiceWorker:
> reserveInputSplit: Reserved input split path
> /_hadoopBsp/job_201111172247_0003/_inputSplitsDir/2
> 2011-11-17 22:56:45,196 INFO org.apache.giraph.graph.BspServiceWorker:
> loadVertices: Reserved /_hadoopBsp/job_201111172247_0003/_inputSplitsDir/2
> from ZooKeeper and got input split
> 'hdfs://asterix-master:31888/webmap-tiny-sorted/part-00002:0+834285620'
> 2011-11-17 23:01:20,608 INFO org.apache.zookeeper.ClientCnxn: Client
> session timed out, have not heard from server in 59117ms for sessionid
> 0x133b57675b60007, closing socket connection and attempting reconnect
>  2011-11-17 23:02:06,630 ERROR org.apache.zookeeper.ClientCnxn: Error
> while calling watcher
> java.lang.RuntimeException: process: Disconnected from ZooKeeper, cannot
> recover.
>         at org.apache.giraph.graph.BspService.process(BspService.java:990)
>         at
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:488)
> 2011-11-17 23:02:35,793 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server asterix-010/10.0.0.10:22181
> 2011-11-17 23:02:35,794 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to asterix-010/10.0.0.10:22181, initiating session
> 2011-11-17 23:02:35,806 INFO org.apache.zookeeper.ClientCnxn: Unable to
> reconnect to ZooKeeper service, session 0x133b57675b60007 has expired,
> closing socket connection
>
>  On Thu, Nov 17, 2011 at 9:46 PM, Avery Ching <ac...@apache.org> wrote:
>
>>  Hi Yingyi,
>>
>> Here are some ideas you might want to try:
>>
>> 1)  Limit the thread stack size.
>>
>> 2)  You can set the heap available to the mapper JVM.
>>
>> For example, here's a setting to get 10 GB of heap and use a smaller stack
>> (64k) for the threads.
>>
>> -Dmapred.child.java.opts="-Xms10g -Xmx10g -Xss64k"
>>
>> Also, you might want to try using the EdgeListVertex instead of Vertex
>> (i.e. GiraphJob.setVertexClass(EdgeListVertex.class)); it is quite a bit
>> smaller.
>>
>> Let us know if that helps you.  You should also check to see if your
>> Hadoop installation is using a 32-bit or 64-bit JVM.  If it's 32-bit, you
>> will be limited in how much heap you can use.
>>
>> Avery
>>
>>
>> On 11/17/11 9:38 PM, Yingyi Bu wrote:
>>
>> Hi,
>>
>>     I'm running a Giraph PageRank job.  I tried with 8GB input text data
>> over 10 nodes (each has 4 cores, 4 disks, and 12GB physical memory), that
>> is 800MB input-data/machine.  However, the Giraph job fails because of high
>> GC costs and an Out-of-Memory exception.
>>      Do I need to set anything special in the Hadoop configuration, for
>> example, the maximum heap size for the map task VM?
>>     Thanks!!
>>
>>  Best regards,
>> Yingyi
>>
>>
>>
>
>
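
The heap, stack and 32-bit/64-bit points in Avery's reply quoted above can be checked from inside a running task. A minimal sketch, assuming a HotSpot JVM (`sun.arch.data.model` is a HotSpot property and may be missing on other JVMs); the class name `ChildJvmCheck` and the 1000-thread figure are hypothetical. The stack arithmetic shows why `-Xss64k` can help: each thread may reserve up to its stack size outside the heap, so with many threads a smaller stack leaves more of a 12 GB node for `-Xmx10g`. The 1024 KiB comparison value is an assumed default; check what your JVM actually uses.

```java
import java.lang.management.ManagementFactory;

// Hypothetical helper: prints facts about the JVM it runs in, e.g. a Hadoop child task.
public class ChildJvmCheck {

    // Upper bound on stack memory for `threads` threads at `stackKiB` KiB each (-Xss).
    static long stackBytes(int threads, long stackKiB) {
        return threads * stackKiB * 1024L;
    }

    // "32" or "64" on HotSpot; "unknown" if the property is not defined.
    static String dataModel() {
        return System.getProperty("sun.arch.data.model", "unknown");
    }

    public static void main(String[] args) {
        long maxHeapMiB = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        int liveThreads = ManagementFactory.getThreadMXBean().getThreadCount();

        System.out.println("data model:    " + dataModel() + "-bit");
        System.out.println("os.arch:       " + System.getProperty("os.arch"));
        System.out.println("max heap MiB:  " + maxHeapMiB);   // reflects -Xmx
        System.out.println("live threads:  " + liveThreads);

        // Compare an assumed 1024 KiB default stack with -Xss64k.
        int threads = 1000; // hypothetical thread count for a busy worker
        System.out.println("stacks @1024k: " + stackBytes(threads, 1024) / (1024 * 1024) + " MiB");
        System.out.println("stacks @64k:   " + stackBytes(threads, 64) / (1024 * 1024) + " MiB");
    }
}
```

With 1000 threads the estimate drops from about 1000 MiB of possible stack reservation to about 62 MiB, which matters on a node where the heap already takes 10 of 12 GB.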

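Why EdgeListVertex is smaller is not spelled out in the thread. The sketch below is not Giraph's implementation; it only illustrates the general layout argument, assuming out-edges are `long` target ids with `float` values: two sorted primitive arrays cost about 12 bytes per edge, while a `TreeMap<Long, Float>` allocates an entry object plus a boxed key and a boxed value for every edge. Lookups stay logarithmic via binary search. `CompactEdges` is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical compact edge store: parallel primitive arrays sorted by target id.
// About 12 bytes per edge (8-byte id + 4-byte value) plus two array headers,
// versus a per-edge entry object and boxed Long/Float in a TreeMap<Long, Float>.
public class CompactEdges {
    private final long[] targets;
    private final float[] weights;

    // Copies and sorts the input by target id; ids are assumed to be unique.
    public CompactEdges(long[] targetIds, float[] edgeWeights) {
        if (targetIds.length != edgeWeights.length) {
            throw new IllegalArgumentException("ids and weights differ in length");
        }
        int n = targetIds.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Long.compare(targetIds[a], targetIds[b]));
        targets = new long[n];
        weights = new float[n];
        for (int i = 0; i < n; i++) {
            targets[i] = targetIds[order[i]];
            weights[i] = edgeWeights[order[i]];
        }
    }

    public int size() { return targets.length; }

    // Binary search on the sorted ids; returns the value, or NaN if the edge is absent.
    public float weightTo(long target) {
        int idx = Arrays.binarySearch(targets, target);
        return idx >= 0 ? weights[idx] : Float.NaN;
    }

    public long targetAt(int i) { return targets[i]; }
}
```

The trade-off is that adding an edge means copying the arrays, so this layout suits workloads like PageRank, where the edge set typically does not change after loading.
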
Re: PageRank OOM Exception

Posted by Avery Ching <ac...@apache.org>.
Yingyi,

Looks like you lost the connection to ZooKeeper.  You might want to sync
with trunk.  GIRAPH-11 changed the settings to allow longer ZooKeeper
timeouts.  Also, ordering of the vertices is no longer required and the
load balancing should be better.  It also looks like you might want to
add some better GC options to reduce stop-the-world pauses (which are
likely causing the timeouts).
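
A rough way to watch for such pauses from inside a task, sketched under assumptions: it samples the accumulated collection time from `GarbageCollectorMXBean.getCollectionTime()` once per second and warns when one interval's GC time exceeds half of the 60000 ms session timeout negotiated in the logs in this thread. The class name and the 0.5 threshold are hypothetical. For concurrent collectors such as CMS the reported time may include phases that do not stop the application, so treat it as an upper bound; the `-XX:+PrintGCDetails` output is more direct evidence of pause lengths. Because the sampling thread is itself frozen during a pause, a long pause shows up as one large delta after it ends, so this is a diagnostic, not a way to keep the session alive.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical watchdog: compares GC time spent per sampling interval with the
// ZooKeeper session timeout.
public class GcTimeoutWatch {

    // Sum of accumulated collection time (ms) over all collectors; -1 (unsupported) is skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) total += t;
        }
        return total;
    }

    // True if GC time in one interval used more than `fraction` of the session timeout.
    static boolean atRisk(long gcDeltaMillis, long sessionTimeoutMillis, double fraction) {
        return gcDeltaMillis > sessionTimeoutMillis * fraction;
    }

    public static void main(String[] args) throws InterruptedException {
        long sessionTimeoutMillis = 60000; // "negotiated timeout = 60000" in the logs
        long previous = totalGcMillis();
        for (int i = 0; i < 3; i++) { // a real watchdog would run for the task's lifetime
            Thread.sleep(1000);
            long now = totalGcMillis();
            long delta = now - previous;
            previous = now;
            if (atRisk(delta, sessionTimeoutMillis, 0.5)) {
                System.err.println("GC used " + delta + " ms in the last interval");
            }
        }
    }
}
```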

Here are some example settings you can try fiddling with as well; just
add them to the other JVM settings you tried out earlier.  Let us know
how it goes.

  -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode 
-XX:ParallelGCThreads=8 -XX:+CMSIncrementalPacing -XX:+PrintGCDetails 
-XX:CMSInitiatingOccupancyFraction=60 -XX:+UseCMSInitiatingOccupancyOnly 
-XX:+PrintTenuringDistribution
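
To add these flags to the settings tried earlier (`-Xms10g -Xmx10g -Xss64k`), here is a small sketch with hypothetical names (`JvmOpts`, `merge`, `missing`). It builds the combined option string while dropping exact duplicates. When run inside a task, it also lists which expected flags are absent from `RuntimeMXBean.getInputArguments()`, a quick way to confirm the options actually reached the child JVM. It compares exact tokens only, so the same flag given twice with different values is not detected. The incremental CMS flags belong to JVMs of that era; they were deprecated in JDK 8 and removed in JDK 9.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for building and checking a child-JVM options string.
public class JvmOpts {

    // Appends `extra` flags to `base`, keeping order and dropping exact duplicates.
    static String merge(String base, String extra) {
        Set<String> flags = new LinkedHashSet<>();
        for (String s : (base + " " + extra).trim().split("\\s+")) {
            if (!s.isEmpty()) flags.add(s);
        }
        return String.join(" ", flags);
    }

    // Returns the flags from `expected` that are not among the JVM's input arguments.
    static List<String> missing(String expected, List<String> actualArgs) {
        List<String> out = new ArrayList<>();
        for (String s : expected.trim().split("\\s+")) {
            if (!s.isEmpty() && !actualArgs.contains(s)) out.add(s);
        }
        return out;
    }

    public static void main(String[] args) {
        String earlier = "-Xms10g -Xmx10g -Xss64k";
        String gc = "-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:ParallelGCThreads=8 "
                + "-XX:+CMSIncrementalPacing -XX:+PrintGCDetails "
                + "-XX:CMSInitiatingOccupancyFraction=60 -XX:+UseCMSInitiatingOccupancyOnly "
                + "-XX:+PrintTenuringDistribution";
        String opts = merge(earlier, gc);
        System.out.println("-Dmapred.child.java.opts=\"" + opts + "\"");

        // Inside a task, this shows which flags the child JVM did not receive.
        List<String> actual = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("missing here: " + missing(opts, actual));
    }
}
```

The printed `-Dmapred.child.java.opts="..."` line has the same form as the setting suggested earlier in the thread.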

Avery

On 11/17/11 11:24 PM, Yingyi Bu wrote:
> Hi Avery,
>
>     Thanks a lot for your help!!
>     I used your settings and got rid of the OOM now!   However, after
> running the job for 10 minutes, one worker failed, and then after a
> while, all mappers failed.  Attached below are mapper logs from two
> nodes.  It seems they cannot connect to ZooKeeper.  The workers
> run well until the highlighted exception.  Am I missing something in
> the job settings?
>     Thanks, again!!
>
> Best regards,
> Yingyi
>
>
> Mapper log on Node-1:
>  2011-11-17 22:56:39,044 INFO org.apache.giraph.zk.ZooKeeperManager: 
> getZooKeeperServerList: For task 0, got file 'zkServerList_asterix-010 
> 0 ' (polling period is 3000)
> 2011-11-17 22:56:39,044 INFO org.apache.giraph.zk.ZooKeeperManager: 
> getZooKeeperServerList: Found [asterix-010, 0] 2 hosts in filename 
> 'zkServerList_asterix-010 0 '
> 2011-11-17 22:56:39,046 INFO org.apache.giraph.zk.ZooKeeperManager: 
> onlineZooKeeperServers: Trying to delete old directory 
> /mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/work/_bspZooKeeper
> 2011-11-17 22:56:39,049 INFO org.apache.giraph.zk.ZooKeeperManager: 
> generateZooKeeperConfigFile: Creating file 
> /mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/work/_bspZooKeeper/zoo.cfg 
> in 
> /mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/work/_bspZooKeeper 
> with base port 22181
> 2011-11-17 22:56:39,049 INFO org.apache.giraph.zk.ZooKeeperManager: 
> generateZooKeeperConfigFile: Make directory of _bspZooKeeper = true
> 2011-11-17 22:56:39,049 INFO org.apache.giraph.zk.ZooKeeperManager: 
> generateZooKeeperConfigFile: Delete of zoo.cfg = false
> 2011-11-17 22:56:39,050 INFO org.apache.giraph.zk.ZooKeeperManager: 
> onlineZooKeeperServers: Attempting to start ZooKeeper server with 
> command [/mnt/data/sda/space/yingyi/tools/java/jre/bin/java, -Xmx256m, 
> -XX:ParallelGCThreads=4, -XX:+UseConcMarkSweepGC, 
> -XX:CMSInitiatingOccupancyFraction=70, -XX:MaxGCPauseMillis=100, -cp, 
> /mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/jars/job.jar, 
> org.apache.zookeeper.server.quorum.QuorumPeerMain, 
> /mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/work/_bspZooKeeper/zoo.cfg] 
> in directory 
> /mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/work/_bspZooKeeper
> 2011-11-17 22:56:39,056 INFO org.apache.giraph.zk.ZooKeeperManager: 
> onlineZooKeeperServers: Connect attempt 0 of 10 max trying to connect 
> to asterix-010:22181 with poll msecs = 3000
> 2011-11-17 22:56:39,058 WARN org.apache.giraph.zk.ZooKeeperManager: 
> onlineZooKeeperServers: Got ConnectException
> java.net.ConnectException: Connection refused
>         at java.net.PlainSocketImpl.socketConnect(Native Method)
>         at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
>         at 
> java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
>         at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
>         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
>         at java.net.Socket.connect(Socket.java:529)
>         at 
> org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:612)
>         at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:401)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
> 2011-11-17 22:56:42,062 INFO org.apache.giraph.zk.ZooKeeperManager: 
> onlineZooKeeperServers: Connect attempt 1 of 10 max trying to connect 
> to asterix-010:22181 with poll msecs = 3000
> 2011-11-17 22:56:42,063 INFO org.apache.giraph.zk.ZooKeeperManager: 
> onlineZooKeeperServers: Connected!
> 2011-11-17 22:56:42,064 INFO org.apache.giraph.zk.ZooKeeperManager: 
> onlineZooKeeperServers: Creating my filestamp 
> _bsp/_defaultZkManagerDir/job_201111172247_0003/_zkServer/asterix-010 0
> 2011-11-17 22:56:42,070 INFO org.apache.giraph.graph.GraphMapper: 
> setup: Starting up BspServiceMaster (master thread)...
> 2011-11-17 22:56:42,080 INFO org.apache.giraph.graph.BspService: 
> BspService: Connecting to ZooKeeper with job job_201111172247_0003, 0 
> on asterix-010:22181
> 2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client 
> environment:zookeeper.version=3.3.1-942149, built on 05/07/2010 17:14 GMT
> 2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client 
> environment:host.name=asterix-010
> 2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client 
> environment:java.version=1.6.0_21
> 2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client 
> environment:java.vendor=Sun Microsystems Inc.
> 2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client 
> environment:java.home=/mnt/data/sda/space/yingyi/tools/java/jre
> 2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client 
> environment:java.class.path=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/jars/classes:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/jars:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000000_0/work:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../conf:/mnt/data/sda/space/yingyi/tools/java/lib/tools.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/classes:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/test/classes:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/tools:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/hadoop-core-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/asm-3.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/aspectjrt-1.6.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/aspectjtools-1.6.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-beanutils-1.7.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-beanutils-core-1.8.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-cli-1.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-codec-1.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-collections-3.2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-configuration-1.6.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-daemon-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/h
adoop/lib/commons-digester-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-el-1.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-httpclient-3.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-lang-2.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-logging-1.1.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-logging-api-1.0.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-math-2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-net-1.4.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/core-3.1.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hadoop-capacity-scheduler-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hadoop-fairscheduler-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hadoop-thriftfs-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hsqldb-1.8.0.10.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jackson-core-asl-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jackson-mapper-asl-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jasper-compiler-5.5.12.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jasper-runtime-5.5.12.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jdeb-0.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jersey-core-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jersey-json-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jersey-server-1.8.jar:/mnt/data/
sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jets3t-0.6.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jetty-6.1.26.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jetty-util-6.1.26.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsch-0.1.42.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/junit-4.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/kfs-0.2.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/log4j-1.2.15.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/mockito-all-1.8.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/oro-2.0.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/servlet-api-2.5-20081211.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/slf4j-api-1.4.3.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/slf4j-log4j12-1.4.3.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/xmlenc-0.52.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsp-2.1/jsp-2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsp-2.1/jsp-api-2.1.jar
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
> 2011-11-17 22:56:42,062 INFO org.apache.giraph.zk.ZooKeeperManager: 
> onlineZooKeeperServers: Connect attempt 1 of 10 max trying to connect 
> to asterix-010:22181 with poll msecs = 3000
> 2011-11-17 22:56:42,063 INFO org.apache.giraph.zk.ZooKeeperManager: 
> onlineZooKeeperServers: Connected!
> 2011-11-17 22:56:42,064 INFO org.apache.giraph.zk.ZooKeeperManager: 
> onlineZooKeeperServers: Creating my filestamp 
> _bsp/_defaultZkManagerDir/job_201111172247_0003/_zkServer/asterix-010 0
> 2011-11-17 22:56:42,070 INFO org.apache.giraph.graph.GraphMapper: 
> setup: Starting up BspServiceMaster (master thread)...
> 2011-11-17 22:56:42,080 INFO org.apache.giraph.graph.BspService: 
> BspService: Connecting to ZooKeeper with job job_201111172247_0003, 0 
> on asterix-010:22181
> 2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client 
> environment:zookeeper.version=3.3.1-942149, built on 05/07/2010 17:14 GMT
> 2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client 
> environment:host.name <http://host.name>=asterix-010
> 2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client 
> environment:java.version=1.6.0_21
> 2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client 
> environment:java.vendor=Sun Microsystems Inc.
> 2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client 
> environment:java.home=/mnt/data/sda/space/yingyi/tools/java/jre
> 2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client 
> environment:java.class.path=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/jars/classes:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/jars:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000000_0/work:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../conf:/mnt/data/sda/space/yingyi/tools/java/lib/tools.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/classes:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/test/classes:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/tools:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/hadoop-core-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/asm-3.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/aspectjrt-1.6.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/aspectjtools-1.6.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-beanutils-1.7.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-beanutils-core-1.8.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-cli-1.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-codec-1.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-collections-3.2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-configuration-1.6.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-daemon-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/h
adoop/lib/commons-digester-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-el-1.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-httpclient-3.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-lang-2.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-logging-1.1.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-logging-api-1.0.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-math-2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-net-1.4.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/core-3.1.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hadoop-capacity-scheduler-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hadoop-fairscheduler-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hadoop-thriftfs-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hsqldb-1.8.0.10.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jackson-core-asl-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jackson-mapper-asl-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jasper-compiler-5.5.12.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jasper-runtime-5.5.12.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jdeb-0.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jersey-core-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jersey-json-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jersey-server-1.8.jar:/mnt/data/
sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jets3t-0.6.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jetty-6.1.26.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jetty-util-6.1.26.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsch-0.1.42.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/junit-4.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/kfs-0.2.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/log4j-1.2.15.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/mockito-all-1.8.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/oro-2.0.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/servlet-api-2.5-20081211.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/slf4j-api-1.4.3.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/slf4j-log4j12-1.4.3.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/xmlenc-0.52.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsp-2.1/jsp-2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsp-2.1/jsp-api-2.1.jar
> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:java.library.path=/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../lib:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000000_0/work
> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:java.io.tmpdir=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000000_0/work/tmp
> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:java.compiler=<NA>
> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:os.name=Linux
> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:os.arch=amd64
> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:os.version=2.6.18-194.26.1.el5
> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:user.name=yingyib
> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client 
> environment:user.home=/home/yingyib
> 2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client 
> environment:user.dir=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000000_0/work
> 2011-11-17 22:56:42,088 INFO org.apache.zookeeper.ZooKeeper: 
> Initiating client connection, connectString=asterix-010:22181 
> sessionTimeout=60000 
> watcher=org.apache.giraph.graph.BspServiceMaster@13a78071
> 2011-11-17 22:56:42,098 INFO org.apache.zookeeper.ClientCnxn: Opening 
> socket connection to server asterix-010/10.0.0.10:22181
> 2011-11-17 22:56:42,099 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to asterix-010/10.0.0.10:22181, initiating session
> 2011-11-17 22:56:42,123 INFO org.apache.zookeeper.ClientCnxn: Session
> establishment complete on server asterix-010/10.0.0.10:22181,
> sessionid = 0x133b57675b60000, negotiated
> timeout = 60000
> 2011-11-17 22:56:42,125 INFO org.apache.giraph.graph.BspService: 
> process: Asynchronous connection complete.
> 2011-11-17 22:56:42,126 INFO org.apache.giraph.graph.GraphMapper: map: 
> No need to do anything when not a worker
> 2011-11-17 22:56:42,126 INFO org.apache.giraph.graph.GraphMapper: 
> cleanup: Starting for MASTER_ZOOKEEPER_ONLY
> 2011-11-17 22:56:42,197 INFO org.apache.giraph.graph.BspServiceMaster: becomeMaster: First
> child is 
> '/_hadoopBsp/job_201111172247_0003/_masterElectionDir/asterix-010_00000000000' 
> and my bid is 
> '/_hadoopBsp/job_201111172247_0003/_masterElectionDir/asterix-010_00000000000'
> 2011-11-17 22:56:42,197 INFO org.apache.giraph.graph.BspServiceMaster: 
> becomeMaster: I am now the master!
> 2011-11-17 22:56:42,208 INFO org.apache.giraph.graph.BspService: 
> process: applicationAttemptChanged signaled
> 2011-11-17 22:56:42,216 WARN org.apache.giraph.graph.BspService: 
> process: Unknown and unprocessed event 
> (path=/_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir/0/_superstepDir, 
> type=NodeChildrenChanged, state=SyncConnected)
> 2011-11-17 22:56:45,130 INFO 
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat: Total input 
> paths to process : 10
> 2011-11-17 22:56:45,227 INFO org.apache.giraph.graph.BspServiceMaster: 
> coordinateSuperstep: 0 out of 10 chosen workers finished on superstep -1
> 2011-11-17 23:01:20,045 ERROR org.apache.zookeeper.ClientCnxn: Error 
> while calling watcher
> java.lang.RuntimeException: 
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode 
> = NoNode for 
> /_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir/0/_superstepDir/-1/_vertexRangeAssignments
>         at 
> org.apache.giraph.graph.BspService.getVertexRangeMap(BspService.java:885)
>         at 
> org.apache.giraph.graph.BspServiceMaster.checkHealthyWorkerFailure(BspServiceMaster.java:1946)
>         at 
> org.apache.giraph.graph.BspServiceMaster.processEvent(BspServiceMaster.java:1976)
>         at 
> org.apache.giraph.graph.BspService.process(BspService.java:1095)
>         at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:488)
> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: 
> KeeperErrorCode = NoNode for 
> /_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir/0/_superstepDir/-1/_vertexRangeAssignments
>         at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>         at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:921)
>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:950)
>         at 
> org.apache.giraph.graph.BspService.getVertexRangeMap(BspService.java:858)
>         ... 4 more
> 2011-11-17 23:01:22,009 INFO org.apache.giraph.graph.BspServiceMaster:
> coordinateSuperstep: 0 out of 10 chosen workers finished on superstep -1
> 2011-11-17 23:11:27,357 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers:
> Forced a shutdown hook kill of the ZooKeeper process.
>
>
> Mapper log on Node-2:
> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client 
> environment:zookeeper.version=3.3.1-942149, built on 05/07/2010 17:14 GMT
> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client 
> environment:host.name=asterix-001
> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client 
> environment:java.version=1.6.0_21
> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client 
> environment:java.vendor=Sun Microsystems Inc.
> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client 
> environment:java.home=/mnt/data/sda/space/yingyi/tools/java/jre
> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client 
> environment:java.class.path=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/jars/classes:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/jars:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000008_0/work:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../conf:/mnt/data/sda/space/yingyi/tools/java/lib/tools.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/classes:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/test/classes:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/tools:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/hadoop-core-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/asm-3.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/aspectjrt-1.6.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/aspectjtools-1.6.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-beanutils-1.7.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-beanutils-core-1.8.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-cli-1.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-codec-1.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-collections-3.2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-configuration-1.6.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-daemon-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/h
adoop/lib/commons-digester-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-el-1.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-httpclient-3.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-lang-2.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-logging-1.1.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-logging-api-1.0.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-math-2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-net-1.4.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/core-3.1.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hadoop-capacity-scheduler-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hadoop-fairscheduler-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hadoop-thriftfs-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hsqldb-1.8.0.10.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jackson-core-asl-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jackson-mapper-asl-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jasper-compiler-5.5.12.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jasper-runtime-5.5.12.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jdeb-0.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jersey-core-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jersey-json-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jersey-server-1.8.jar:/mnt/data/
sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jets3t-0.6.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jetty-6.1.26.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jetty-util-6.1.26.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsch-0.1.42.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/junit-4.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/kfs-0.2.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/log4j-1.2.15.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/mockito-all-1.8.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/oro-2.0.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/servlet-api-2.5-20081211.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/slf4j-api-1.4.3.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/slf4j-log4j12-1.4.3.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/xmlenc-0.52.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsp-2.1/jsp-2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsp-2.1/jsp-api-2.1.jar
> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client 
> environment:java.library.path=/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../lib:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000008_0/work
> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client 
> environment:java.io.tmpdir=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000008_0/work/tmp
> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client 
> environment:java.compiler=<NA>
> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client 
> environment:os.name=Linux
> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client 
> environment:os.arch=amd64
> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client 
> environment:os.version=2.6.18-194.26.1.el5
> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client 
> environment:user.name=yingyib
> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client 
> environment:user.home=/home/yingyib
> 2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client 
> environment:user.dir=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000008_0/work
> 2011-11-17 22:56:44,159 INFO org.apache.zookeeper.ZooKeeper: 
> Initiating client connection, connectString=asterix-010:22181 
> sessionTimeout=60000 
> watcher=org.apache.giraph.graph.BspServiceWorker@60ded0f0
> 2011-11-17 22:56:44,171 INFO org.apache.zookeeper.ClientCnxn: Opening 
> socket connection to server asterix-010/10.0.0.10:22181
> 2011-11-17 22:56:44,173 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to asterix-010/10.0.0.10:22181, initiating session
> 2011-11-17 22:56:44,178 INFO org.apache.zookeeper.ClientCnxn: Session
> establishment complete on server asterix-010/10.0.0.10:22181,
> sessionid = 0x133b57675b60007, negotiated
> timeout = 60000
> 2011-11-17 22:56:44,180 INFO org.apache.giraph.graph.BspService: 
> process: Asynchronous connection complete.
> 2011-11-17 22:56:44,180 INFO org.apache.giraph.graph.GraphMapper: 
> setup: Registering health of this worker...
> 2011-11-17 22:56:44,191 INFO org.apache.giraph.graph.BspService: 
> getJobState: Job state already exists 
> (/_hadoopBsp/job_201111172247_0003/_masterJobState)
> 2011-11-17 22:56:44,195 INFO org.apache.giraph.graph.BspService: 
> getApplicationAttempt: Node 
> /_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir already exists!
> 2011-11-17 22:56:44,198 INFO org.apache.giraph.graph.BspService: 
> getApplicationAttempt: Node 
> /_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir already exists!
> 2011-11-17 22:56:44,204 INFO org.apache.giraph.graph.BspServiceWorker: 
> registerHealth: Created my health node for attempt=0, superstep=-1 
> with 
> /_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/asterix-001_8 
> and hostnamePort = ["asterix-001",30008]
> 2011-11-17 22:56:45,177 INFO org.apache.giraph.graph.BspService: 
> process: inputSplitsReadyChanged (input splits ready)
> 2011-11-17 22:56:45,192 WARN org.apache.giraph.graph.BspService: 
> process: Unknown and unprocessed event 
> (path=/_hadoopBsp/job_201111172247_0003/_inputSplitsDir/2/_inputSplitReserved, 
> type=NodeCreated, state=SyncConnected)
> 2011-11-17 22:56:45,192 INFO org.apache.giraph.graph.BspServiceWorker: 
> reserveInputSplit: Reserved input split path 
> /_hadoopBsp/job_201111172247_0003/_inputSplitsDir/2
> 2011-11-17 22:56:45,196 INFO org.apache.giraph.graph.BspServiceWorker: 
> loadVertices: Reserved 
> /_hadoopBsp/job_201111172247_0003/_inputSplitsDir/2 from ZooKeeper and 
> got input split 
> 'hdfs://asterix-master:31888/webmap-tiny-sorted/part-00002:0+834285620'
> 2011-11-17 23:01:20,608 INFO org.apache.zookeeper.ClientCnxn: Client 
> session timed out, have not heard from server in 59117ms for sessionid 
> 0x133b57675b60007, closing socket connection and attempting reconnect
> 2011-11-17 23:02:06,630 ERROR org.apache.zookeeper.ClientCnxn: Error 
> while calling watcher
> java.lang.RuntimeException: process: Disconnected from ZooKeeper, 
> cannot recover.
>         at org.apache.giraph.graph.BspService.process(BspService.java:990)
>         at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:488)
> 2011-11-17 23:02:35,793 INFO org.apache.zookeeper.ClientCnxn: Opening 
> socket connection to server asterix-010/10.0.0.10:22181
> 2011-11-17 23:02:35,794 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to asterix-010/10.0.0.10:22181, initiating session
> 2011-11-17 23:02:35,806 INFO org.apache.zookeeper.ClientCnxn: Unable 
> to reconnect to ZooKeeper service, session 0x133b57675b60007 has 
> expired, closing socket connection
>
> On Thu, Nov 17, 2011 at 9:46 PM, Avery Ching <aching@apache.org> wrote:
>
>     Hi Yingyi,
>
>     Here are some ideas you might want to try:
>
>     1) Limit the thread stack size.
>
>     2) Set the maximum heap available to the mapper JVM.
>
>     For example, here's a setting that gives 10 GB of heap and a smaller
>     (64k) stack for each thread:
>
>     -Dmapred.child.java.opts="-Xms10g -Xmx10g -Xss64k"
>
>     Also, you might want to try using EdgeListVertex instead of Vertex
>     (i.e. GiraphJob.setVertexClass(EdgeListVertex.class)); it uses quite
>     a bit less memory.
>
>     Let us know if that helps.  You should also check whether your
>     Hadoop installation is using a 32-bit or 64-bit JVM.  If it's 32-bit,
>     you will be limited in how much heap you can use.
>
>     Avery
>
>
>     On 11/17/11 9:38 PM, Yingyi Bu wrote:
>>     Hi,
>>
>>         I'm running a Giraph PageRank job.  I tried with 8GB of input
>>     text data over 10 nodes (each has 4 cores, 4 disks, and 12GB of
>>     physical memory), that is, 800MB of input data per machine.  However,
>>     the Giraph job fails because of high GC costs and an Out-of-Memory
>>     exception.
>>         Do I need to set anything special in the Hadoop configuration,
>>     for example, the maximum heap size for the map task VM?
>>         Thanks!!
>>
>>     Best regards,
>>     Yingyi
>
>
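Avery's last two points come down to two facts about the task JVM: whether it is 64-bit, and how much heap it actually received from mapred.child.java.opts. A minimal sketch for checking both from inside a JVM, using only the Java standard library, follows. The class JvmCheck is a hypothetical helper, not part of Giraph or Hadoop, and it assumes a Sun/Oracle-derived JVM, which sets the sun.arch.data.model system property (other vendors may not). As a side note, the ZooKeeper client lines in the logs above report os.arch=amd64, which on such JVMs usually indicates a 64-bit VM.

```java
// Hypothetical diagnostic (not part of Giraph or Hadoop): print the JVM's
// data model and the heap ceiling it was actually given.
class JvmCheck {

    // "32" or "64" on Sun/Oracle-derived JVMs; "unknown" if the vendor
    // does not define sun.arch.data.model.
    static String dataModel() {
        return System.getProperty("sun.arch.data.model", "unknown");
    }

    // Maximum heap the JVM will attempt to use, in MB. This roughly tracks
    // -Xmx but is often reported somewhat lower.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("sun.arch.data.model: " + dataModel());
        System.out.println("os.arch: " + System.getProperty("os.arch"));
        System.out.println("max heap (MB): " + maxHeapMegabytes());
        if ("32".equals(dataModel())) {
            // A 32-bit JVM cannot address a heap as large as -Xmx10g.
            System.out.println("warning: 32-bit JVM, large -Xmx values will not work");
        }
    }
}
```

Run on its own with, for example, java -Xmx512m JvmCheck, it should report a max heap at or somewhat below 512 MB. To see what a Hadoop task actually received, the same two calls would have to run inside the task JVM itself.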


Re: PageRank OOM Exception

Posted by Yingyi Bu <bu...@gmail.com>.
Hi Avery,

    Thanks a lot for your help!!
    I used your settings and got rid of the OOM!  However, after the job
had run for 10 minutes, one worker failed, and after a while all mappers
failed.  Attached below are the mapper logs from two nodes.  It seems
they cannot connect to ZooKeeper.  The workers run fine until the
highlighted exception.  Am I missing something in the job settings?
    Thanks again!!

Best regards,
Yingyi
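The Node-2 log in this thread shows a negotiated ZooKeeper session timeout of 60000 ms and, later, a client that had "not heard from server in 59117ms", which is consistent with the worker JVM stalling, for example in long stop-the-world GC pauses. A rough way to test that theory is to compare the GC time a JVM accumulates over an interval with the session timeout. The sketch below does this with the standard java.lang.management API; the class GcBudget, its methods and the stand-in workload are hypothetical, and the 60000 ms constant is simply copied from the logs. Since no single pause can be longer than the total collection time reported for the interval, a total well below the timeout rules GC out as the cause, while a total near or above it only means GC may be responsible.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical diagnostic (not part of Giraph): how much GC time accumulated
// while some piece of work ran, compared with a ZooKeeper session timeout.
class GcBudget {

    // Negotiated session timeout reported in the ZooKeeper client logs.
    static final long SESSION_TIMEOUT_MS = 60000L;

    // Accumulated collection time (ms) across all collectors; a collector
    // that reports -1 (undefined) is skipped. For concurrent collectors the
    // reported time may include work that did not stop application threads.
    static long totalGcMillis() {
        long total = 0L;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // GC time accumulated while 'work' ran. The longest single pause in
    // the interval can be at most this value.
    static long gcMillisDuring(Runnable work) {
        long before = totalGcMillis();
        work.run();
        return totalGcMillis() - before;
    }

    public static void main(String[] args) {
        long spent = gcMillisDuring(new Runnable() {
            public void run() {
                // Stand-in workload: allocate and drop 1 MB arrays.
                long bytes = 0L;
                for (int i = 0; i < 200; i++) {
                    byte[] chunk = new byte[1 << 20];
                    bytes += chunk.length;
                }
                System.out.println("allocated bytes: " + bytes);
            }
        });
        System.out.println("GC time during workload (ms): " + spent);
        if (spent < SESSION_TIMEOUT_MS) {
            System.out.println("below the session timeout: GC alone cannot explain an expiry here");
        } else {
            System.out.println("at or above the session timeout: GC pauses may explain an expiry");
        }
    }
}
```

In a real worker one would wrap something like a superstep or the input-loading phase in gcMillisDuring; the per-collection output of -XX:+PrintGCDetails gives individual pause lengths directly.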



Mapper log on Node-1:
2011-11-17 22:56:39,044 INFO org.apache.giraph.zk.ZooKeeperManager:
getZooKeeperServerList: For task 0, got file 'zkServerList_asterix-010 0 '
(polling period is 3000)
2011-11-17 22:56:39,044 INFO org.apache.giraph.zk.ZooKeeperManager:
getZooKeeperServerList: Found [asterix-010, 0] 2 hosts in filename
'zkServerList_asterix-010 0 '
2011-11-17 22:56:39,046 INFO org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: Trying to delete old directory
/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/work/_bspZooKeeper
2011-11-17 22:56:39,049 INFO org.apache.giraph.zk.ZooKeeperManager:
generateZooKeeperConfigFile: Creating file
/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/work/_bspZooKeeper/zoo.cfg
in
/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/work/_bspZooKeeper
with base port 22181
2011-11-17 22:56:39,049 INFO org.apache.giraph.zk.ZooKeeperManager:
generateZooKeeperConfigFile: Make directory of _bspZooKeeper = true
2011-11-17 22:56:39,049 INFO org.apache.giraph.zk.ZooKeeperManager:
generateZooKeeperConfigFile: Delete of zoo.cfg = false
2011-11-17 22:56:39,050 INFO org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: Attempting to start ZooKeeper server with command
[/mnt/data/sda/space/yingyi/tools/java/jre/bin/java, -Xmx256m,
-XX:ParallelGCThreads=4, -XX:+UseConcMarkSweepGC,
-XX:CMSInitiatingOccupancyFraction=70, -XX:MaxGCPauseMillis=100, -cp,
/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/jars/job.jar,
org.apache.zookeeper.server.quorum.QuorumPeerMain,
/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/work/_bspZooKeeper/zoo.cfg]
in directory
/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/work/_bspZooKeeper
2011-11-17 22:56:39,056 INFO org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: Connect attempt 0 of 10 max trying to connect to
asterix-010:22181 with poll msecs = 3000
2011-11-17 22:56:39,058 WARN org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: Got ConnectException
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
        at
java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
        at java.net.Socket.connect(Socket.java:529)
        at
org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:612)
        at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:401)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
2011-11-17 22:56:42,062 INFO org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: Connect attempt 1 of 10 max trying to connect to
asterix-010:22181 with poll msecs = 3000
2011-11-17 22:56:42,063 INFO org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: Connected!
2011-11-17 22:56:42,064 INFO org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: Creating my filestamp
_bsp/_defaultZkManagerDir/job_201111172247_0003/_zkServer/asterix-010 0
2011-11-17 22:56:42,070 INFO org.apache.giraph.graph.GraphMapper: setup:
Starting up BspServiceMaster (master thread)...
2011-11-17 22:56:42,080 INFO org.apache.giraph.graph.BspService:
BspService: Connecting to ZooKeeper with job job_201111172247_0003, 0 on
asterix-010:22181
2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client
environment:zookeeper.version=3.3.1-942149, built on 05/07/2010 17:14 GMT
2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client
environment:host.name=asterix-010
2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.version=1.6.0_21
2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.vendor=Sun Microsystems Inc.
2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.home=/mnt/data/sda/space/yingyi/tools/java/jre
2011-11-17 22:56:42,086 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.class.path=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/jars/classes:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/jars:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000000_0/work:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../conf:/mnt/data/sda/space/yingyi/tools/java/lib/tools.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/classes:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/test/classes:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/tools:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/hadoop-core-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/asm-3.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/aspectjrt-1.6.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/aspectjtools-1.6.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-beanutils-1.7.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-beanutils-core-1.8.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-cli-1.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-codec-1.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-collections-3.2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-configuration-1.6.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-daemon-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/had
oop/lib/commons-digester-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-el-1.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-httpclient-3.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-lang-2.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-logging-1.1.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-logging-api-1.0.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-math-2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-net-1.4.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/core-3.1.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hadoop-capacity-scheduler-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hadoop-fairscheduler-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hadoop-thriftfs-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hsqldb-1.8.0.10.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jackson-core-asl-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jackson-mapper-asl-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jasper-compiler-5.5.12.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jasper-runtime-5.5.12.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jdeb-0.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jersey-core-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jersey-json-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jersey-server-1.8.jar:/mnt/data/sd
a/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jets3t-0.6.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jetty-6.1.26.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jetty-util-6.1.26.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsch-0.1.42.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/junit-4.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/kfs-0.2.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/log4j-1.2.15.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/mockito-all-1.8.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/oro-2.0.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/servlet-api-2.5-20081211.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/slf4j-api-1.4.3.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/slf4j-log4j12-1.4.3.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/xmlenc-0.52.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsp-2.1/jsp-2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsp-2.1/jsp-api-2.1.jar
a/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jets3t-0.6.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jetty-6.1.26.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jetty-util-6.1.26.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsch-0.1.42.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/junit-4.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/kfs-0.2.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/log4j-1.2.15.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/mockito-all-1.8.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/oro-2.0.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/servlet-api-2.5-20081211.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/slf4j-api-1.4.3.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/slf4j-log4j12-1.4.3.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/xmlenc-0.52.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsp-2.1/jsp-2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsp-2.1/jsp-api-2.1.jar
2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.library.path=/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../lib:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000000_0/work
2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.io.tmpdir=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000000_0/work/tmp
2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.compiler=<NA>
2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
environment:os.name=Linux
2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
environment:os.arch=amd64
2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
environment:os.version=2.6.18-194.26.1.el5
2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
environment:user.name=yingyib
2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
environment:user.home=/home/yingyib
2011-11-17 22:56:42,087 INFO org.apache.zookeeper.ZooKeeper: Client
environment:user.dir=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000000_0/work
2011-11-17 22:56:42,088 INFO org.apache.zookeeper.ZooKeeper: Initiating
client connection, connectString=asterix-010:22181 sessionTimeout=60000
watcher=org.apache.giraph.graph.BspServiceMaster@13a78071
2011-11-17 22:56:42,098 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server asterix-010/10.0.0.10:22181
2011-11-17 22:56:42,099 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to asterix-010/10.0.0.10:22181, initiating session
2011-11-17 22:56:42,123 INFO org.apache.zookeeper.ClientCnxn: Session
establishment complete on server asterix-010/10.0.0.10:22181, sessionid =
0x133b57675b60000, negotiated timeout = 60000
2011-11-17 22:56:42,125 INFO org.apache.giraph.graph.BspService: process:
Asynchronous connection complete.
2011-11-17 22:56:42,126 INFO org.apache.giraph.graph.GraphMapper: map: No
need to do anything when not a worker
2011-11-17 22:56:42,126 INFO org.apache.giraph.graph.GraphMapper: cleanup:
Starting for MASTER_ZOOKEEPER_ONLY
2011-11-17 22:56:42,197 INFO org.apache.giraph.graph.BspServiceMaster:
becomeMaster: First child is
'/_hadoopBsp/job_201111172247_0003/_masterElectionDir/asterix-010_00000000000'
and my bid is
'/_hadoopBsp/job_201111172247_0003/_masterElectionDir/asterix-010_00000000000'
2011-11-17 22:56:42,197 INFO org.apache.giraph.graph.BspServiceMaster:
becomeMaster: I am now the master!
2011-11-17 22:56:42,208 INFO org.apache.giraph.graph.BspService: process:
applicationAttemptChanged signaled
2011-11-17 22:56:42,216 WARN org.apache.giraph.graph.BspService: process:
Unknown and unprocessed event
(path=/_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir/0/_superstepDir,
type=NodeChildrenChanged, state=SyncConnected)
2011-11-17 22:56:45,130 INFO
org.apache.hadoop.mapreduce.lib.input.FileInputFormat: Total input paths to
process : 10
2011-11-17 22:56:45,227 INFO org.apache.giraph.graph.BspServiceMaster:
coordinateSuperstep: 0 out of 10 chosen workers finished on superstep -1
2011-11-17 23:01:20,045 ERROR org.apache.zookeeper.ClientCnxn: Error while
calling watcher
java.lang.RuntimeException:
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode =
NoNode for
/_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir/0/_superstepDir/-1/_vertexRangeAssignments
        at
org.apache.giraph.graph.BspService.getVertexRangeMap(BspService.java:885)
        at
org.apache.giraph.graph.BspServiceMaster.checkHealthyWorkerFailure(BspServiceMaster.java:1946)
        at
org.apache.giraph.graph.BspServiceMaster.processEvent(BspServiceMaster.java:1976)
        at org.apache.giraph.graph.BspService.process(BspService.java:1095)
        at
org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:488)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
KeeperErrorCode = NoNode for
/_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir/0/_superstepDir/-1/_vertexRangeAssignments
        at
org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
        at
org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:921)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:950)
        at
org.apache.giraph.graph.BspService.getVertexRangeMap(BspService.java:858)
        ... 4 more
2011-11-17 23:01:22,009 INFO org.apache.giraph.graph.BspServiceMaster:
coordinateSuperstep: 0 out of 10 chosen workers finished on superstep -1
2011-11-17 23:11:27,357 WARN org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: Forced a
shutdown hook kill of the ZooKeeper process.


Mapper log on Node-2:
2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
environment:zookeeper.version=3.3.1-942149, built on 05/07/2010 17:14 GMT
2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
environment:host.name=asterix-001
2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.version=1.6.0_21
2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.vendor=Sun Microsystems Inc.
2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.home=/mnt/data/sda/space/yingyi/tools/java/jre
2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.class.path=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/jars/classes:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/jars:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000008_0/work:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../conf:/mnt/data/sda/space/yingyi/tools/java/lib/tools.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/classes:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/test/classes:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../build/tools:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/hadoop-core-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/asm-3.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/aspectjrt-1.6.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/aspectjtools-1.6.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-beanutils-1.7.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-beanutils-core-1.8.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-cli-1.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-codec-1.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-collections-3.2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-configuration-1.6.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-daemon-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/had
oop/lib/commons-digester-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-el-1.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-httpclient-3.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-lang-2.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-logging-1.1.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-logging-api-1.0.4.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-math-2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/commons-net-1.4.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/core-3.1.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hadoop-capacity-scheduler-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hadoop-fairscheduler-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hadoop-thriftfs-0.20.205.0.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/hsqldb-1.8.0.10.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jackson-core-asl-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jackson-mapper-asl-1.0.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jasper-compiler-5.5.12.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jasper-runtime-5.5.12.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jdeb-0.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jersey-core-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jersey-json-1.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jersey-server-1.8.jar:/mnt/data/sd
a/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jets3t-0.6.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jetty-6.1.26.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jetty-util-6.1.26.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsch-0.1.42.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/junit-4.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/kfs-0.2.2.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/log4j-1.2.15.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/mockito-all-1.8.5.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/oro-2.0.8.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/servlet-api-2.5-20081211.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/slf4j-api-1.4.3.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/slf4j-log4j12-1.4.3.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/xmlenc-0.52.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsp-2.1/jsp-2.1.jar:/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../share/hadoop/lib/jsp-2.1/jsp-api-2.1.jar
2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.library.path=/mnt/data/sda/space/yingyi/hadoop-0.20.205.0/libexec/../lib:/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000008_0/work
2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.io.tmpdir=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000008_0/work/tmp
2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.compiler=<NA>
2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
environment:os.name=Linux
2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
environment:os.arch=amd64
2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
environment:os.version=2.6.18-194.26.1.el5
2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
environment:user.name=yingyib
2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
environment:user.home=/home/yingyib
2011-11-17 22:56:44,158 INFO org.apache.zookeeper.ZooKeeper: Client
environment:user.dir=/mnt/data/sda/space/yingyi/hdfsdata_giraph/mapred/local/taskTracker/yingyib/jobcache/job_201111172247_0003/attempt_201111172247_0003_m_000008_0/work
2011-11-17 22:56:44,159 INFO org.apache.zookeeper.ZooKeeper: Initiating
client connection, connectString=asterix-010:22181 sessionTimeout=60000
watcher=org.apache.giraph.graph.BspServiceWorker@60ded0f0
2011-11-17 22:56:44,171 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server asterix-010/10.0.0.10:22181
2011-11-17 22:56:44,173 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to asterix-010/10.0.0.10:22181, initiating session
2011-11-17 22:56:44,178 INFO org.apache.zookeeper.ClientCnxn: Session
establishment complete on server asterix-010/10.0.0.10:22181, sessionid =
0x133b57675b60007, negotiated timeout = 60000
2011-11-17 22:56:44,180 INFO org.apache.giraph.graph.BspService: process:
Asynchronous connection complete.
2011-11-17 22:56:44,180 INFO org.apache.giraph.graph.GraphMapper: setup:
Registering health of this worker...
2011-11-17 22:56:44,191 INFO org.apache.giraph.graph.BspService:
getJobState: Job state already exists
(/_hadoopBsp/job_201111172247_0003/_masterJobState)
2011-11-17 22:56:44,195 INFO org.apache.giraph.graph.BspService:
getApplicationAttempt: Node
/_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir already exists!
2011-11-17 22:56:44,198 INFO org.apache.giraph.graph.BspService:
getApplicationAttempt: Node
/_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir already exists!
2011-11-17 22:56:44,204 INFO org.apache.giraph.graph.BspServiceWorker:
registerHealth: Created my health node for attempt=0, superstep=-1 with
/_hadoopBsp/job_201111172247_0003/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/asterix-001_8
and hostnamePort = ["asterix-001",30008]
2011-11-17 22:56:45,177 INFO org.apache.giraph.graph.BspService: process:
inputSplitsReadyChanged (input splits ready)
2011-11-17 22:56:45,192 WARN org.apache.giraph.graph.BspService: process:
Unknown and unprocessed event
(path=/_hadoopBsp/job_201111172247_0003/_inputSplitsDir/2/_inputSplitReserved,
type=NodeCreated, state=SyncConnected)
2011-11-17 22:56:45,192 INFO org.apache.giraph.graph.BspServiceWorker:
reserveInputSplit: Reserved input split path
/_hadoopBsp/job_201111172247_0003/_inputSplitsDir/2
2011-11-17 22:56:45,196 INFO org.apache.giraph.graph.BspServiceWorker:
loadVertices: Reserved /_hadoopBsp/job_201111172247_0003/_inputSplitsDir/2
from ZooKeeper and got input split
'hdfs://asterix-master:31888/webmap-tiny-sorted/part-00002:0+834285620'
2011-11-17 23:01:20,608 INFO org.apache.zookeeper.ClientCnxn: Client
session timed out, have not heard from server in 59117ms for sessionid
0x133b57675b60007, closing socket connection and attempting reconnect
2011-11-17 23:02:06,630 ERROR org.apache.zookeeper.ClientCnxn: Error while
calling watcher
java.lang.RuntimeException: process: Disconnected from ZooKeeper, cannot
recover.
        at org.apache.giraph.graph.BspService.process(BspService.java:990)
        at
org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:488)
2011-11-17 23:02:35,793 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server asterix-010/10.0.0.10:22181
2011-11-17 23:02:35,794 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to asterix-010/10.0.0.10:22181, initiating session
2011-11-17 23:02:35,806 INFO org.apache.zookeeper.ClientCnxn: Unable to
reconnect to ZooKeeper service, session 0x133b57675b60007 has expired,
closing socket connection

On Thu, Nov 17, 2011 at 9:46 PM, Avery Ching <ac...@apache.org> wrote:

>  Hi Yingyi,
>
> Here are some ideas you might want to try:
>
> 1) Limit the thread stack size.
>
> 2) You can set the heap available to the mapper JVM.
>
> For example, here's a setting that gives the mapper JVM 10 GB of heap
> and a smaller (64 KB) stack for each thread:
>
> -Dmapred.child.java.opts="-Xms10g -Xmx10g -Xss64k"
>
> Also, you might want to try using EdgeListVertex instead of Vertex
> (i.e. GiraphJob.setVertexClass(EdgeListVertex.class)); its memory
> footprint is quite a bit smaller.
>
> Let us know if that helps. You should also check whether your
> Hadoop installation is using a 32-bit or 64-bit JVM. If it's 32-bit,
> you will be limited in how much heap you can use.
>
> Avery

Re: PageRank OOM Exception

Posted by Avery Ching <ac...@apache.org>.
Hi Yingyi,

Here are some ideas you might want to try:

1) Limit the thread stack size.

2) You can set the heap available to the mapper JVM.

For example, here's a setting that gives the mapper JVM 10 GB of heap
and a smaller (64 KB) stack for each thread:

-Dmapred.child.java.opts="-Xms10g -Xmx10g -Xss64k"
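As a rough sanity check on the 10 GB figure (a sketch, not something from Avery's reply): mapred.child.java.opts is applied to every child task JVM, so the heap setting multiplied by the number of map tasks running concurrently on a node has to fit in that node's physical memory. The helper below is hypothetical, and the 2 GB reserved for the OS, the DataNode/TaskTracker daemons and JVM non-heap memory is an assumed figure. With the 12 GB nodes described in the original post, it suggests -Xmx10g only fits if each node runs a single map task at a time (in Hadoop 0.20 the per-node limit is set by mapred.tasktracker.map.tasks.maximum).

```java
// Hypothetical helper: estimates how much heap each concurrent map task can get.
public final class HeapBudget {

    // Largest heap (in MB) per map slot after reserving memory for the OS,
    // Hadoop daemons and JVM non-heap usage. Returns 0 if nothing is left.
    static long maxHeapPerSlotMb(long physicalMb, long reservedMb, int mapSlots) {
        if (mapSlots <= 0) {
            throw new IllegalArgumentException("mapSlots must be positive");
        }
        long usableMb = physicalMb - reservedMb;
        return usableMb <= 0 ? 0 : usableMb / mapSlots;
    }

    public static void main(String[] args) {
        long physicalMb = 12 * 1024; // 12 GB per node, as in the original post
        long reservedMb = 2 * 1024;  // assumption: 2 GB kept for OS, daemons, non-heap
        for (int slots = 1; slots <= 4; slots++) {
            System.out.println(slots + " map slot(s): at most "
                + maxHeapPerSlotMb(physicalMb, reservedMb, slots) + " MB of heap each");
        }
    }
}
```

If the budget per slot comes out well below 10 GB, either lowering -Xmx or reducing the number of concurrent map tasks per node keeps the JVMs from overcommitting memory and swapping, which would make GC pauses (and ZooKeeper timeouts) worse.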

Also, you might want to try using EdgeListVertex instead of Vertex
(i.e. GiraphJob.setVertexClass(EdgeListVertex.class)); its memory
footprint is quite a bit smaller.
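Part of why a list-based vertex can be smaller is per-edge object overhead. A map-based edge store pays for an entry object plus a boxed key and a boxed value for every edge (typically several tens of bytes per edge on a 64-bit JVM), whereas a flat layout can approach the raw 12 bytes of a long target id plus a float value. The class below is a hypothetical sketch of that idea using parallel primitive arrays; it is not Giraph's EdgeListVertex code, and the long/float types for ids and edge values are assumptions chosen to resemble a PageRank-style graph.

```java
import java.util.Arrays;

// Hypothetical sketch (not Giraph code): out-edges kept in two parallel
// primitive arrays instead of a map of boxed ids and values.
public final class CompactEdges {
    private long[] targets = new long[4];
    private float[] weights = new float[4];
    private int size = 0;

    // Appends an edge, doubling both backing arrays when they are full.
    public void addEdge(long target, float weight) {
        if (size == targets.length) {
            targets = Arrays.copyOf(targets, size * 2);
            weights = Arrays.copyOf(weights, size * 2);
        }
        targets[size] = target;
        weights[size] = weight;
        size++;
    }

    public int size() {
        return size;
    }

    public long targetAt(int i) {
        checkIndex(i);
        return targets[i];
    }

    public float weightAt(int i) {
        checkIndex(i);
        return weights[i];
    }

    private void checkIndex(int i) {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException("edge " + i);
        }
    }

    public static void main(String[] args) {
        CompactEdges edges = new CompactEdges();
        for (long t = 0; t < 10; t++) {
            edges.addEdge(t * 7, 1.0f / (t + 1));
        }
        System.out.println(edges.size() + " edges, last target " + edges.targetAt(9));
    }
}
```

The trade-off is that lookups by target id become linear scans, which is usually acceptable for PageRank, where a vertex simply iterates over all of its out-edges each superstep.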

Let us know if that helps. You should also check whether your
Hadoop installation is using a 32-bit or 64-bit JVM. If it's 32-bit,
you will be limited in how much heap you can use.
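One quick way to check is to print what the JVM reports about itself, ideally from inside a task so that the mapred.child.java.opts settings are in effect. The sketch below relies on the sun.arch.data.model system property, which Sun/Oracle HotSpot JVMs set to "32" or "64"; other JVMs may not define it, so the code falls back to "unknown". For what it's worth, the ZooKeeper client environment lines in the logs above report os.arch=amd64 for the task JVMs, which on HotSpot normally indicates a 64-bit JVM (a 32-bit HotSpot on x86-64 Linux reports i386).

```java
// Prints the JVM's data model (32- vs 64-bit), os.arch, and maximum heap.
public final class JvmCheck {

    // Uses the HotSpot-specific "sun.arch.data.model" property when present.
    static String dataModel() {
        String model = System.getProperty("sun.arch.data.model");
        if ("32".equals(model)) {
            return "32-bit";
        }
        if ("64".equals(model)) {
            return "64-bit";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        // maxMemory() reflects -Xmx (or the JVM default if -Xmx is not set).
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("JVM data model: " + dataModel());
        System.out.println("os.arch: " + System.getProperty("os.arch"));
        System.out.println("max heap: " + maxHeapMb + " MB");
    }
}
```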

Avery

On 11/17/11 9:38 PM, Yingyi Bu wrote:
> Hi,
>
>     I'm running a Giraph PageRank job. I tried with 8GB of input text
> data over 10 nodes (each with 4 cores, 4 disks, and 12GB of physical
> memory), i.e. 800MB of input data per machine. However, the Giraph job
> fails because of high GC costs and an out-of-memory error.
>     Do I need to set anything special in the Hadoop configuration, for
> example the maximum heap size for the map task JVM?
>     Thanks!!
>
> Best regards,
> Yingyi