Posted to user@giraph.apache.org by Yingyi Bu <bu...@gmail.com> on 2014/01/23 05:10:55 UTC

Re: out of core option

I just ran into the same issue with the latest trunk version.
Does anybody know how to fix it?

Best regards,
Yingyi


On Fri, Dec 6, 2013 at 8:27 AM, Sebastian Stipkovic <
sebastian.stipkovic@gmail.com> wrote:

> Hello,
>
> I have found a link, where someone describes the same problem:
>
> https://issues.apache.org/jira/browse/GIRAPH-788
>
> Can somebody help me? Does the out-of-core option run only on particular
> Hadoop versions?
>
>
> Thanks,
> Sebastian
>
>
> 2013/12/6 Sebastian Stipkovic <se...@gmail.com>
>
>> Hi Rob,
>>
>> Embarrassing. You are right. But now, with the correct option, I get the
>> following exception:
>>
>>
>> 2013-12-05 23:10:18,568 INFO org.apache.hadoop.mapred.JobTracker: Adding
>> task (MAP) 'attempt_201312052304_0001_m_000001_0' to tip
>> task_201312052304_0001_m_000001, for tracker
>> 'tracker_hduser:localhost/127.0.0.1:39793'
>> 2013-12-05 23:10:27,645 INFO org.apache.hadoop.mapred.TaskInProgress:
>> Error from attempt_201312052304_0001_m_000001_0:
>> java.lang.IllegalStateException: run: Caught an unrecoverable exception
>> waitFor: ExecutionException occurred while waiting for
>> org.apache.giraph.utils.ProgressableUtils$FutureWaitable@62bf5822
>>     at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:101)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:253)
>> Caused by: java.lang.IllegalStateException: waitFor: ExecutionException
>> occurred while waiting for
>> org.apache.giraph.utils.ProgressableUtils$FutureWaitable@62bf5822
>>     at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:181)
>>     at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:139)
>>     at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:124)
>>     at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:87)
>>     at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:221)
>>     at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:281)
>>     at org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:325)
>>     at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:506)
>>     at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:244)
>>     at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:91)
>>     ... 7 more
>> Caused by: java.util.concurrent.ExecutionException:
>> java.lang.IllegalStateException: getOrCreatePartition: cannot retrieve
>> partition 0
>>     at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:262)
>>     at java.util.concurrent.FutureTask.get(FutureTask.java:119)
>>     at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:300)
>>     at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:173)
>>     ... 16 more
>> Caused by: java.lang.IllegalStateException: getOrCreatePartition: cannot
>> retrieve partition 0
>>     at org.apache.giraph.partition.DiskBackedPartitionStore.getOrCreatePartition(DiskBackedPartitionStore.java:243)
>>     at org.apache.giraph.comm.requests.SendWorkerVerticesRequest.doRequest(SendWorkerVerticesRequest.java:110)
>>     at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:482)
>>     at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendVertexRequest(NettyWorkerClientRequestProcessor.java:276)
>>     at org.apache.giraph.worker.VertexInputSplitsCallable.readInputSplit(VertexInputSplitsCallable.java:172)
>>     at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:267)
>>     at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:211)
>>     at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:60)
>>     at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>     at java.lang.Thread.run(Thread.java:724)
>> Caused by: java.util.concurrent.ExecutionException:
>> java.lang.NullPointerException
>>     at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
>>     at java.util.concurrent.FutureTask.get(FutureTask.java:111)
>>     at org.apache.giraph.partition.DiskBackedPartitionStore.getOrCreatePartition(DiskBackedPartitionStore.java:228)
>>     ... 13 more
>> Caused by: java.lang.NullPointerException
>>     at org.apache.giraph.partition.DiskBackedPartitionStore$GetPartition.call(DiskBackedPartitionStore.java:692)
>>     at org.apache.giraph.partition.DiskBackedPartitionStore$GetPartition.call(DiskBackedPartitionStore.java:658)
>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>     at org.apache.giraph.partition.DiskBackedPartitionStore$DirectExecutorService.execute(DiskBackedPartitionStore.java:972)
>>     at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:132)
>>     ... 14 more
>>
>>
>> Thanks,
>> Sebastian
>>
>>
>> 2013/12/5 Rob Vesse <rv...@dotnetrdf.org>
>>
>>> Sebastian
>>>
>>> You've made a minor typo in the configuration setting which means you
>>> haven't actually enabled out of core graph mode.
>>>
>>> You have giraph.useOutOfCoreGiraph when it should be
>>> giraph.useOutOfCoreGraph (note that the last word is Graph, not Giraph).
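>>>
>>> A quick way to see why this typo slips through: Hadoop's Configuration
>>> accepts arbitrary keys, so a misspelled option is stored without complaint
>>> and simply never read by Giraph. A minimal sketch (Configuration and its
>>> setBoolean/get calls are the standard Hadoop API; the demo class itself is
>>> hypothetical):
>>>
>>> import org.apache.hadoop.conf.Configuration;
>>>
>>> public class OutOfCoreFlagDemo {
>>>     public static void main(String[] args) {
>>>         Configuration conf = new Configuration();
>>>         // Correct key, which Giraph actually reads:
>>>         conf.setBoolean("giraph.useOutOfCoreGraph", true);
>>>         // Misspelled key: accepted silently, but ignored by Giraph.
>>>         conf.setBoolean("giraph.useOutOfCoreGiraph", true);
>>>         System.out.println(conf.get("giraph.useOutOfCoreGraph"));  // true
>>>         System.out.println(conf.get("giraph.useOutOfCoreGiraph")); // true, yet unused
>>>     }
>>> }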
>>>
>>> Rob
>>>
>>> From: Sebastian Stipkovic <se...@gmail.com>
>>> Reply-To: <us...@giraph.apache.org>
>>> Date: Thursday, 5 December 2013 20:39
>>> To: <us...@giraph.apache.org>
>>> Subject: out of core option
>>>
>>> Hello,
>>>
>>> I have set up Giraph 1.1.0 with hadoop-0.20.203.0rc1 on a single-node
>>> cluster. It computes a tiny graph successfully. But if the input graph is
>>> huge (5 GB), I get an OutOfMemory (garbage collector) exception, although
>>> I had turned on the out-of-core option. The job with the out-of-core
>>> option works well only with a small graph (0.9 GB). What is wrong? Do I
>>> have to do further configuration?
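>>>
>>> For reference, "turning on" out-of-core processing in Giraph of this era
>>> involves more than one option. A hedged sketch, assuming the option names
>>> documented for Giraph 1.x out-of-core support; the values are placeholders
>>> to show the shape of the configuration, not tuning advice:
>>>
>>> import org.apache.hadoop.conf.Configuration;
>>>
>>> public class OutOfCoreOptionsSketch {
>>>     public static void main(String[] args) {
>>>         Configuration conf = new Configuration();
>>>         // Spill graph partitions to local disk instead of keeping all in heap.
>>>         conf.setBoolean("giraph.useOutOfCoreGraph", true);
>>>         // How many partitions to keep resident in memory (placeholder value).
>>>         conf.setInt("giraph.maxPartitionsInMemory", 10);
>>>         // Spill incoming messages to disk as well.
>>>         conf.setBoolean("giraph.useOutOfCoreMessages", true);
>>>         // How many messages to keep resident in memory (placeholder value).
>>>         conf.setInt("giraph.maxMessagesInMemory", 1000000);
>>>     }
>>> }
>>>
>>> The same keys can also be passed on the GiraphRunner command line through
>>> its -ca key=value custom-argument flag.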
>>>
>>> My Configurations are as follows:
>>>
>>>
>>> fs.s3n.impl = org.apache.hadoop.fs.s3native.NativeS3FileSystem
>>> mapred.task.cache.levels = 2
>>> giraph.vertexOutputFormatClass = org.apache.giraph.examples.MyShortestPaths$MyOutputFormat
>>> hadoop.tmp.dir = /app/hadoop/tmp
>>> hadoop.native.lib = true
>>> map.sort.class = org.apache.hadoop.util.QuickSort
>>> dfs.namenode.decommission.nodes.per.interval = 5
>>> dfs.https.need.client.auth = false
>>> ipc.client.idlethreshold = 4000
>>> dfs.datanode.data.dir.perm = 755
>>> mapred.system.dir = ${hadoop.tmp.dir}/mapred/system
>>> mapred.job.tracker.persist.jobstatus.hours = 0
>>> dfs.datanode.address = 0.0.0.0:50010
>>> dfs.namenode.logging.level = info
>>> dfs.block.access.token.enable = false
>>> io.skip.checksum.errors = false
>>> fs.default.name = hdfs://localhost:54310
>>> mapred.cluster.reduce.memory.mb = -1
>>> mapred.child.tmp = ./tmp
>>> fs.har.impl.disable.cache = true
>>> dfs.safemode.threshold.pct = 0.999f
>>> mapred.skip.reduce.max.skip.groups = 0
>>> dfs.namenode.handler.count = 10
>>> dfs.blockreport.initialDelay = 0
>>> mapred.heartbeats.in.second = 100
>>> mapred.tasktracker.dns.nameserver = default
>>> io.sort.factor = 10
>>> mapred.task.timeout = 600000
>>> giraph.maxWorkers = 1
>>> mapred.max.tracker.failures = 4
>>> hadoop.rpc.socket.factory.class.default = org.apache.hadoop.net.StandardSocketFactory
>>> mapred.job.tracker.jobhistory.lru.cache.size = 5
>>> fs.hdfs.impl = org.apache.hadoop.hdfs.DistributedFileSystem
>>> mapred.queue.default.acl-administer-jobs = *
>>> dfs.block.access.key.update.interval = 600
>>> mapred.skip.map.auto.incr.proc.count = true
>>> mapreduce.job.complete.cancel.delegation.tokens = true
>>> io.mapfile.bloom.size = 1048576
>>> mapreduce.reduce.shuffle.connect.timeout = 180000
>>> dfs.safemode.extension = 30000
>>> mapred.jobtracker.blacklist.fault-timeout-window = 180
>>> tasktracker.http.threads = 40
>>> mapred.job.shuffle.merge.percent = 0.66
>>> mapreduce.inputformat.class = org.apache.giraph.bsp.BspInputFormat
>>> fs.ftp.impl = org.apache.hadoop.fs.ftp.FTPFileSystem
>>> user.name = hduser
>>> mapred.output.compress = false
>>> io.bytes.per.checksum = 512
>>> giraph.isStaticGraph = true
>>> mapred.healthChecker.script.timeout = 600000
>>> topology.node.switch.mapping.impl = org.apache.hadoop.net.ScriptBasedMapping
>>> dfs.https.server.keystore.resource = ssl-server.xml
>>> mapred.reduce.slowstart.completed.maps = 0.05
>>> mapred.reduce.max.attempts = 4
>>> fs.ramfs.impl = org.apache.hadoop.fs.InMemoryFileSystem
>>> dfs.block.access.token.lifetime = 600
>>> dfs.name.edits.dir = ${dfs.name.dir}
>>> mapred.skip.map.max.skip.records = 0
>>> mapred.cluster.map.memory.mb = -1
>>> hadoop.security.group.mapping = org.apache.hadoop.security.ShellBasedUnixGroupsMapping
>>> mapred.job.tracker.persist.jobstatus.dir = /jobtracker/jobsInfo
>>> mapred.jar = hdfs://localhost:54310/app/hadoop/tmp/mapred/staging/hduser/.staging/job_201312051827_0001/job.jar
>>> dfs.block.size = 67108864
>>> fs.s3.buffer.dir = ${hadoop.tmp.dir}/s3
>>> job.end.retry.attempts = 0
>>> fs.file.impl = org.apache.hadoop.fs.LocalFileSystem
>>> mapred.local.dir.minspacestart = 0
>>> mapred.output.compression.type = RECORD
>>> dfs.datanode.ipc.address = 0.0.0.0:50020
>>> dfs.permissions = true
>>> topology.script.number.args = 100
>>> io.mapfile.bloom.error.rate = 0.005
>>> mapred.cluster.max.reduce.memory.mb = -1
>>> mapred.max.tracker.blacklists = 4
>>> mapred.task.profile.maps = 0-2
>>> dfs.datanode.https.address = 0.0.0.0:50475
>>> mapred.userlog.retain.hours = 24
>>> dfs.secondary.http.address = 0.0.0.0:50090
>>> dfs.replication.max = 512
>>> mapred.job.tracker.persist.jobstatus.active = false
>>> hadoop.security.authorization = false
>>> local.cache.size = 10737418240
>>> dfs.namenode.delegation.token.renew-interval = 86400000
>>> mapred.min.split.size = 0
>>> mapred.map.tasks = 2
>>> mapred.child.java.opts = -Xmx4000m
>>> mapreduce.job.counters.limit = 120
>>> dfs.https.client.keystore.resource = ssl-client.xml
>>> mapred.job.queue.name = default
>>> dfs.https.address = 0.0.0.0:50470
>>> mapred.job.tracker.retiredjobs.cache.size = 1000
>>> dfs.balance.bandwidthPerSec = 1048576
>>> ipc.server.listen.queue.size = 128
>>> mapred.inmem.merge.threshold = 1000
>>> job.end.retry.interval = 30000
>>> mapred.skip.attempts.to.start.skipping = 2
>>> fs.checkpoint.dir = ${hadoop.tmp.dir}/dfs/namesecondary
>>> mapred.reduce.tasks = 0
>>> mapred.merge.recordsBeforeProgress = 10000
>>> mapred.userlog.limit.kb = 0
>>> mapred.job.reduce.memory.mb = -1
>>> dfs.max.objects = 0
>>> webinterface.private.actions = false
>>> io.sort.spill.percent = 0.80
>>> mapred.job.shuffle.input.buffer.percent = 0.70
>>> mapred.job.name = Giraph: org.apache.giraph.examples.MyShortestPaths
>>> dfs.datanode.dns.nameserver = default
>>> mapred.map.tasks.speculative.execution = false
>>> hadoop.util.hash.type = murmur
>>> dfs.blockreport.intervalMsec = 3600000
>>> mapred.map.max.attempts = 0
>>> mapreduce.job.acl-view-job =
>>> dfs.client.block.write.retries = 3
>>> mapred.job.tracker.handler.count = 10
>>> mapreduce.reduce.shuffle.read.timeout = 180000
>>> mapred.tasktracker.expiry.interval = 600000
>>> dfs.https.enable = false
>>> mapred.jobtracker.maxtasks.per.job = -1
>>> mapred.jobtracker.job.history.block.size = 3145728
>>> giraph.useOutOfCoreGiraph = true
>>> keep.failed.task.files = false
>>> mapreduce.outputformat.class = org.apache.giraph.bsp.BspOutputFormat
>>> dfs.datanode.failed.volumes.tolerated = 0
>>> ipc.client.tcpnodelay = false
>>> mapred.task.profile.reduces = 0-2
>>> mapred.output.compression.codec = org.apache.hadoop.io.compress.DefaultCodec
>>> io.map.index.skip = 0
>>> mapred.working.dir = hdfs://localhost:54310/user/hduser
>>> ipc.server.tcpnodelay = false
>>> mapred.jobtracker.blacklist.fault-bucket-width = 15
>>> dfs.namenode.delegation.key.update-interval = 86400000
>>> mapred.used.genericoptionsparser = true
>>> mapred.mapper.new-api = true
>>> mapred.job.map.memory.mb = -1
>>> giraph.vertex.input.dir = hdfs://localhost:54310/user/hduser/output
>>> dfs.default.chunk.view.size = 32768
>>> hadoop.logfile.size = 10000000
>>> mapred.reduce.tasks.speculative.execution = true
>>> mapreduce.job.dir = hdfs://localhost:54310/app/hadoop/tmp/mapred/staging/hduser/.staging/job_201312051827_0001
>>> mapreduce.tasktracker.outofband.heartbeat = false
>>> mapreduce.reduce.input.limit = -1
>>> dfs.datanode.du.reserved = 0
>>> hadoop.security.authentication = simple
>>> fs.checkpoint.period = 3600
>>> dfs.web.ugi = webuser,webgroup
>>> mapred.job.reuse.jvm.num.tasks = 1
>>> mapred.jobtracker.completeuserjobs.maximum = 100
>>> dfs.df.interval = 60000
>>> dfs.data.dir = ${hadoop.tmp.dir}/dfs/data
>>> mapred.task.tracker.task-controller = org.apache.hadoop.mapred.DefaultTaskController
>>> giraph.minWorkers = 1
>>> fs.s3.maxRetries = 4
>>> dfs.datanode.dns.interface = default
>>> mapred.cluster.max.map.memory.mb = -1
>>> dfs.support.append = false
>>> mapreduce.job.acl-modify-job =
>>> dfs.permissions.supergroup = supergroup
>>> mapred.local.dir = ${hadoop.tmp.dir}/mapred/local
>>> fs.hftp.impl = org.apache.hadoop.hdfs.HftpFileSystem
>>> fs.trash.interval = 0
>>> fs.s3.sleepTimeSeconds = 10
>>> dfs.replication.min = 1
>>> mapred.submit.replication = 10
>>> fs.har.impl = org.apache.hadoop.fs.HarFileSystem
>>> mapred.map.output.compression.codec = org.apache.hadoop.io.compress.DefaultCodec
>>> mapred.tasktracker.dns.interface = default
>>> dfs.namenode.decommission.interval = 30
>>> dfs.http.address = 0.0.0.0:50070
>>> dfs.heartbeat.interval = 3
>>> mapred.job.tracker = localhost:54311
>>> mapreduce.job.submithost = hduser
>>> io.seqfile.sorter.recordlimit = 1000000
>>> giraph.vertexInputFormatClass = org.apache.giraph.examples.MyShortestPaths$MyInputFormat
>>> dfs.name.dir = ${hadoop.tmp.dir}/dfs/name
>>> mapred.line.input.format.linespermap = 1
>>> mapred.jobtracker.taskScheduler = org.apache.hadoop.mapred.JobQueueTaskScheduler
>>> dfs.datanode.http.address = 0.0.0.0:50075
>>> mapred.local.dir.minspacekill = 0
>>> dfs.replication.interval = 3
>>> io.sort.record.percent = 0.05
>>> fs.kfs.impl = org.apache.hadoop.fs.kfs.KosmosFileSystem
>>> mapred.temp.dir = ${hadoop.tmp.dir}/mapred/temp
>>> mapred.tasktracker.reduce.tasks.maximum = 2
>>> mapreduce.job.user.classpath.first = true
>>> dfs.replication = 1
>>> fs.checkpoint.edits.dir = ${fs.checkpoint.dir}
>>> giraph.computationClass = org.apache.giraph.examples.MyShortestPaths
>>> mapred.tasktracker.tasks.sleeptime-before-sigkill = 5000
>>> mapred.job.reduce.input.buffer.percent = 0.0
>>> mapred.tasktracker.indexcache.mb = 10
>>> mapreduce.job.split.metainfo.maxsize = 10000000
>>> hadoop.logfile.count = 10
>>> mapred.skip.reduce.auto.incr.proc.count = true
>>> mapreduce.job.submithostaddress = 127.0.1.1
>>> io.seqfile.compress.blocksize = 1000000
>>> fs.s3.block.size = 67108864
>>> mapred.tasktracker.taskmemorymanager.monitoring-interval = 5000
>>> giraph.minPercentResponded = 100.0
>>> mapred.queue.default.state = RUNNING
>>> mapred.acls.enabled = false
>>> mapreduce.jobtracker.staging.root.dir = ${hadoop.tmp.dir}/mapred/staging
>>> mapred.queue.names = default
>>> dfs.access.time.precision = 3600000
>>> fs.hsftp.impl = org.apache.hadoop.hdfs.HsftpFileSystem
>>> mapred.task.tracker.http.address = 0.0.0.0:50060
>>> mapred.reduce.parallel.copies = 5
>>> io.seqfile.lazydecompress = true
>>> mapred.output.dir = /user/hduser/output/shortestpaths
>>> io.sort.mb = 100
>>> ipc.client.connection.maxidletime = 10000
>>> mapred.compress.map.output = false
>>> hadoop.security.uid.cache.secs = 14400
>>> mapred.task.tracker.report.address = 127.0.0.1:0
>>> mapred.healthChecker.interval = 60000
>>> ipc.client.kill.max = 10
>>> ipc.client.connect.max.retries = 10
>>> ipc.ping.interval = 300000
>>> mapreduce.user.classpath.first = true
>>> mapreduce.map.class = org.apache.giraph.graph.GraphMapper
>>> fs.s3.impl = org.apache.hadoop.fs.s3.S3FileSystem
>>> mapred.user.jobconf.limit = 5242880
>>> mapred.job.tracker.http.address = 0.0.0.0:50030
>>> io.file.buffer.size = 4096
>>> mapred.jobtracker.restart.recover = false
>>> io.serializations = org.apache.hadoop.io.serializer.WritableSerialization
>>> dfs.datanode.handler.count = 3
>>> mapred.reduce.copy.backoff = 300
>>> mapred.task.profile = false
>>> dfs.replication.considerLoad = true
>>> jobclient.output.filter = FAILED
>>> dfs.namenode.delegation.token.max-lifetime = 604800000
>>> mapred.tasktracker.map.tasks.maximum = 4
>>> io.compression.codecs = org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec
>>> fs.checkpoint.size = 67108864
>>>
>>> Additionally, if I have more than one worker, I also get an exception.
>>> Are my configurations wrong?
>>>
>>>
>>> best regards,
>>> Sebastian
>>>
>>>
>>
>

Re: out of core option

Posted by Yingyi Bu <bu...@gmail.com>.
Claudio,

   Great, thanks! Looking forward to the fix!

Best regards,
Yingyi


On Thu, Jan 23, 2014 at 1:34 AM, Claudio Martella <
claudio.martella@gmail.com> wrote:

> Yep, there's a bug. We're currently working on a fix. It should be ready
> in a few days.
>

Re: out of core option

Posted by Claudio Martella <cl...@gmail.com>.
Yep, there's a bug. We're currently working on a fix. It should be ready in
a few days.


On Thu, Jan 23, 2014 at 5:10 AM, Yingyi Bu <bu...@gmail.com> wrote:

> I just ran into the same issue with the latest trunk version.
> Does anybody know how to fix it?
>
> Best regards,
> Yingyi
>


-- 
   Claudio Martella
   claudio.martella@gmail.com