Posted to user@giraph.apache.org by RainShine79 <ra...@googlemail.com> on 2014/10/23 14:02:54 UTC

Giraph job can not finish last superstep

  Hello all,


I have a Giraph job that seems to execute successfully: in the logs and on the Hadoop web interface I can see that all supersteps complete. The only problem is that the output never seems to get written to HDFS. 


From my own research into earlier postings on this mailing list, I understand there are problems with 
a) the out-of-core feature, which I need in order to load all the data, 
and
b) writing the results to HDFS. 
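With out-of-core graph support enabled, only a limited number of partitions are kept in memory and the others are written to files, which is what the "offloadPartition: writing partition vertices ..." lines in the worker log record. The Java sketch below is a simplified, hypothetical model of that idea. The class and method names are invented for illustration, it is not Giraph's DiskBackedPartitionStore, and it assumes least-recently-used eviction. It shows how touching a partition that is not in memory forces another partition out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of an out-of-core partition store (NOT Giraph code):
 * at most maxInMemory partitions stay in memory; loading another one
 * offloads the least recently used partition to a simulated "disk".
 */
public class OutOfCoreSketch {
    private final int maxInMemory;
    // accessOrder = true: iteration runs from least to most recently accessed
    private final LinkedHashMap<Integer, String> inMemory =
        new LinkedHashMap<>(16, 0.75f, true);
    // Stand-in for the partition files written to disk
    private final Map<Integer, String> onDisk = new LinkedHashMap<>();
    // Records which partitions were offloaded, in order
    private final List<Integer> offloadLog = new ArrayList<>();

    public OutOfCoreSketch(int maxInMemory) {
        this.maxInMemory = maxInMemory;
    }

    /** Returns a partition's data, reloading it from "disk" if needed. */
    public String getPartition(int id) {
        String data = inMemory.get(id); // get() also refreshes access order
        if (data != null) {
            return data;
        }
        // Reload from "disk" if offloaded earlier, otherwise create it
        data = onDisk.containsKey(id) ? onDisk.remove(id) : "partition-" + id;
        if (inMemory.size() >= maxInMemory) {
            offloadEldest();
        }
        inMemory.put(id, data);
        return data;
    }

    /** Moves the least recently used in-memory partition to "disk". */
    private void offloadEldest() {
        Iterator<Map.Entry<Integer, String>> it = inMemory.entrySet().iterator();
        Map.Entry<Integer, String> eldest = it.next();
        onDisk.put(eldest.getKey(), eldest.getValue());
        offloadLog.add(eldest.getKey());
        it.remove();
    }

    public List<Integer> offloaded() { return offloadLog; }

    public int inMemoryCount() { return inMemory.size(); }

    public static void main(String[] args) {
        OutOfCoreSketch store = new OutOfCoreSketch(2);
        store.getPartition(0);
        store.getPartition(8);
        store.getPartition(0);  // 0 becomes the most recently used
        store.getPartition(16); // offloads 8, the least recently used
        store.getPartition(8);  // reloads 8, which offloads 0
        System.out.println("offloaded: " + store.offloaded());
    }
}
```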


I am currently using the latest stable release, 1.0.0.


Here is the log of one representative worker:
2014-10-23 13:42:10,107 INFO org.apache.giraph.comm.SendPartitionCache: SendPartitionCache: maxEdgesPerTransfer = 80000
2014-10-23 13:42:10,108 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 56 to /user/bmacek/_giraph/partitions/job_201410130927_0282/partition-56_vertices
2014-10-23 13:42:10,270 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 0 to /user/bmacek/_giraph/partitions/job_201410130927_0282/partition-0_vertices
2014-10-23 13:42:10,435 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 16 to /user/bmacek/_giraph/partitions/job_201410130927_0282/partition-16_vertices
2014-10-23 13:42:10,600 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 32 to /user/bmacek/_giraph/partitions/job_201410130927_0282/partition-32_vertices
2014-10-23 13:42:10,761 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 48 to /user/bmacek/_giraph/partitions/job_201410130927_0282/partition-48_vertices
2014-10-23 13:42:10,927 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 8 to /user/bmacek/_giraph/partitions/job_201410130927_0282/partition-8_vertices
2014-10-23 13:42:11,245 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 24 to /user/bmacek/_giraph/partitions/job_201410130927_0282/partition-24_vertices
2014-10-23 13:42:11,432 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 40 to /user/bmacek/_giraph/partitions/job_201410130927_0282/partition-40_vertices
2014-10-23 13:42:11,619 INFO org.apache.giraph.graph.ComputeCallable: call: Computation took 1.5131937 secs for 8 partitions on superstep 2.  Flushing started
2014-10-23 13:42:11,620 INFO org.apache.giraph.worker.BspServiceWorker: finishSuperstep: Waiting on all requests, superstep 2 Memory (free/total/max) = 1107.35M / 1358.25M / 9344.00M
2014-10-23 13:42:11,621 INFO org.apache.giraph.comm.netty.NettyClient: waitAllRequests: Finished all requests. MBytes/sec sent = 0.0005, MBytes/sec received = 0.0001, MBytesSent = 0.0007, MBytesReceived = 0.0001, ave sent req MBytes = 0.0001, ave received req MBytes = 0, secs waited = 1.519
2014-10-23 13:42:11,621 INFO org.apache.giraph.worker.WorkerAggregatorHandler: finishSuperstep: Start gathering aggregators, workers will send their aggregated values once they are done with superstep computation
2014-10-23 13:42:11,834 INFO org.apache.giraph.comm.netty.NettyClient: waitAllRequests: Finished all requests. MBytes/sec sent = 0.0119, MBytes/sec received = 0.0062, MBytesSent = 0, MBytesReceived = 0, ave sent req MBytes = 0, ave received req MBytes = 0, secs waited = 0.002
2014-10-23 13:42:11,834 INFO org.apache.giraph.worker.BspServiceWorker: finishSuperstep: Superstep 2, messages = 0 Memory (free/total/max) = 1105.09M / 1358.25M / 9344.00M
2014-10-23 13:42:11,869 INFO org.apache.giraph.worker.BspServiceWorker: finishSuperstep: (waiting for rest of workers) WORKER_ONLY - Attempt=0, Superstep=2
2014-10-23 13:42:11,887 INFO org.apache.giraph.bsp.BspService: process: superstepFinished signaled
2014-10-23 13:42:11,895 INFO org.apache.giraph.worker.BspServiceWorker: finishSuperstep: Completed superstep 2 with global stats (vtx=538312,finVtx=0,edges=35261,msgCount=35261,haltComputation=true)
2014-10-23 13:42:11,895 INFO org.apache.giraph.graph.GraphTaskManager: execute: BSP application done (global vertices marked done)
2014-10-23 13:42:11,896 INFO org.apache.giraph.graph.GraphTaskManager: cleanup: Starting for WORKER_ONLY
2014-10-23 13:42:11,903 INFO org.apache.giraph.comm.netty.NettyClient: stop: reached wait threshold, 8 connections closed, releasing NettyClient.bootstrap resources now.
2014-10-23 13:42:11,905 INFO org.apache.giraph.worker.BspServiceWorker: saveVertices: Starting to save 66998 vertices using 1 threads
2014-10-23 13:42:11,987 WARN org.apache.giraph.bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/job_201410130927_0282/_applicationAttemptsDir/0/_superstepDir/1/_addressesAndPartitions, type=NodeDeleted, state=SyncConnected)
2014-10-23 13:42:11,994 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 56 to /user/bmacek/_giraph/partitions/job_201410130927_0282/partition-56_vertices
2014-10-23 13:42:12,003 INFO org.apache.giraph.worker.BspServiceWorker: processEvent : partitionExchangeChildrenChanged (at least one worker is done sending partitions)
2014-10-23 13:42:12,128 WARN org.apache.giraph.bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/job_201410130927_0282/_applicationAttemptsDir/0/_superstepDir/1/_superstepFinished, type=NodeDeleted, state=SyncConnected)
2014-10-23 13:42:12,229 INFO org.apache.giraph.worker.BspServiceWorker: processEvent: Job state changed, checking to see if it needs to restart
2014-10-23 13:42:12,245 INFO org.apache.giraph.bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/job_201410130927_0282/_masterJobState)
2014-10-23 13:43:11,907 INFO org.apache.giraph.utils.ProgressableUtils: waitFor: Future result not ready yet java.util.concurrent.FutureTask@3c9c7728
2014-10-23 13:43:11,907 INFO org.apache.giraph.utils.ProgressableUtils: waitFor: Waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@25b43d0d
2014-10-23 13:44:11,907 INFO org.apache.giraph.utils.ProgressableUtils: waitFor: Future result not ready yet java.util.concurrent.FutureTask@3c9c7728
2014-10-23 13:44:11,908 INFO org.apache.giraph.utils.ProgressableUtils: waitFor: Waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@25b43d0d
2014-10-23 13:45:11,908 INFO org.apache.giraph.utils.ProgressableUtils: waitFor: Future result not ready yet java.util.concurrent.FutureTask@3c9c7728


… and this continues forever, with the same "waitFor" lines repeating every minute. 
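The end of the log shows the worker starting saveVertices and then repeatedly logging that a future is "not ready yet". That output appears to come from a timed-wait loop: the thread waits on a Future for a fixed period, logs (and presumably reports progress to Hadoop) when the period expires, and then waits again. If the task behind the future, presumably the vertex-saving work, never completes, the loop never exits, and because progress keeps being reported the task is presumably never killed for timing out either. The self-contained Java sketch below is a hypothetical model of that pattern, not Giraph's ProgressableUtils; the maxWaits bound is an illustrative addition that lets the caller give up instead of waiting forever.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical model of a "wait with periodic progress" loop (NOT Giraph code). */
public class WaitForSketch {

    /**
     * Waits for the future in slices of periodMs, logging each time a slice
     * expires. Returns how many slices expired before the future completed,
     * or -1 after maxWaits expired slices (an unbounded loop would spin forever).
     */
    public static int waitFor(Future<?> future, long periodMs, int maxWaits)
            throws InterruptedException, ExecutionException {
        for (int waits = 0; waits < maxWaits; waits++) {
            try {
                future.get(periodMs, TimeUnit.MILLISECONDS);
                return waits;
            } catch (TimeoutException e) {
                System.out.println("waitFor: Future result not ready yet " + future);
                // A framework would report progress here so the task is not killed.
            }
        }
        return -1;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);

        // A task that finishes quickly completes within the first slice.
        Future<?> quick = pool.submit(() -> { });
        System.out.println("quick: " + waitFor(quick, 1000, 5));

        // A task blocked on a latch that is never released never completes,
        // much like a save step that never finishes.
        CountDownLatch neverReleased = new CountDownLatch(1);
        Future<?> stuck = pool.submit(() -> {
            neverReleased.await();
            return null;
        });
        System.out.println("stuck: " + waitFor(stuck, 50, 3));

        pool.shutdownNow(); // interrupts the stuck task so the JVM can exit
    }
}
```

With an unbounded loop, the stuck case would print the "not ready yet" line indefinitely, which matches the pattern in the worker log.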



Is there a patch I can apply to fix this, or do I have to work from the current trunk? If I have to use the most recent sources: what are the names of the new interfaces (abstract classes) that I need to implement (extend)? 



Thanks for your help in advance,
Frank


Re: Giraph job can not finish last superstep

Posted by RainShine <ra...@googlemail.com>.
  
  

Hello again,


It's unfortunate that I can't find any solution to this problem. I have already applied several patches that looked promising, including the ones from GIRAPH-806 (https://issues.apache.org/jira/browse/GIRAPH-806). I also tried the version from the git repository, but found that functionality such as getCurrentSuperstep(), which I use in all my algorithms and vertex implementations, is missing there.


I also tried compiling with different Maven profiles, since the Hadoop version we use in our cluster (1.1.1) is never explicitly listed as compatible.


Funnily enough, the algorithm works fine when I reduce the input size and do not use out-of-core graphs.


I would love to use Giraph, but this issue is eating up a lot of time. If anybody knows something that could help, I would really appreciate it.


Best regards,
Frank


