Posted to dev@giraph.apache.org by "ramesh krishnan m (JIRA)" <ji...@apache.org> on 2016/05/14 20:20:12 UTC

[jira] [Commented] (GIRAPH-462) Multithreading breaks out-of-core graph

    [ https://issues.apache.org/jira/browse/GIRAPH-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15283660#comment-15283660 ] 

ramesh krishnan m commented on GIRAPH-462:
------------------------------------------

Is this issue fixed? I am still getting this error in the latest release.

Exception logs:

2016-05-14 19:10:55,733 ERROR [ooc-io-0] org.apache.giraph.utils.LogStacktraceCallable: Execution of callable failed
java.lang.RuntimeException: java.io.EOFException
	at org.apache.giraph.ooc.OutOfCoreIOCallable.call(OutOfCoreIOCallable.java:76)
	at org.apache.giraph.ooc.OutOfCoreIOCallable.call(OutOfCoreIOCallable.java:30)
	at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
	at java.io.DataInputStream.readInt(DataInputStream.java:392)
	at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:47)
	at org.apache.giraph.ooc.data.DiskBackedPartitionStore.readOutEdges(DiskBackedPartitionStore.java:286)
	at org.apache.giraph.ooc.data.DiskBackedPartitionStore.loadInMemoryPartitionData(DiskBackedPartitionStore.java:329)
	at org.apache.giraph.ooc.data.OutOfCoreDataManager.loadPartitionData(OutOfCoreDataManager.java:195)
	at org.apache.giraph.ooc.data.DiskBackedPartitionStore.loadPartitionData(DiskBackedPartitionStore.java:360)
	at org.apache.giraph.ooc.io.LoadPartitionIOCommand.execute(LoadPartitionIOCommand.java:64)
	at org.apache.giraph.ooc.OutOfCoreIOCallable.call(OutOfCoreIOCallable.java:72)
	... 6 more
2016-05-14 19:10:55,737 INFO [ooc-io-0] org.apache.giraph.ooc.OutOfCoreIOCallableFactory: afterExecute: an out-of-core thread terminated unexpectedly with java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.EOFException
2016-05-14 19:10:55,739 INFO [checkpoint-vertices-7] org.apache.giraph.ooc.FixedOutOfCoreEngine: getNextPartition: waiting until a partition becomes available!
2016-05-14 19:10:56,426 ERROR [checkpoint-vertices-6] org.apache.giraph.utils.LogStacktraceCallable: Execution of callable failed
java.lang.RuntimeException: Job Failed due to a failure in an out-of-core IO thread
	at org.apache.giraph.ooc.FixedOutOfCoreEngine.getNextPartition(FixedOutOfCoreEngine.java:81)
	at org.apache.giraph.ooc.data.DiskBackedPartitionStore.getNextPartition(DiskBackedPartitionStore.java:187)
	at org.apache.giraph.worker.BspServiceWorker$3$1.call(BspServiceWorker.java:1398)
	at org.apache.giraph.worker.BspServiceWorker$3$1.call(BspServiceWorker.java:1392)
	at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

> Multithreading breaks out-of-core graph
> ---------------------------------------
>
>                 Key: GIRAPH-462
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-462
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Alessandro Presta
>            Priority: Critical
>         Attachments: GIRAPH-461.patch
>
>
> [~cmartella] pointed out this issue: when multithreaded computation is used together with the out-of-core graph, we hit a race condition. The compute threads share the same DiskBackedPartitionStore, whose getPartition() method was not designed to be thread-safe. When two threads request two different out-of-core partitions concurrently, both try to load their partition into the same slot.
> The result is that we can lose the reference to one of the two partitions (which will then not be written back to disk), and we can hit a NullPointerException when both threads try to offload the currently loaded partition to disk.
> I ran this test to confirm the issue:
> https://gist.github.com/4429628
> All tests pass except the one that uses both out-of-core graph and multiple compute threads.
> The error is the following:
> https://gist.github.com/4429650
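
For readers unfamiliar with the race described in the issue summary above, here is a minimal, self-contained sketch of the pattern. It is not Giraph code and not the attached patch: the class SingleSlotStoreSketch, its single in-memory slot, and the in-memory map standing in for the disk are all hypothetical, chosen only to show why the swap (offload the current partition, load the requested one) and the update have to happen under one lock when several compute threads share the store.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical single-slot store; NOT Giraph's DiskBackedPartitionStore. */
class SingleSlotStoreSketch {
  /** Stand-in for on-disk partitions: partition id -> one-element counter. */
  private final Map<Integer, int[]> disk = new HashMap<>();
  /** The single in-memory slot and the id of the partition it holds. */
  private int[] loaded;
  private int loadedId = -1;
  private final Object lock = new Object();

  SingleSlotStoreSketch(int numPartitions) {
    for (int i = 0; i < numPartitions; i++) {
      disk.put(i, new int[] {0});
    }
  }

  /**
   * Bring partition `id` into the slot if needed, then update it.
   * The whole swap-and-update runs under one lock. Without the lock, two
   * threads could both offload and both load at once, dropping one
   * partition or dereferencing a null slot.
   */
  void increment(int id) {
    synchronized (lock) {
      if (loadedId != id) {
        if (loaded != null) {
          disk.put(loadedId, loaded); // offload the current occupant
        }
        loaded = disk.remove(id);     // load the requested partition
        loadedId = id;
      }
      loaded[0]++;
    }
  }

  /** Sum of all counters, whether in the slot or on "disk". */
  int total() {
    synchronized (lock) {
      int sum = (loaded == null) ? 0 : loaded[0];
      for (int[] partition : disk.values()) {
        sum += partition[0];
      }
      return sum;
    }
  }

  /** Each thread repeatedly touches its own partition through the shared store. */
  static int run(int threads, int opsPerThread) {
    SingleSlotStoreSketch store = new SingleSlotStoreSketch(threads);
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (int t = 0; t < threads; t++) {
        final int id = t;
        futures.add(pool.submit(() -> {
          for (int i = 0; i < opsPerThread; i++) {
            store.increment(id);
          }
        }));
      }
      for (Future<?> f : futures) {
        f.get(); // rethrows any failure from a worker thread
      }
    } catch (Exception e) {
      throw new IllegalStateException("worker failed", e);
    } finally {
      pool.shutdown();
    }
    return store.total();
  }

  public static void main(String[] args) {
    System.out.println(run(4, 10_000));
  }
}
```

With the synchronized block in place, every increment is accounted for in the final total. Removing it in this sketch reproduces the two symptoms the summary mentions: lost partition data and an occasional NullPointerException. A coarse lock like this serializes all partition access, so it illustrates the hazard rather than suggesting how Giraph should resolve it.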



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)