Posted to dev@giraph.apache.org by Darshan Mallenahalli Shankaralingappa <ds...@ntent.com> on 2017/07/16 20:54:10 UTC

Problems with running page rank using OutOfCore setting

Hi,

I am trying to run the PageRank algorithm using Giraph on a 3.5-billion-node web graph on a relatively small Hadoop cluster (6 nodes with 225GB RAM total).
I set giraph.useOutOfCoreGraph and giraph.useOutOfCoreMessages to true, and the application was killed after some time.

I am running the Giraph job using this command:
 yarn jar giraph-examples-1.2.0-for-hadoop-2.6.0-jar-with-dependencies.jar \
   org.apache.giraph.GiraphRunner \
   -Dgiraph.yarn.task.heap.mb=58880 \
   -Dgiraph.isStaticGraph=true \
   -Dgiraph.useOutOfCoreGraph=true \
   -Dgiraph.useOutOfCoreMessages=true \
   org.apache.giraph.examples.PageRankComputation \
   -vif org.apache.giraph.examples.LongDoubleNullTextInputFormat \
   -vip /user/darshan/AdjList/ \
   -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
   -op /user/darshan/giraph_3.5B_ooc/ \
   -w 8 \
   -mc org.apache.giraph.examples.RandomWalkVertexMasterCompute \
   -wc org.apache.giraph.examples.RandomWalkWorkerContext \
   -ca org.apache.giraph.examples.RandomWalkVertex.teleportationProbability=0.15f \
   -ca org.apache.giraph.examples.RandomWalkVertex.maxSupersteps=21

Here is a log from ZooKeeper:

2017-07-12 08:08:35,026 WARN [netty-client-worker-1] org.apache.giraph.comm.netty.handler.ResponseClientHandler: exceptionCaught: Channel failed with remote address <url>/<ip>:30006<http://hdpbcn-01.lv.ntent.com/10.100.21.118:30006>

java.lang.ArrayIndexOutOfBoundsException: 1075052547
        at org.apache.giraph.comm.flow_control.NoOpFlowControl.getAckSignalFlag(NoOpFlowControl.java:52)
        at org.apache.giraph.comm.netty.NettyClient.messageReceived(NettyClient.java:796)
        at org.apache.giraph.comm.netty.handler.ResponseClientHandler.channelRead(ResponseClientHandler.java:87)
        at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:338)
        at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:324)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:153)
        at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:338)
        at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:324)
        at org.apache.giraph.comm.netty.InboundByteCounter.channelRead(InboundByteCounter.java:74)
        at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:338)
        at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:324)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:785)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:126)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:485)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:452)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:346)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
        at java.lang.Thread.run(Thread.java:745)


I think this issue is related to the messaging stack rather than to the algorithm.
Either way, could someone please help me with this, or at least point me in the right direction?

Cheers,
Darshan
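The flags in the command above can be compared mechanically against the advice given in the replies in this thread: giraph.useOutOfCoreMessages is no longer in use, the static graph has a known bug with the out-of-core mechanism, and the default flow control (NoOpFlowControl) lets outstanding messages pile up unless giraph.waitForPerWorkerRequests=true is set. The following is a minimal, hypothetical Java sketch of such a pre-flight check. The class and method names are invented for illustration, and treating a missing key as "false" is an assumption about Giraph's defaults (consistent with the statement in this thread that NoOpFlowControl is the default), not something confirmed from Giraph's source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical pre-flight check encoding the advice from this thread.
 * Not part of Giraph; missing keys are treated as "false" (assumed defaults).
 */
final class OutOfCoreConfigCheck {

    private OutOfCoreConfigCheck() {}

    /** Boolean.parseBoolean returns false for null or anything other than "true". */
    private static boolean flag(Map<String, String> conf, String key) {
        return Boolean.parseBoolean(conf.get(key));
    }

    /** Returns one human-readable warning per problematic setting. */
    static List<String> check(Map<String, String> conf) {
        List<String> warnings = new ArrayList<>();
        boolean outOfCore = flag(conf, "giraph.useOutOfCoreGraph");

        if (conf.containsKey("giraph.useOutOfCoreMessages")) {
            warnings.add("giraph.useOutOfCoreMessages is no longer in use");
        }
        if (outOfCore && flag(conf, "giraph.isStaticGraph")) {
            warnings.add("static graph has a known bug with out-of-core; "
                + "set giraph.isStaticGraph=false");
        }
        if (outOfCore && !flag(conf, "giraph.waitForPerWorkerRequests")) {
            warnings.add("default flow control (NoOpFlowControl) can fill memory; "
                + "set giraph.waitForPerWorkerRequests=true");
        }
        return warnings;
    }

    public static void main(String[] args) {
        // The -D flags used in the original command.
        Map<String, String> original = Map.of(
            "giraph.isStaticGraph", "true",
            "giraph.useOutOfCoreGraph", "true",
            "giraph.useOutOfCoreMessages", "true");
        check(original).forEach(w -> System.out.println("WARN: " + w));
    }
}
```

Run against the original flags, all three checks fire; run against the options recommended in the replies, the list comes back empty.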

Re: Problems with running page rank using OutOfCore setting

Posted by Hassan Eslami <hs...@gmail.com>.
Darshan,

Please follow the discussion here:
<http://mail-archives.apache.org/mod_mbox/giraph-user/201610.mbox/%3CCAEUY=V-iRV94aj_QkQzF-7tPsaDzadb8VgWVGjiWNCQxjaOBsg@mail.gmail.com%3E>

After that issue is resolved, take a look at this discussion
<http://mail-archives.apache.org/mod_mbox/giraph-user/201611.mbox/%3CCAH1LQffrK9VwJU%3Dwx7Pi-7TCyowtAKvUMy%2BrrJq5t0M3Q-UgZA%40mail.gmail.com%3E>
for some guidelines on using the out-of-core feature.

Best,
Hassan

On Mon, Jul 17, 2017 at 3:44 AM, Darshan Mallenahalli Shankaralingappa <
dshankaralingappa@ntent.com> wrote:

> Hi,
>
> I added -Dgiraph.waitForPerWorkerRequests=true parameter. And I got this
> error.
>
> Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> So, does this mean that there is no other solution but to increase the
> physical memory?

Re: Problems with running page rank using OutOfCore setting

Posted by Darshan Mallenahalli Shankaralingappa <ds...@ntent.com>.
Hi,

I added the -Dgiraph.waitForPerWorkerRequests=true parameter and got this error:


Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.ArrayList$SubList.listIterator(ArrayList.java:1095)
        at java.util.AbstractList.listIterator(AbstractList.java:299)
        at java.util.ArrayList$SubList.iterator(ArrayList.java:1087)
        at java.util.AbstractCollection.toArray(AbstractCollection.java:180)
        at java.util.regex.Pattern.split(Pattern.java:1241)
        at java.util.regex.Pattern.split(Pattern.java:1273)
        at org.apache.giraph.examples.LongDoubleNullTextInputFormat$LongDoubleNullDoubleVertexReader.getCurrentVertex(LongDoubleNullTextInputFormat.java:86)
        at org.apache.giraph.io.internal.WrappedVertexReader.getCurrentVertex(WrappedVertexReader.java:90)
        at org.apache.giraph.worker.VertexInputSplitsCallable.readInputSplit(VertexInputSplitsCallable.java:182)
        at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:275)
        at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:227)
        at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:60)
        at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:67)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
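The trace shows the OutOfMemoryError surfacing while input splits are being loaded, inside java.util.regex.Pattern.split called from the vertex reader of LongDoubleNullTextInputFormat. Regex splitting allocates a String[] plus one String per token for every input line, which is short-lived garbage on top of the graph itself. Below is a minimal, hypothetical sketch of an allocation-lighter tokenizer. It assumes each line is a vertex ID followed by its neighbor IDs, separated by spaces or tabs, with non-negative decimal IDs; it is not Giraph's reader, and whether parsing garbage matters much here compared with the overall memory budget is unverified.

```java
import java.util.Arrays;

/**
 * Hypothetical tokenizer: parses "id nbr1 nbr2 ..." (space/tab separated,
 * non-negative decimal longs) into a long[] without per-token Strings.
 */
final class AdjacencyLineParser {

    private AdjacencyLineParser() {}

    private static boolean isSeparator(char c) {
        return c == ' ' || c == '\t';
    }

    static long[] parse(CharSequence line) {
        long[] ids = new long[8];
        int count = 0;
        int i = 0;
        final int len = line.length();
        while (i < len) {
            if (isSeparator(line.charAt(i))) {   // skip runs of separators
                i++;
                continue;
            }
            long value = 0;
            while (i < len && !isSeparator(line.charAt(i))) {
                char c = line.charAt(i);
                if (c < '0' || c > '9') {
                    throw new NumberFormatException(
                        "Unexpected character '" + c + "' at index " + i);
                }
                // addExact/multiplyExact throw ArithmeticException on long overflow.
                value = Math.addExact(Math.multiplyExact(value, 10L), c - '0');
                i++;
            }
            if (count == ids.length) {           // grow geometrically
                ids = Arrays.copyOf(ids, count * 2);
            }
            ids[count++] = value;
        }
        return Arrays.copyOf(ids, count);        // trim to the exact size
    }

    public static void main(String[] args) {
        long[] ids = parse("1\t2 3 42");
        System.out.println("vertex " + ids[0] + ", neighbors "
            + Arrays.toString(Arrays.copyOfRange(ids, 1, ids.length)));
    }
}
```

This still allocates one long[] per line, so it reduces rather than eliminates per-line garbage.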



So, does this mean that there is no other solution but to increase the physical memory?
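Some rough arithmetic on the figures in this thread bears on that question. The sketch below computes two numbers: the total heap requested by the 8 workers via -Dgiraph.yarn.task.heap.mb=58880, and the average RAM per vertex if the whole cluster's memory held nothing but graph data. The class and method names are hypothetical, and "225GB" is read as GiB (about 64 bytes per vertex if it is decimal GB instead).

```java
/** Back-of-the-envelope memory arithmetic using the figures quoted in this thread. */
final class MemoryBudget {

    static final long GIB = 1024L * 1024L * 1024L;

    private MemoryBudget() {}

    /** Total heap requested by all workers, in MB. */
    static long requestedHeapMb(int workers, long heapMbPerWorker) {
        return workers * heapMbPerWorker;
    }

    /** Average bytes per vertex if all of ramBytes held graph data (an upper bound). */
    static double bytesPerVertex(long ramBytes, long vertices) {
        return (double) ramBytes / vertices;
    }

    public static void main(String[] args) {
        long vertices = 3_500_000_000L;  // "3.5 billion node web graph"
        long clusterRamGiB = 225L;       // "6 nodes with 225GB RAM total", read as GiB
        int workers = 8;                 // -w 8
        long heapMb = 58_880L;           // -Dgiraph.yarn.task.heap.mb=58880

        long totalHeapMb = requestedHeapMb(workers, heapMb);
        System.out.printf("Requested worker heap: %d MB (%.1f GiB) vs. %d GiB of cluster RAM%n",
            totalHeapMb, totalHeapMb / 1024.0, clusterRamGiB);
        System.out.printf("RAM per vertex, upper bound: %.1f bytes%n",
            bytesPerVertex(clusterRamGiB * GIB, vertices));
    }
}
```

With these inputs it reports 471040 MB (460.0 GiB) of requested heap against 225 GiB of RAM, and about 69 bytes per vertex to cover each vertex's ID, value and out-edges plus JVM object overhead. That suggests the graph cannot be held in memory all at once, so out-of-core spilling has to do most of the work, and that it is worth confirming how much heap each worker actually receives.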


Cheers,

Darshan


On 16 Jul 2017, at 23:04, Hassan Eslami <hs...@gmail.com> wrote:

> Please use the following options instead:
>
> -Dgiraph.isStaticGraph=false -Dgiraph.useOutOfCoreGraph=true
> -Dgiraph.waitForPerWorkerRequests=true


Re: Problems with running page rank using OutOfCore setting

Posted by Hassan Eslami <hs...@gmail.com>.
Hi,

giraph.useOutOfCoreMessages is no longer in use.

The main problem here is that you are using the default flow-control mechanism
(NoOpFlowControl), which lets a large number of outstanding/received messages
accumulate. As a consequence, memory fills up very quickly, and the job can fail
for various reasons. Please use the following options instead:

  -Dgiraph.isStaticGraph=false
  -Dgiraph.useOutOfCoreGraph=true
  -Dgiraph.waitForPerWorkerRequests=true

Note: the static graph has a known bug with the out-of-core mechanism.
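The flow-control explanation above boils down to back-pressure: with NoOpFlowControl, senders keep issuing requests without waiting for acknowledgements, so unacknowledged and received messages pile up in memory. As the option name suggests, giraph.waitForPerWorkerRequests instead limits the number of open requests per destination worker. The sketch below illustrates that general idea with one semaphore per destination. It is a simplified, hypothetical illustration (the class, its methods and the maxOpenPerWorker parameter are invented), not Giraph's flow-control code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

/**
 * Hypothetical per-worker flow control: each destination worker gets a fixed
 * number of permits, one per unacknowledged request. Assumes onAck is called
 * exactly once for every successful acquire.
 */
final class PerWorkerRequestLimiter {

    private final int maxOpenPerWorker;
    private final ConcurrentHashMap<Integer, Semaphore> permits = new ConcurrentHashMap<>();

    PerWorkerRequestLimiter(int maxOpenPerWorker) {
        if (maxOpenPerWorker <= 0) {
            throw new IllegalArgumentException("maxOpenPerWorker must be positive");
        }
        this.maxOpenPerWorker = maxOpenPerWorker;
    }

    private Semaphore permitsFor(int workerId) {
        return permits.computeIfAbsent(workerId, id -> new Semaphore(maxOpenPerWorker));
    }

    /** Blocks the sending thread until the destination has a free slot. */
    void acquireBeforeSend(int workerId) throws InterruptedException {
        permitsFor(workerId).acquire();
    }

    /** Non-blocking variant: returns false when the destination is at its cap. */
    boolean tryAcquireBeforeSend(int workerId) {
        return permitsFor(workerId).tryAcquire();
    }

    /** Called when the destination worker acknowledges a request, freeing a slot. */
    void onAck(int workerId) {
        permitsFor(workerId).release();
    }

    /** Number of requests currently unacknowledged by the given worker. */
    int openRequests(int workerId) {
        return maxOpenPerWorker - permitsFor(workerId).availablePermits();
    }
}
```

In a real sender, acquireBeforeSend would run on the thread producing messages and onAck on the network thread handling responses. The blocking acquire is what turns an unbounded backlog into back-pressure, at the cost of stalling computation while a slow worker catches up.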

Hope it helps,
Hassan

On Sun, Jul 16, 2017 at 1:54 PM, Darshan Mallenahalli Shankaralingappa <
dshankaralingappa@ntent.com> wrote:

> I set the giraph.useOutOfCoreGraph and giraph.useOutOfCoreMessages to true
> and the application killed after some time.
>
> java.lang.ArrayIndexOutOfBoundsException: 1075052547
>         at org.apache.giraph.comm.flow_control.NoOpFlowControl.getAckSignalFlag(NoOpFlowControl.java:52)