Posted to user@giraph.apache.org by Vitaly Tsvetkoff <vi...@gmail.com> on 2015/07/20 15:01:13 UTC

Problem with big datasets on a Cloudera YARN cluster

Hello everyone, one more time!
I am new to Hadoop and Giraph. I wrote a custom Giraph algorithm,
CustomWeightedPageRank (one of the PageRank modifications), and a
CustomInputFormat for it (I put both into the giraph-examples jar). I
successfully ran it on a Cloudera YARN cluster (4 machines, each with 6 cores
and 12 hardware threads) with small datasets (the bundled examples and ~8
million vertices), but with big datasets (~10 million vertices) a problem
occurs every time.
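In case it helps, here is a minimal sketch of the kind of weighted PageRank
computation I mean. It is simplified for the list: the damping factor, the
formula details and the exact class body are assumed placeholders, not my real
code.

import java.io.IOException;

import org.apache.giraph.edge.Edge;
import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;

// Simplified sketch of a weighted PageRank computation (not the exact class).
public class CustomWeightedPageRankComputation extends
    BasicComputation<LongWritable, DoubleWritable, DoubleWritable, DoubleWritable> {

  // Matches the -Dgiraph.weightedPageRank.superstepCount option used below.
  public static final String SUPERSTEP_COUNT = "giraph.weightedPageRank.superstepCount";

  @Override
  public void compute(Vertex<LongWritable, DoubleWritable, DoubleWritable> vertex,
      Iterable<DoubleWritable> messages) throws IOException {
    // From the second superstep on, combine the incoming weighted contributions.
    if (getSuperstep() >= 1) {
      double sum = 0;
      for (DoubleWritable message : messages) {
        sum += message.get();
      }
      vertex.setValue(new DoubleWritable(0.15 / getTotalNumVertices() + 0.85 * sum));
    }

    if (getSuperstep() < getConf().getInt(SUPERSTEP_COUNT, 30)) {
      // Split the current rank among out-edges proportionally to their weight.
      double totalWeight = 0;
      for (Edge<LongWritable, DoubleWritable> edge : vertex.getEdges()) {
        totalWeight += edge.getValue().get();
      }
      for (Edge<LongWritable, DoubleWritable> edge : vertex.getEdges()) {
        double share = totalWeight > 0 ? edge.getValue().get() / totalWeight : 0;
        sendMessage(edge.getTargetVertexId(),
            new DoubleWritable(vertex.getValue().get() * share));
      }
    } else {
      vertex.voteToHalt();
    }
  }
}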

The console command is:
hadoop jar giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.6.0-cdh5.4.4-jar-with-dependencies.jar \
 org.apache.giraph.GiraphRunner \
 -Dgiraph.yarn.task.heap.mb=4096 \
 -Dgiraph.isStaticGraph=true \
 -Dgiraph.useOutOfCoreGraph=true \
 -Dgiraph.useOutOfCoreMessages=true \
 -Dgiraph.numInputThreads=12 \
 -Dgiraph.numComputeThreads=12 \
 -Dgiraph.weightedPageRank.superstepCount=30 \
 ru.custom.CustomWeightedPageRankComputation \
 -vif ru.custom.CustomInputFormat \
 -vip /tmp/giraph_input \
 -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
 -op /tmp/giraph \
 -w 12 \
 -yj giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.6.0-cdh5.4.4-jar-with-dependencies.jar
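
For context, a minimal sketch of the kind of text-based vertex input format I
wrote for -vif is below. The tab-separated line layout
"vertexId value targetId weight targetId weight ..." is only an assumed
example here; my real format differs in the details.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.giraph.edge.Edge;
import org.apache.giraph.edge.EdgeFactory;
import org.apache.giraph.io.formats.TextVertexInputFormat;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// Sketch of a line-oriented vertex input format; assumes tab-separated
// "vertexId value targetId weight targetId weight ..." lines.
public class CustomInputFormat
    extends TextVertexInputFormat<LongWritable, DoubleWritable, DoubleWritable> {

  @Override
  public TextVertexReader createVertexReader(InputSplit split, TaskAttemptContext context)
      throws IOException {
    return new CustomVertexReader();
  }

  public class CustomVertexReader extends TextVertexReaderFromEachLineProcessed<String[]> {
    @Override
    protected String[] preprocessLine(Text line) throws IOException {
      return line.toString().split("\t");
    }

    @Override
    protected LongWritable getId(String[] tokens) throws IOException {
      return new LongWritable(Long.parseLong(tokens[0]));
    }

    @Override
    protected DoubleWritable getValue(String[] tokens) throws IOException {
      return new DoubleWritable(Double.parseDouble(tokens[1]));
    }

    @Override
    protected Iterable<Edge<LongWritable, DoubleWritable>> getEdges(String[] tokens)
        throws IOException {
      // Remaining tokens come in (targetId, weight) pairs.
      List<Edge<LongWritable, DoubleWritable>> edges = new ArrayList<>();
      for (int i = 2; i + 1 < tokens.length; i += 2) {
        edges.add(EdgeFactory.create(
            new LongWritable(Long.parseLong(tokens[i])),
            new DoubleWritable(Double.parseDouble(tokens[i + 1]))));
      }
      return edges;
    }
  }
}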

Please see the container logs here http://pastebin.com/3pTiYbZR and the main
log here http://pastebin.com/YZQfz4uq .
It seems the calculation starts fine, but it crashes during one of the
supersteps. Maybe I am using "bad" properties? Should the -w property equal
the number of machines or the number of threads?

I hope somebody here can help me solve this problem!