Posted to user@giraph.apache.org by Vitaly Tsvetkoff <vi...@gmail.com> on 2015/07/20 15:01:13 UTC
Problem with big datasets on Cloudera YARN cluster
Hello everyone, one more time!
I am a newbie with Hadoop and Giraph. I wrote a custom Giraph algorithm,
CustomWeightedPageRank (one of the PageRank modifications), and a
CustomInputFormat for it (I put both into the giraph-examples jar). It runs
successfully on our Cloudera YARN cluster (4 machines, each with 6 cores and
12 hardware threads) on small datasets (the bundled examples, and ~8 million
vertices), but on bigger datasets (~10 million vertices) it fails every time.
The command I run is:
hadoop jar giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.6.0-cdh5.4.4-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner \
  -Dgiraph.yarn.task.heap.mb=4096 \
  -Dgiraph.isStaticGraph=true \
  -Dgiraph.useOutOfCoreGraph=true \
  -Dgiraph.useOutOfCoreMessages=true \
  -Dgiraph.numInputThreads=12 \
  -Dgiraph.numComputeThreads=12 \
  -Dgiraph.weightedPageRank.superstepCount=30 \
  ru.custom.CustomWeightedPageRankComputation \
  -vif ru.custom.CustomInputFormat \
  -vip /tmp/giraph_input \
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /tmp/giraph \
  -w 12 \
  -yj giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.6.0-cdh5.4.4-jar-with-dependencies.jar
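For what it is worth, here is my back-of-envelope math on what that command asks YARN for (the "+ 1" master container and the even spread across the 4 machines are assumptions on my part, not something I have confirmed):

```python
# Rough memory demand of the command above; the extra master container
# and the even spread across nodes are assumptions on my part.
heap_mb = 4096            # -Dgiraph.yarn.task.heap.mb
workers = 12              # -w
containers = workers + 1  # workers plus one master task (assumed)

total_mb = containers * heap_mb
per_node_mb = total_mb / 4  # 4 machines, assuming an even spread

print(total_mb)     # 53248 MB of heap requested in total
print(per_node_mb)  # 13312.0 MB per machine, before YARN overhead
```

So each machine would need roughly 13 GB of heap just for the Giraph containers, and I am not sure our nodes have that much memory free.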
Please see the container logs here: http://pastebin.com/3pTiYbZR and the main
log here: http://pastebin.com/YZQfz4uq .
The computation seems to start fine, but it crashes during one of the supersteps.
Maybe I am using "bad" properties? Should the -w option equal the number of
machines or the number of threads?
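For example, if workers should match machines, I guess the command would become something like this (purely hypothetical on my side, I have not tried these numbers):

```shell
# Hypothetical variant: one worker per machine (-w 4); everything else unchanged.
hadoop jar giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.6.0-cdh5.4.4-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner \
  -Dgiraph.yarn.task.heap.mb=4096 \
  -Dgiraph.isStaticGraph=true \
  -Dgiraph.useOutOfCoreGraph=true \
  -Dgiraph.useOutOfCoreMessages=true \
  -Dgiraph.numInputThreads=12 \
  -Dgiraph.numComputeThreads=12 \
  -Dgiraph.weightedPageRank.superstepCount=30 \
  ru.custom.CustomWeightedPageRankComputation \
  -vif ru.custom.CustomInputFormat \
  -vip /tmp/giraph_input \
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /tmp/giraph \
  -w 4 \
  -yj giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.6.0-cdh5.4.4-jar-with-dependencies.jar
```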
I hope somebody here can help me solve this problem!