Posted to user@giraph.apache.org by Cristina Bovi <cr...@gmail.com> on 2018/12/08 19:17:21 UTC

4-profiles calculus on big graphs

Hi, for my master thesis in computer science I succeeded in implementing the
4-profile calculation (https://arxiv.org/abs/1510.02215 -
http://eelenberg.github.io/Elenberg4profileWWW16.pdf) using
giraph-1.3.0-snapshot (compiled with the -Phadoop_yarn profile) and
hadoop-2.8.4.
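
For completeness, the build command was roughly the following (a sketch from
memory, so please double-check the property names against the giraph pom):

  # build Giraph for pure-YARN mode against Hadoop 2.8.4 (sketch)
  mvn clean package -Phadoop_yarn -Dhadoop.version=2.8.4 -DskipTests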

I configured a cluster on Amazon EC2 composed of 1 namenode and 5 datanodes
using t2.2xlarge (32GB, 8 CPU) instances, and I obtained the results described
in the attached file (also available here https://we.tl/t-7DuNJSSuN3) with
input graphs of small/medium size.

If I give my giraph program bigger input graphs (e.g.
http://snap.stanford.edu/data/web-NotreDame.html), in some cases I get many
netty-related errors and the yarn application FAILS; in other cases the yarn
application remains in a RUNNING UNDEFINED state (which I then killed instead
of waiting for the default timeout) with apparently no error. I also tried
using m5.4xlarge (64GB, 16 CPU) instances but got the same problems. I
reported the log errors of the first case here:

- log of errors from the giraph workers on the datanodes:
https://pastebin.com/CGHUd0za (same errors on all datanodes)
- log of errors from the giraph master: https://pastebin.com/JXYN6y4L

I'm quite sure the errors are not related to insufficient memory on the EC2
instances, because in the logs I always see messages like "(free/total/max) =
23038.28M / 27232.00M / 27232.00M". *Please help me, because my master
thesis is blocked by this problem :-(*

This is an example of the command I used to run giraph; could you please
check whether the parameters I used are correct? Any other tuning suggestions
would be appreciated!

giraph 4Profiles-0.0.1-SNAPSHOT.jar
it.uniroma1.di.fourprofiles.worker.superstep0.gas1.Worker_Superstep0_GAS1
-ca giraph.numComputeThreads=8 // Since a t2.2xlarge has 8 cores, is it correct
                               // to set these three thread parameters to 8?
-ca giraph.numInputThreads=8
-ca giraph.numOutputThreads=8

-w 8 // I set 8 workers since:
     //    - there are 5 datanodes on EC2
     //    - every datanode is configured for max 2 containers, in order to
     //      reduce messages between datanodes
     //    - 2 containers are reserved for the application master and the
     //      giraph master
     //    - (5 datanodes * 2 max containers) - 2 reserved = 8 workers
     // Is this reasoning correct? (A small sanity-check script restating it
     // is shown after the command.)

-yh 15360 // I set 15360 since it corresponds to
          // - the yarn.scheduler.minimum-allocation-mb property in
          //   yarn-site.xml
          // - the mapreduce.map.memory.mb property in mapred-site.xml
          // Is this reasoning correct?

-ca giraph.pure.yarn.job=true
-mc it.uniroma1.di.fourprofiles.master.Master_FourProfiles
-ca io.edge.reverse.duplicator=true
-eif
it.uniroma1.di.fourprofiles.io.format.IntEdgeData_TextEdgeInputFormat_ReverseEdgeDuplicator

-eip INPUT_GRAPHS/HU_edges.txt-processed
-vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat
-op output
-ca giraph.SplitMasterWorker=true
-ca
giraph.messageCombinerClass=it.uniroma1.di.fourprofiles.worker.msgcombiner.Worker_MsgCombiner

-ca
giraph.master.observers=it.uniroma1.di.fourprofiles.master.observer.Observer_FourProfiles

-ca giraph.metrics.enable=true
-ca giraph.useInputSplitLocality=false
-ca giraph.useBigDataIOForMessages=true
-ca giraph.useMessageSizeEncoding=true
-ca giraph.oneToAllMsgSending=true
-ca giraph.isStaticGraph=true
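
As a sanity check, here is the sizing reasoning above restated as a tiny shell
snippet (only my cluster's numbers, nothing giraph-specific is assumed):

  datanodes=5
  containers_per_node=2   # max containers per datanode
  reserved=2              # application master + giraph master
  workers=$(( datanodes * containers_per_node - reserved ))   # = 8, passed with -w
  heap_mb=15360           # passed with -yh; must fit in the YARN container allocation
  echo "workers=$workers heap_mb=$heap_mb"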

Furthermore, I tried the following netty parameters, but they didn't resolve
the problems. Could you please tell me if I'm missing some important parameter,
or maybe using one in the wrong way? I tried to generalize the value passed to
each netty parameter with a trivial formula nettyFactor*defaultValue, where
nettyFactor can be 1, 2, 3, ... (passed as a shell parameter; a simplified
sketch of the wrapper script is shown after the list):

-ca giraph.nettyAutoRead=true
-ca giraph.channelsPerServer=$((nettyFactor*1))
-ca giraph.nettyClientThreads=$((nettyFactor*4))
-ca giraph.nettyClientExecutionThreads=$((nettyFactor*8))
-ca giraph.nettyServerThreads=$((nettyFactor*16))
-ca giraph.nettyServerExecutionThreads=$((nettyFactor*8))
-ca giraph.clientSendBufferSize=$((nettyFactor*524288))
-ca giraph.clientReceiveBufferSize=$((nettyFactor*32768))
-ca giraph.serverSendBufferSize=$((nettyFactor*32768))
-ca giraph.serverReceiveBufferSize=$((nettyFactor*524288))
-ca giraph.vertexRequestSize=$((nettyFactor*524288))
-ca giraph.edgeRequestSize=$((nettyFactor*524288))
-ca giraph.msgRequestSize=$((nettyFactor*524288))
-ca giraph.nettyRequestEncoderBufferSize=$((nettyFactor*32768))
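
For reference, this is a simplified sketch of the wrapper script I use to pass
nettyFactor (the real script contains the full command shown above):

  #!/bin/bash
  # simplified sketch: scale the netty-related settings by a factor from the command line
  nettyFactor=${1:-1}   # default to 1 if no argument is given
  giraph 4Profiles-0.0.1-SNAPSHOT.jar \
    it.uniroma1.di.fourprofiles.worker.superstep0.gas1.Worker_Superstep0_GAS1 \
    -ca giraph.nettyClientThreads=$((nettyFactor*4)) \
    -ca giraph.nettyServerThreads=$((nettyFactor*16)) \
    -ca giraph.clientSendBufferSize=$((nettyFactor*524288))
    # ... plus the remaining options exactly as in the command above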




... I have some other questions:
1)
This is my hadoop configuration: https://we.tl/t-t1ItNYFe7H Please check it,
but I'm quite sure it is correct. I have only one question about it: since
giraph does not use "reduce", is it correct to assign 0 MB to
mapreduce.reduce.memory.mb in mapred-site.xml?

2)
In order to avoid ClassNotFoundException errors I copied the jar of my
giraph application and all the giraph jars from $GIRAPH_HOME and
$GIRAPH_HOME/lib to $HADOOP_HOME/share/hadoop/yarn/lib. Is there a better
solution?

3)
Last but not least: here https://we.tl/t-tdhuZFsVJW you can find the complete
hadoop/yarn log of my giraph program with the following graph as input:
http://snap.stanford.edu/data/web-NotreDame.html In this case the yarn
application remains in the RUNNING UNDEFINED state.


Thanks
-- 
Cristina Bovi

Re: 4-profiles calculus on big graphs

Posted by Eli Reisman <ap...@gmail.com>.
Hi Cristina,

First of all, the YARN support for Giraph is not well maintained right now,
so it's going to be rough around the edges. Thanks for your detailed post,
there's lots of good info there. Some ideas off the top of my head:

- I think the buffers you're setting (especially the netty-level ones, but
possibly also the Giraph buffers) are probably a bit big for the cluster
you're running
- You can upload app-level JARs and dependencies to the YARN cache rather
than putting them in with the Hadoop lib jars. There's a command-line arg to
specify the local copies to upload to the cache when you run your job (see
the sketch below)
- From your logs, it looks like you're losing a node sometime during
superstep 2 and Giraph isn't handling the failure properly. My suggestion is
to try more YARN nodes, more memory, and fewer resources devoted to buffers
in the configs; see if you can identify anywhere you might be creating
message amplification without realizing it; and consider trying a run on a
non-YARN Hadoop cluster if it's feasible
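
For the JAR question: if I remember right, the pure-YARN GiraphRunner accepts
a -yj option taking a comma-separated list of local JARs to ship to the YARN
distributed cache, so you shouldn't need to copy anything into
$HADOOP_HOME/share/hadoop/yarn/lib. Roughly (untested, from memory, and the
exact JAR names will differ on your machine):

  giraph 4Profiles-0.0.1-SNAPSHOT.jar \
    it.uniroma1.di.fourprofiles.worker.superstep0.gas1.Worker_Superstep0_GAS1 \
    -yj 4Profiles-0.0.1-SNAPSHOT.jar,giraph-core-1.3.0-SNAPSHOT.jar \
    -yh 15360 -w 8   # rest of your arguments unchanged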

Hope that helps, good luck with your thesis work!
Eli
