Posted to user@giraph.apache.org by Foad Lotfifar <fo...@gmail.com> on 2015/12/14 12:14:42 UTC

Scalability issue on Giraph

Hi,

I have a scalability issue with Giraph and I cannot work out where the 
problem is.

--- Cluster specs:
# nodes             1
# threads          32
Processor          Intel Xeon 2.0GHz
OS                    ubuntu 32bit
RAM                 64GB

--- Giraph specs
Hadoop            Apache Hadoop 1.2.1
Giraph              1.2.0 Snapshot

Tested Graphs:
amazon0302                V=262,111, E=1,234,877
coAuthorsCiteseer        V=227,320, E=1,628,268


I run the PageRank example provided with Giraph 
("SimplePageRankComputation") with the following options:

(time ($HADOOP_HOME/bin/hadoop jar \
  $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner \
  -Dgiraph.graphPartitionerFactoryClass=org.apache.giraph.partition.HashRangePartitionerFactory \
  org.apache.giraph.examples.PageRankComputation \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
  -vip /user/hduser/input/$file \
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /user/hduser/output/pagerank -w $2 \
  -mc org.apache.giraph.examples.PageRankComputation\$PageRankMasterCompute)) 2>&1 \
  | tee -a ./pagerank_results/$file.GHR_$2.$iter.output.txt

The algorithm runs without any issue. The number of supersteps is set to 
31 by default in the algorithm.

*Problem:*
I don't get any scalability beyond 8 (or 16) processor cores; that is, 
I get speedup up to 8 (or 16) cores, and then the run time starts to 
increase.

I have run PageRank with only one superstep, and I have also run other 
algorithms such as the ShortestPath example, with the same results. I 
cannot figure out where the problem is.

1- I have tried changing giraph.numInputThreads and 
giraph.numOutputThreads: performance gets a little better, but there is 
no impact on scalability (the sketch after this list shows how such 
options can be passed).
2- Is it related to the size of the graphs? The graphs I am testing are 
small.
3- Is it a platform-related issue?
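
For reference, a rough sketch of how such options can be passed, as extra 
-D flags added to the GiraphRunner command shown above (the thread counts 
here are only illustrative):

# Illustrative only: same job as above, with the two thread options added as -D flags.
$HADOOP_HOME/bin/hadoop jar \
  $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner \
  -Dgiraph.numInputThreads=8 \
  -Dgiraph.numOutputThreads=8 \
  org.apache.giraph.examples.PageRankComputation \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
  -vip /user/hduser/input/$file \
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /user/hduser/output/pagerank -w $2 \
  -mc org.apache.giraph.examples.PageRankComputation\$PageRankMasterCompute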

Here are the timing details for the amazon graph (timings in ms, I/O 
counters in bytes; no values were recorded for the 1-core run):

# Processor cores    1    2          4          8          16         24         32

Input                     3260       3447       3269       3921       4555       4766
Initialise                3467       36458      45474      39091      100281     79012
Setup                     34         52         59         70         77         86
Shutdown                  9954       10226      11021      9524       13393      15930
Total                     135482     84483      61081      52190      58921      61898

HDFS READ                 21097485   26117723   36158199   57808783   80086015   102163071
FILE WRITE                65889      109815     197667     373429     549165     724901
HDFS WRITE                7330986    7331068    7331093    7330988    7330976    7331203



Best Regards,
Karos




Re: Scalability issue on Giraph

Posted by Hassan Eslami <hs...@gmail.com>.
It depends on the paper: some include it and some don't, although a fair
measurement would include the I/O time as well. I only suggested excluding
the I/O time to make the problem easier to diagnose, by isolating the time
spent on actual core computation and memory accesses.

As a side note, I assume the "-w $2" in your command translates to "-w 1",
as you are running on a single machine. I also assume you change the number
of compute threads with giraph.numComputeThreads, and that you are using
the trunk version where GIRAPH-1035 is already applied. You can also play
with the number of partitions (using giraph.userPartitionCount) and set it
to a multiple of the number of physical cores; for instance, if your
processor has 16 cores, try 16*4=64 partitions to take advantage of core
oversubscription. If you have already set all these options, I personally
can't think of any reason for the limited scalability other than the small
input size (which is quite likely: with 64 partitions each partition holds
only around 4K vertices) or memory bandwidth.
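
As a rough sketch of that arithmetic (assuming 16 physical cores, which may
not match your machine), and of the two -D flags, which would go right
after org.apache.giraph.GiraphRunner in your command:

# Illustrative only: pick a partition count as a multiple of the number
# of physical cores, and pass it together with the compute-thread count.
CORES=16                     # assumed number of physical cores
PARTITIONS=$((CORES * 4))    # 4x oversubscription -> 64 partitions
echo "Vertices per partition: $((262111 / PARTITIONS))"   # amazon0302 -> ~4095
echo "-Dgiraph.numComputeThreads=$CORES -Dgiraph.userPartitionCount=$PARTITIONS"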

Hassan



Re: Scalability issue on Giraph

Posted by Karos Lotfifar <fo...@gmail.com>.
Thanks Hassan,

My question is: is this the way others report superstep times in research
papers on systems such as Hama, Giraph, GraphLab, etc.? Do they not consider
the I/O time? (They do report good scalability for PageRank computations.)

The other issue is that even looking only at the computational supersteps,
excluding the I/O overhead, does not show any improvement. Taking the first
superstep (superstep 0 is not counted, as you say), the timing details for
amazon would be:

# Processor cores    2      4      8      16     24     32
Superstep 1 (ms)     5101   3642   3163   2867   3611   3435


As you see, it is the same story even though no I/O is included: the
superstep time improves only up to 16 cores and then increases again.



Regards,
Karos



Re: Scalability issue on Giraph

Posted by Hassan Eslami <hs...@gmail.com>.
Hi,

Here is my take on it:
It may be a good idea to isolate the time spent in the compute supersteps.
To do that, you can look at the per-superstep timing metrics and aggregate
the times for all supersteps except the input (-1) and output (last)
supersteps. This removes the I/O time and focuses your measurement on the
time spent accessing memory and doing computation on the cores. In general,
if an application is memory-bound (as PageRank is, for instance), increasing
the number of threads does not necessarily decrease the run time beyond a
certain point, because memory bandwidth can become the bottleneck: once the
memory bandwidth is saturated by a certain number of threads, adding more
threads will not decrease the execution time any further.
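
For instance, since the original command already captures the job output
with tee, one rough way to pull those numbers out is sketched below; it
assumes the per-superstep timers show up in the "Giraph Timers" job counters
printed at the end of the run, and the file name is only an example of the
tee target:

# Extract the per-superstep timer lines from the captured job output,
# skipping the input superstep, so only compute supersteps remain.
OUT=./pagerank_results/amazon0302.GHR_16.1.output.txt   # example file name
grep -i "superstep" "$OUT" | grep -vi "input superstep"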

Best,
Hassan
