Posted to user@flink.apache.org by "Kaepke, Marc" <ma...@haw-hamburg.de> on 2017/08/18 17:51:59 UTC

PageRank - 4x slower than Spark?!

Hi everyone,

I compared Flink and Spark using PageRank. I expected Flink to beat Spark or at least match it, but Spark is up to 4x faster than Flink.
I hope I made a mistake somewhere, so please help me improve the performance of my cluster and configuration.

The cluster has 4 computers:
One JobManager (quad core with Hyper-Threading -> 8 logical cores, 16 GB via jobmanager.heap.mb)
Three TaskManagers (each quad core with Hyper-Threading -> 8 logical cores, 16 GB via taskmanager.heap.mb)
In total 24 cores / task slots.

I ran PR as vertex-centric, scatter-gather, gather-sum-apply and with bulk iteration. The parallelism was 24.
Runtimes:
Pregel (vertex-centric): 90,000 ms
SG (scatter-gather): 64,000 ms
GSA (gather-sum-apply): 80,000 ms
Bulk iteration: 53,000 ms
Spark with Pregel: 23,000 ms
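
To make the "up to 4x" figure concrete, the ratios can be worked out from the numbers above. The following is a small stand-alone Java sketch, not part of the benchmark itself; the class and method names are made up for illustration. With these runtimes, the Flink variants come out roughly 2.3x (bulk iteration) to 3.9x (vertex-centric) slower than the Spark run.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-alone arithmetic on the runtimes reported in this message.
// SpeedupRatios and slowdown() are hypothetical names used only for illustration.
class SpeedupRatios {
    // Ratio of a Flink runtime to the Spark runtime (both in milliseconds).
    static double slowdown(long flinkMs, long sparkMs) {
        return (double) flinkMs / sparkMs;
    }

    public static void main(String[] args) {
        long sparkMs = 23_000L;

        // Runtimes in milliseconds, in the order they were reported.
        Map<String, Long> flinkMs = new LinkedHashMap<>();
        flinkMs.put("Pregel", 90_000L);
        flinkMs.put("SG", 64_000L);
        flinkMs.put("GSA", 80_000L);
        flinkMs.put("Bulk", 53_000L);

        for (Map.Entry<String, Long> e : flinkMs.entrySet()) {
            System.out.printf("%-6s %.2fx slower than Spark%n",
                    e.getKey(), slowdown(e.getValue(), sparkMs));
        }
    }
}
```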

The input file was: https://snap.stanford.edu/data/wiki-topcats.html
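
For readers less familiar with the workload, here is a minimal plain-Java sketch of PageRank as a fixed number of bulk iterations over an edge list. It is not the Flink or Gelly code used in the benchmark. It assumes the input is a list of (source, target) vertex ID pairs with IDs in 0..numVertices-1, and the damping factor, iteration count, and all names are illustrative assumptions.

```java
import java.util.Arrays;

// Plain-Java PageRank by fixed-count power iteration over a directed edge list.
// Not Flink or Gelly code; PageRankSketch and its parameters are hypothetical.
class PageRankSketch {
    static double[] pageRank(int numVertices, int[][] edges, double damping, int iterations) {
        // Count outgoing edges per vertex so rank can be split evenly among them.
        int[] outDegree = new int[numVertices];
        for (int[] e : edges) {
            outDegree[e[0]]++;
        }

        double[] rank = new double[numVertices];
        Arrays.fill(rank, 1.0 / numVertices);

        for (int it = 0; it < iterations; it++) {
            // Rank held by vertices without out-edges is spread evenly over all vertices.
            double dangling = 0.0;
            for (int v = 0; v < numVertices; v++) {
                if (outDegree[v] == 0) {
                    dangling += rank[v];
                }
            }

            double[] next = new double[numVertices];
            Arrays.fill(next, (1.0 - damping) / numVertices + damping * dangling / numVertices);

            // Each vertex sends an equal share of its damped rank along every out-edge.
            for (int[] e : edges) {
                next[e[1]] += damping * rank[e[0]] / outDegree[e[0]];
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Tiny example graph: 0 -> 1, 1 -> 2, 2 -> 0, 0 -> 2.
        int[][] edges = {{0, 1}, {1, 2}, {2, 0}, {0, 2}};
        System.out.println(Arrays.toString(pageRank(3, edges, 0.85, 100)));
    }
}
```

Each pass touches every edge once, so the runtime is dominated by how cheaply the engine can move and combine the per-edge rank contributions.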

Thanks for helping!

Marc

Fwd: PageRank - 4x slower than Spark?!

Posted by kant kodali <ka...@gmail.com>.
---------- Forwarded message ----------
From: Kaepke, Marc <ma...@haw-hamburg.de>
Date: Fri, Aug 18, 2017 at 10:51 AM
Subject: PageRank - 4x slower than Spark?!
To: "user@flink.apache.org" <us...@flink.apache.org>


Re: PageRank - 4x slower than Spark?!

Posted by Timo Walther <tw...@apache.org>.
You could enable object reuse [0] if your application allows that. Also,
adjusting the managed memory size [1] can help.
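
The "if your application allows that" caveat matters because, with object reuse, user functions may be handed the same object instance repeatedly instead of a fresh copy each time. In Flink the switch is `env.getConfig().enableObjectReuse()`. The sketch below is plain Java rather than Flink code, and `MutableRank` and both emit methods are hypothetical names; it only illustrates the aliasing hazard that makes code which stores references to its inputs unsafe under reuse.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of the aliasing hazard behind object reuse.
// This is NOT Flink code; MutableRank and both emit methods are hypothetical.
class ObjectReuseSketch {
    static final class MutableRank {
        long vertexId;
        double rank;
    }

    // Allocates a fresh record per element: safe to keep references, but more garbage.
    static List<MutableRank> emitWithCopies(double[] ranks) {
        List<MutableRank> out = new ArrayList<>();
        for (int i = 0; i < ranks.length; i++) {
            MutableRank r = new MutableRank();
            r.vertexId = i;
            r.rank = ranks[i];
            out.add(r);
        }
        return out;
    }

    // Reuses one record: no per-element allocation, but every stored
    // reference points at the same object, which holds the last value written.
    static List<MutableRank> emitWithReuse(double[] ranks) {
        List<MutableRank> out = new ArrayList<>();
        MutableRank reused = new MutableRank();
        for (int i = 0; i < ranks.length; i++) {
            reused.vertexId = i;
            reused.rank = ranks[i];
            out.add(reused);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] ranks = {0.5, 0.3, 0.2};
        System.out.println(emitWithCopies(ranks).get(0).rank); // prints 0.5
        System.out.println(emitWithReuse(ranks).get(0).rank);  // prints 0.2
    }
}
```

Code that only reads each record and emits a result before the next record arrives is generally unaffected; code that buffers input objects needs to copy them first.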

Are you using Flink's graph library Gelly?

[0] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/batch/index.html#object-reuse-enabled
[1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/config.html#managed-memory

Regards,
Timo

On 23.08.17 at 17:11, Kaepke, Marc wrote:
> Does anyone have a current performance test based on PageRank, or an idea why Flink lost the comparison?



Re: PageRank - 4x slower than Spark?!

Posted by "Kaepke, Marc" <ma...@haw-hamburg.de>.
Does anyone have a current performance test based on PageRank, or an idea why Flink lost the comparison?

