Posted to user@spark.apache.org by asma zgolli <zg...@gmail.com> on 2020/02/04 11:57:35 UTC

shuffle mathematical formula

Dear Spark contributors,

I'm looking for a way to model Spark shuffle cost, and I wonder whether there
are mathematical formulas for computing the "Shuffle Read" and "Shuffle Write"
sizes shown in the Stages view of the Spark UI.
If there aren't, are there any references that would give me a head start?
Columns in the Stages view: Stage Id, Description, Submitted, Duration,
Tasks: Succeeded/Total, Input, Output, Shuffle Read, Shuffle Write

Thank you for the help and the directions.
Yours sincerely,
Asma ZGOLLI

Ph.D. student in data engineering - computer science

Re: shuffle mathematical formula

Posted by Aironman DirtDiver <al...@gmail.com>.
I would have to check, but in principle you could do it by inspecting the
streaming logs: once you detect when a shuffle operation starts and when it
ends, you know the total time the operation took.


https://stackoverflow.com/questions/27276884/what-is-shuffle-read-shuffle-write-in-apache-spark
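
As a rough illustration of the log-based idea above, here is a minimal sketch.
It does not use the streaming logs mentioned above; it assumes instead that
event logging is enabled (spark.eventLog.enabled=true) and that the event log
is a file of JSON lines. The event and field names used here
("SparkListenerTaskEnd", "Shuffle Bytes Written", and so on) are assumptions
about that log format and may differ between Spark versions, and
shuffle_summary is a hypothetical helper name. It measures shuffle sizes and
stage durations after the fact; it is not a predictive formula.

```python
import json
from collections import defaultdict


def shuffle_summary(event_lines):
    """Sum shuffle bytes per stage and record each stage's duration.

    event_lines: iterable of strings, one JSON event per line
    (assumed Spark event-log layout; field names are assumptions).
    """
    stages = defaultdict(
        lambda: {"read_bytes": 0, "write_bytes": 0, "duration_ms": None}
    )
    for line in event_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        kind = event.get("Event")
        if kind == "SparkListenerTaskEnd":
            # Per-task metrics; failed tasks may carry no metrics at all.
            metrics = event.get("Task Metrics") or {}
            read = metrics.get("Shuffle Read Metrics") or {}
            write = metrics.get("Shuffle Write Metrics") or {}
            stage = stages[event["Stage ID"]]
            stage["read_bytes"] += (
                read.get("Remote Bytes Read", 0) + read.get("Local Bytes Read", 0)
            )
            stage["write_bytes"] += write.get("Shuffle Bytes Written", 0)
        elif kind == "SparkListenerStageCompleted":
            info = event["Stage Info"]
            start = info.get("Submission Time")
            end = info.get("Completion Time")
            if start is not None and end is not None:
                stages[info["Stage ID"]]["duration_ms"] = end - start
    return dict(stages)
```

To try it, pass an open event-log file to the function, for example
shuffle_summary(open(path)), where path points at one application's event log
(a hypothetical location; the real one depends on spark.eventLog.dir). The
per-stage totals should be comparable to the Shuffle Read and Shuffle Write
columns of the Stages view.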

On Tue, 4 Feb 2020 at 12:58, asma zgolli (<zg...@gmail.com>)
wrote:

> Dear Spark contributors,
>
> I'm looking for a way to model Spark shuffle cost, and I wonder whether
> there are mathematical formulas for computing the "Shuffle Read" and
> "Shuffle Write" sizes shown in the Stages view of the Spark UI.
> If there aren't, are there any references that would give me a head start?
> Columns in the Stages view: Stage Id, Description, Submitted, Duration,
> Tasks: Succeeded/Total, Input, Output, Shuffle Read, Shuffle Write
>
> Thank you for the help and the directions.
> Yours sincerely,
> Asma ZGOLLI
>
> Ph.D. student in data engineering - computer science
>


-- 
Alonso Isidoro Roman
https://about.me/alonso.isidoro.roman