Posted to issues@spark.apache.org by "Kay Ousterhout (JIRA)" <ji...@apache.org> on 2014/09/17 22:01:35 UTC

[jira] [Updated] (SPARK-3570) Shuffle write time does not include time to open shuffle files

     [ https://issues.apache.org/jira/browse/SPARK-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kay Ousterhout updated SPARK-3570:
----------------------------------
    Attachment: 3a_1410854905_0_job_log_waterfall.pdf
                3a_1410943402_0_job_log_waterfall.pdf

In case anyone is extra curious about this, here are two plots of the same job, with the fixed logging (which includes file open time) in the first one.  You can see that fixing this metric can be the difference between mysterious straggler tasks and stragglers that are clearly due to disk activity.

> Shuffle write time does not include time to open shuffle files
> --------------------------------------------------------------
>
>                 Key: SPARK-3570
>                 URL: https://issues.apache.org/jira/browse/SPARK-3570
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 0.9.2, 1.0.2, 1.1.0
>            Reporter: Kay Ousterhout
>            Assignee: Kay Ousterhout
>         Attachments: 3a_1410854905_0_job_log_waterfall.pdf, 3a_1410943402_0_job_log_waterfall.pdf
>
>
> Currently, the reported shuffle write time does not include the time to open the shuffle files.  This time can be very significant when the disk is highly utilized and many shuffle files exist on the machine (I'm not sure how severe this is in 1.0 onward: since shuffle files are automatically deleted, this may be less of an issue because there are fewer old files sitting around).  In my experiments, in extreme cases, adding the time to open files increased the shuffle write time from 5 ms (of a 2-second task) to 1 second.  We should fix this for better performance debugging.
> Thanks [~shivaram] for helping to diagnose this problem.  cc [~pwendell]
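
To make the gap in the metric concrete, here is a minimal sketch of the difference between timing only the writes and timing the open plus the writes. It is not Spark's code: the function name timed_shuffle_write, the file name, and the record format are hypothetical, chosen only to illustrate the idea described in the issue.

```python
import os
import tempfile
import time


def timed_shuffle_write(path, records):
    """Write records to path and return (write_only_ns, open_plus_write_ns).

    Hypothetical helper for illustration; not part of Spark.
    """
    start_open = time.perf_counter_ns()
    f = open(path, "wb")  # on a busy disk, this call alone can be slow
    try:
        start_write = time.perf_counter_ns()
        for record in records:
            f.write(record)
        f.flush()
        end = time.perf_counter_ns()
    finally:
        f.close()
    # The first value ignores the open; the second value includes it.
    return end - start_write, end - start_open


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "shuffle_0_0_0")  # made-up file name
        write_only, with_open = timed_shuffle_write(path, [b"key,value\n"] * 1000)
        print(f"write only: {write_only} ns, open + write: {with_open} ns")
```

On an idle disk the two numbers will usually be close; the issue above is about the case where the open dominates, so a metric that starts its clock after the open hides most of the time spent.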


