Posted to issues@spark.apache.org by "Kay Ousterhout (JIRA)" <ji...@apache.org> on 2014/09/17 21:57:33 UTC
[jira] [Created] (SPARK-3570) Shuffle write time does not include time to open shuffle files
Kay Ousterhout created SPARK-3570:
-------------------------------------
Summary: Shuffle write time does not include time to open shuffle files
Key: SPARK-3570
URL: https://issues.apache.org/jira/browse/SPARK-3570
Project: Spark
Issue Type: Bug
Affects Versions: 1.1.0, 1.0.2, 0.9.2
Reporter: Kay Ousterhout
Assignee: Kay Ousterhout
Currently, the reported shuffle write time does not include the time to open the shuffle files. This time can be very significant when the disk is highly utilized and many shuffle files exist on the machine (I'm not sure how severe this is in 1.0 onward -- since shuffle files are automatically deleted, this may be less of an issue because there are fewer old files sitting around). In my experiments, in extreme cases, including the time to open files increased the reported shuffle write time from 5ms (of a 2 second task) to 1 second. We should fix this for better performance debugging.
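To illustrate the accounting gap, here is a minimal Python sketch (not Spark code). The class TimedShuffleWriter, its methods, and the include_open flag are all hypothetical names made up for this example; the sketch only shows how timing the open call separately changes the reported write time.

```python
import os
import tempfile
import time


class TimedShuffleWriter:
    """Hypothetical writer that times file open and writes separately.

    This is an illustration only; it does not mirror Spark's actual classes.
    """

    def __init__(self, path):
        self.path = path
        self.open_time_ns = 0   # time spent in open()
        self.write_time_ns = 0  # time spent in write() and close()
        self._file = None

    def open(self):
        start = time.perf_counter_ns()
        self._file = open(self.path, "wb")
        self.open_time_ns += time.perf_counter_ns() - start

    def write(self, data):
        start = time.perf_counter_ns()
        self._file.write(data)
        self.write_time_ns += time.perf_counter_ns() - start

    def close(self):
        # Closing flushes buffered data, so it is counted as write time.
        start = time.perf_counter_ns()
        self._file.close()
        self.write_time_ns += time.perf_counter_ns() - start

    def reported_write_time_ns(self, include_open):
        # include_open=False models the behavior described in this issue;
        # include_open=True models the proposed accounting.
        if include_open:
            return self.open_time_ns + self.write_time_ns
        return self.write_time_ns


def demo():
    """Write one small file and return (time without open, time with open)."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        writer = TimedShuffleWriter(os.path.join(tmp_dir, "shuffle_0_0"))
        writer.open()
        writer.write(b"some shuffle bytes")
        writer.close()
        return (writer.reported_write_time_ns(include_open=False),
                writer.reported_write_time_ns(include_open=True))


if __name__ == "__main__":
    without_open, with_open = demo()
    print("write time excluding open (ns):", without_open)
    print("write time including open (ns):", with_open)
```

On an idle disk the two numbers will usually be close; the difference only becomes large under the conditions described above (busy disk, many files).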
Thanks [~shivaram] for helping to diagnose this problem. cc [~pwendell]
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)