You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@spark.apache.org by "Andrew Or (JIRA)" <ji...@apache.org> on 2014/11/19 03:27:34 UTC

[jira] [Issue Comment Deleted] (SPARK-4480) Avoid many small spills in external data structures

     [ https://issues.apache.org/jira/browse/SPARK-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-4480:
-----------------------------
    Comment: was deleted

(was: https://github.com/apache/spark/pull/3353)

> Avoid many small spills in external data structures
> ---------------------------------------------------
>
>                 Key: SPARK-4480
>                 URL: https://issues.apache.org/jira/browse/SPARK-4480
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.1.0
>            Reporter: Andrew Or
>            Assignee: Andrew Or
>            Priority: Blocker
>
> The following output is provided by [~shenhong] in SPARK-4380.
> {code}
> 14/11/13 19:20:43 INFO collection.ExternalSorter: Thread 60 spilling in-memory batch of 4792 B to disk (292769 spills so far)
> 14/11/13 19:20:43 INFO collection.ExternalSorter: Thread 60 spilling in-memory batch of 4760 B to disk (292770 spills so far)
> 14/11/13 19:20:43 INFO collection.ExternalSorter: Thread 60 spilling in-memory batch of 4520 B to disk (292771 spills so far)
> 14/11/13 19:20:43 INFO collection.ExternalSorter: Thread 60 spilling in-memory batch of 4560 B to disk (292772 spills so far)
> 14/11/13 19:20:43 INFO collection.ExternalSorter: Thread 60 spilling in-memory batch of 4792 B to disk (292773 spills so far)
> 14/11/13 19:20:43 INFO collection.ExternalSorter: Thread 60 spilling in-memory batch of 4784 B to disk (292774 spills so far)
> {code}
> Spilling many small files has two implications. First, it can cause "too many open files" exceptions, as we observed in SPARK-3633. Second, it causes degradation in performance. We have seen slight performance regressions from 1.0.2 to 1.1.0, and this is likely the cause.
> Note that this is spun-off from SPARK-4452, the fixing of which involves a bigger change in the way we track shuffle memory. This issue is smaller in scope in that it only makes sure we don't constantly spill, regardless of the policy we use for tracking shuffle memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org