Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/06/30 19:59:10 UTC

[jira] [Commented] (SPARK-16333) Excessive Spark history event/json data size (5GB each)

    [ https://issues.apache.org/jira/browse/SPARK-16333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15357757#comment-15357757 ] 

Sean Owen commented on SPARK-16333:
-----------------------------------

Can you comment on what the logged data looks like before and after (1.6.1 vs. 2.0.0-preview)? It may give a sense of whether this is really what it seems to be, and why so much more is being logged.
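
One rough way to get that comparison, assuming the event log is the usual uncompressed file with one JSON object per line and an "Event" field naming the listener event type: tally the line count and byte size per event type for a 1.6.1 log and a 2.0.0-preview log, then compare the two summaries. The helper names below (summarize_event_log, print_summary) are made up for illustration and are not part of Spark.

```python
import json
from collections import Counter


def summarize_event_log(lines):
    """Return (counts, sizes) per event type for an iterable of JSON lines.

    Assumes each non-empty line is one JSON object with an "Event" field;
    anything else is bucketed under a placeholder label rather than dropped.
    """
    counts = Counter()
    sizes = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
            if isinstance(obj, dict):
                event = obj.get("Event", "<no Event field>")
            else:
                event = "<not a JSON object>"
        except json.JSONDecodeError:
            event = "<unparseable>"
        counts[event] += 1
        sizes[event] += len(line.encode("utf-8"))
    return counts, sizes


def print_summary(path):
    """Stream the file line by line so a multi-GB log never sits in memory."""
    with open(path, encoding="utf-8") as f:
        counts, sizes = summarize_event_log(f)
    # Largest contributors first.
    for event, size in sizes.most_common():
        print(f"{size:>15,d} bytes  {counts[event]:>10,d} events  {event}")
```

Calling print_summary("app-20160630091959-0000") on one of the 5.3G files and on the matching 1.6.1 log should show which event type accounts for the extra volume.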

> Excessive Spark history event/json data size (5GB each)
> -------------------------------------------------------
>
>                 Key: SPARK-16333
>                 URL: https://issues.apache.org/jira/browse/SPARK-16333
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.0.0
>         Environment: Seen on both x86 (Intel(R) Xeon(R) E5-2699) and ppc (Habanero, Model: 8348-21C) platforms; Red Hat Enterprise Linux Server release 7.2 (Maipo); Spark 2.0.0-preview (May-24, 2016 build)
>            Reporter: Peter Liu
>              Labels: performance, spark2.0.0
>
> With Spark 2.0.0-preview (May-24 build), the history event data (the JSON file) generated for each Spark application run (see below) can be as large as 5 GB, compared with 14 MB for exactly the same application run and the same 1 TB of input data under Spark 1.6.1:
> -rwxrwx--- 1 root root 5.3G Jun 30 09:39 app-20160630091959-0000
> -rwxrwx--- 1 root root 5.3G Jun 30 09:56 app-20160630094213-0000
> -rwxrwx--- 1 root root 5.3G Jun 30 10:13 app-20160630095856-0000
> -rwxrwx--- 1 root root 5.3G Jun 30 10:30 app-20160630101556-0000
> The test was run with Sparkbench V2, SQL RDD (see GitHub: https://github.com/SparkTC/spark-bench)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
