Posted to issues@spark.apache.org by "Imran Rashid (JIRA)" <ji...@apache.org> on 2018/01/05 05:43:00 UTC

[jira] [Commented] (SPARK-22805) Use aliases for StorageLevel in event logs

    [ https://issues.apache.org/jira/browse/SPARK-22805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312499#comment-16312499 ] 

Imran Rashid commented on SPARK-22805:
--------------------------------------

I'm leaning slightly against this, though could go either way.

For 2.3+, the gains are pretty small, and it means an old history server can't read new logs (I know we don't guarantee that anyway, but might as well keep it if we can).

For < 2.3, there would be notable improvements in log sizes, but I don't like the compatibility story. I don't think there are any explicit guarantees, but it seems pretty annoying to have a 2.2.1 SHS that can't read logs from Spark 2.2.2.

Sorry [~lebedev], I appreciate the work you've put into this anyhow.

> Use aliases for StorageLevel in event logs
> ------------------------------------------
>
>                 Key: SPARK-22805
>                 URL: https://issues.apache.org/jira/browse/SPARK-22805
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.1.2, 2.2.1
>            Reporter: Sergei Lebedev
>            Priority: Minor
>
> Fact 1: {{StorageLevel}} has a private constructor, therefore a list of predefined levels is not extendable (by the users).
> Fact 2: The event log format uses a redundant representation for storage levels:
> {code}
> >>> len('{"Use Disk": true, "Use Memory": false, "Deserialized": true, "Replication": 1}')
> 79
> >>> len('DISK_ONLY')
> 9
> {code}
> Fact 3: This leads to excessive log sizes for workloads with lots of partitions, because every partition carries a storage level field that is 60-70 bytes longer than it needs to be.
> Suggested quick win: use the names of the predefined levels to identify them in the event log.
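
To make the trade-off concrete, here is a minimal sketch of what the alias idea could look like. It is not Spark's actual JsonProtocol code. The field names are taken from the JSON in the description; the function names and the table of levels are hypothetical, and the flag values are my reading of the predefined StorageLevel definitions, so check them against the source. The writer emits a name when the level matches a predefined one and falls back to the verbose form otherwise; the reader accepts both. A reader that only understands the verbose form would fail on the alias, which is the compatibility concern raised in the comment.

```python
import json

# Hypothetical subset of predefined levels, keyed by alias.
# Flag values are assumptions to verify against Spark's StorageLevel.
PREDEFINED_LEVELS = {
    "DISK_ONLY": {"Use Disk": True, "Use Memory": False,
                  "Deserialized": False, "Replication": 1},
    "MEMORY_ONLY": {"Use Disk": False, "Use Memory": True,
                    "Deserialized": True, "Replication": 1},
    "MEMORY_AND_DISK": {"Use Disk": True, "Use Memory": True,
                        "Deserialized": True, "Replication": 1},
}


def encode_level(level):
    """Return the alias for a predefined level, else the verbose dict."""
    for name, fields in PREDEFINED_LEVELS.items():
        if fields == level:
            return name
    return level  # custom level: keep the verbose form


def decode_level(value):
    """Accept an alias string (new format) or a verbose dict (old format)."""
    if isinstance(value, str):
        return dict(PREDEFINED_LEVELS[value])
    return value


def bytes_saved(level):
    """Size difference between the verbose and encoded JSON for one level."""
    return len(json.dumps(level)) - len(json.dumps(encode_level(level)))
```

With this shape, logs written by an old version stay readable by a new reader, but not the other way around.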


