Posted to issues@spark.apache.org by "Josh Rosen (JIRA)" <ji...@apache.org> on 2015/05/07 19:54:01 UTC

[jira] [Commented] (SPARK-2496) Compression streams should write its codec info to the stream

    [ https://issues.apache.org/jira/browse/SPARK-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533084#comment-14533084 ] 

Josh Rosen commented on SPARK-2496:
-----------------------------------

One potential concern here is the ability to concatenate compressed data without decompressing it. Many compression formats, such as Snappy, support this, and storing our own metadata at the beginning of each compressed stream might break it.
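
The concern can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses gzip from the Python standard library as a stand-in for a concatenation-friendly format (it is not Snappy and not Spark's codec code), and `CODEC_TAG`, `compress_tagged`, and `read_tagged` are hypothetical names invented for the example.

```python
import gzip

# Hypothetical one-byte codec tag; not part of any real Spark format.
CODEC_TAG = b"\x01"


def compress_tagged(data: bytes) -> bytes:
    """Compress data and prepend our own metadata byte."""
    return CODEC_TAG + gzip.compress(data)


def read_tagged(blob: bytes) -> bytes:
    """Strip the leading tag, then decompress the rest as one stream."""
    if blob[:1] != CODEC_TAG:
        raise ValueError("unknown codec tag")
    return gzip.decompress(blob[1:])


part_a, part_b = b"hello ", b"world"

# Untagged gzip members can be concatenated byte-for-byte and still
# decode as a single stream, with no decompression in between.
plain_concat = gzip.compress(part_a) + gzip.compress(part_b)
plain_result = gzip.decompress(plain_concat)

# Tagged blobs cannot: the second tag lands between the two members,
# where the decoder expects the start of another gzip member.
tagged_concat = compress_tagged(part_a) + compress_tagged(part_b)
try:
    read_tagged(tagged_concat)
    tagged_concat_ok = True
except (OSError, EOFError):
    tagged_concat_ok = False

print(plain_result, tagged_concat_ok)
```

A reader that knows about the tag could of course skip it at every member boundary, but that requires parsing the compressed stream, which is the cost that plain concatenation avoids.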

> Compression streams should write its codec info to the stream
> -------------------------------------------------------------
>
>                 Key: SPARK-2496
>                 URL: https://issues.apache.org/jira/browse/SPARK-2496
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>            Reporter: Reynold Xin
>            Priority: Critical
>
> Spark sometimes stores compressed data outside of Spark (e.g. event logs, blocks in Tachyon), and that data is read back directly using the codec configured by the user. When the codec differs between runs, Spark would not be able to read the data back.
> I'm not sure what the best strategy here is yet. If we write the codec identifier for all streams, then we will be writing a lot of identifiers for shuffle blocks. One possibility is to only write it for blocks that will be shared across different Spark instances (i.e. managed outside of Spark), which includes Tachyon blocks and event log blocks.
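
A minimal sketch of the selective-tagging idea in the description above, assuming a one-byte identifier and using `zlib` and `bz2` from the Python standard library as stand-ins for Spark's codecs. The `CODECS` table, the identifier values, and all function names are hypothetical; nothing here reflects an actual Spark format.

```python
import bz2
import zlib

# Hypothetical identifier table: id -> (compress, decompress).
CODECS = {
    1: (zlib.compress, zlib.decompress),
    2: (bz2.compress, bz2.decompress),
}


def write_block(data: bytes, codec_id: int, shared: bool) -> bytes:
    """Compress a block; prepend the codec id only for shared blocks."""
    compress, _ = CODECS[codec_id]
    payload = compress(data)
    # Only blocks managed outside of Spark pay the one-byte overhead.
    return (bytes([codec_id]) + payload) if shared else payload


def read_shared_block(blob: bytes) -> bytes:
    """Read a tagged block using the codec named in its first byte."""
    codec_id = blob[0]
    if codec_id not in CODECS:
        raise ValueError(f"unknown codec id: {codec_id}")
    _, decompress = CODECS[codec_id]
    return decompress(blob[1:])


def read_internal_block(blob: bytes, configured_codec_id: int) -> bytes:
    """Read an untagged block, trusting the currently configured codec."""
    _, decompress = CODECS[configured_codec_id]
    return decompress(blob)


data = b"event log contents"

# Written in one run with codec 2; readable later whatever is configured.
event_log = write_block(data, codec_id=2, shared=True)
restored = read_shared_block(event_log)

# An untagged block only round-trips when the configured codec matches.
shuffle_block = write_block(data, codec_id=1, shared=False)
same_run = read_internal_block(shuffle_block, configured_codec_id=1)
```

In this sketch, shuffle blocks stay untagged because they are written and read within the same application, so the configured codec is known to match.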


