Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2017/02/17 04:02:41 UTC

[jira] [Assigned] (SPARK-19556) Broadcast data is not encrypted when I/O encryption is on

     [ https://issues.apache.org/jira/browse/SPARK-19556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-19556:
------------------------------------

    Assignee: Apache Spark

> Broadcast data is not encrypted when I/O encryption is on
> ---------------------------------------------------------
>
>                 Key: SPARK-19556
>                 URL: https://issues.apache.org/jira/browse/SPARK-19556
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0
>            Reporter: Marcelo Vanzin
>            Assignee: Apache Spark
>
> {{TorrentBroadcast}} uses a couple of "back doors" into the block manager to write and read data:
> {code}
>       if (!blockManager.putBytes(pieceId, bytes, MEMORY_AND_DISK_SER, tellMaster = true)) {
>         throw new SparkException(s"Failed to store $pieceId of $broadcastId in local BlockManager")
>       }
> {code}
> {code}
>       bm.getLocalBytes(pieceId) match {
>         case Some(block) =>
>           blocks(pid) = block
>           releaseLock(pieceId)
>         case None =>
>           bm.getRemoteBytes(pieceId) match {
>             case Some(b) =>
>               if (checksumEnabled) {
>                 val sum = calcChecksum(b.chunks(0))
>                 if (sum != checksums(pid)) {
>                   throw new SparkException(s"corrupt remote block $pieceId of $broadcastId:" +
>                     s" $sum != ${checksums(pid)}")
>                 }
>               }
>               // We found the block from remote executors/driver's BlockManager, so put the block
>               // in this executor's BlockManager.
>               if (!bm.putBytes(pieceId, b, StorageLevel.MEMORY_AND_DISK_SER, tellMaster = true)) {
>                 throw new SparkException(
>                   s"Failed to store $pieceId of $broadcastId in local BlockManager")
>               }
>               blocks(pid) = b
>             case None =>
>               throw new SparkException(s"Failed to get $pieceId of $broadcastId")
>           }
>       }
> {code}
> These block manager methods all bypass the encryption code, so broadcast data is stored unencrypted in the block manager and is written to disk unencrypted if those blocks are evicted from memory.
> The correct fix is not to change {{TorrentBroadcast}} but to fix the block manager so that:
> - data stored in memory is not encrypted
> - data written to disk is encrypted
> This would simplify the code paths that use the BlockManager / SerializerManager APIs (e.g. see SPARK-19520). It does require some tricky changes inside the BlockManager, though: disk reads should still be able to use file channels, so that whole blocks do not have to be read back into memory just to be decrypted.
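
The behaviour proposed in the description (plaintext in memory, ciphertext on disk, streaming decryption on read) can be illustrated with a minimal sketch. This is not Spark's BlockManager code and not the eventual fix: TinyBlockStore, evictToDisk, openStream, rawDiskBytes and readAll are hypothetical names invented for this example, and the choice of AES/CTR/NoPadding from the JDK's javax.crypto, with the IV stored at the start of each file, is an assumption made only for illustration.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, InputStream}
import java.nio.file.{Files, Path}
import java.security.SecureRandom
import javax.crypto.{Cipher, CipherInputStream, CipherOutputStream}
import javax.crypto.spec.{IvParameterSpec, SecretKeySpec}
import scala.collection.mutable

// Hypothetical stand-in for a block store; none of these names are Spark APIs.
object EncryptedSpillSketch {
  private val Transformation = "AES/CTR/NoPadding" // assumed cipher for the sketch
  private val IvLength = 16

  final class TinyBlockStore(key: Array[Byte], dir: Path) {
    private val keySpec = new SecretKeySpec(key, "AES")
    private val memory = mutable.Map.empty[String, Array[Byte]]
    private val random = new SecureRandom()

    private def fileFor(id: String): Path = dir.resolve(id + ".blk")

    // Data stored in memory is kept as plaintext.
    def putBytes(id: String, bytes: Array[Byte]): Unit = memory(id) = bytes.clone()

    // Eviction is the only place where encryption happens: IV first, then ciphertext.
    def evictToDisk(id: String): Unit = {
      val bytes = memory.remove(id).getOrElse(throw new NoSuchElementException(id))
      val iv = new Array[Byte](IvLength)
      random.nextBytes(iv)
      val cipher = Cipher.getInstance(Transformation)
      cipher.init(Cipher.ENCRYPT_MODE, keySpec, new IvParameterSpec(iv))
      val fileOut = Files.newOutputStream(fileFor(id))
      try {
        fileOut.write(iv)
        val encrypted = new CipherOutputStream(fileOut, cipher)
        encrypted.write(bytes)
        encrypted.close() // finishes the cipher and closes fileOut as well
      } finally fileOut.close()
    }

    // Readers get a stream either way; disk blocks are decrypted incrementally,
    // so the whole block is never loaded into memory just to be decrypted.
    def openStream(id: String): InputStream = memory.get(id) match {
      case Some(bytes) => new ByteArrayInputStream(bytes)
      case None =>
        val fileIn = Files.newInputStream(fileFor(id))
        val iv = new Array[Byte](IvLength)
        new DataInputStream(fileIn).readFully(iv)
        val cipher = Cipher.getInstance(Transformation)
        cipher.init(Cipher.DECRYPT_MODE, keySpec, new IvParameterSpec(iv))
        new CipherInputStream(fileIn, cipher)
    }

    // Exposed only so the sketch can show what actually lands on disk.
    def rawDiskBytes(id: String): Array[Byte] = Files.readAllBytes(fileFor(id))
  }

  // Drains a stream in small chunks and closes it.
  def readAll(in: InputStream): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    val buf = new Array[Byte](8192)
    try {
      var n = in.read(buf)
      while (n != -1) { out.write(buf, 0, n); n = in.read(buf) }
    } finally in.close()
    out.toByteArray
  }
}
```

In this sketch a caller never deals with encryption: it calls putBytes and openStream, and the store decides whether bytes come from memory as-is or from disk through a decrypting stream. That is the simplification the description is after, although the real change would also have to preserve the file-channel read paths mentioned above.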



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
