Posted to issues@spark.apache.org by "Wenchen Fan (JIRA)" <ji...@apache.org> on 2017/03/29 12:28:42 UTC
[jira] [Assigned] (SPARK-19556) Broadcast data is not encrypted when I/O encryption is on
[ https://issues.apache.org/jira/browse/SPARK-19556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan reassigned SPARK-19556:
-----------------------------------
Assignee: Marcelo Vanzin
> Broadcast data is not encrypted when I/O encryption is on
> ---------------------------------------------------------
>
> Key: SPARK-19556
> URL: https://issues.apache.org/jira/browse/SPARK-19556
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.1.0
> Reporter: Marcelo Vanzin
> Assignee: Marcelo Vanzin
>
> {{TorrentBroadcast}} uses a couple of "back doors" into the block manager to write and read data:
> {code}
> if (!blockManager.putBytes(pieceId, bytes, MEMORY_AND_DISK_SER, tellMaster = true)) {
>   throw new SparkException(s"Failed to store $pieceId of $broadcastId in local BlockManager")
> }
> {code}
> {code}
> bm.getLocalBytes(pieceId) match {
>   case Some(block) =>
>     blocks(pid) = block
>     releaseLock(pieceId)
>   case None =>
>     bm.getRemoteBytes(pieceId) match {
>       case Some(b) =>
>         if (checksumEnabled) {
>           val sum = calcChecksum(b.chunks(0))
>           if (sum != checksums(pid)) {
>             throw new SparkException(s"corrupt remote block $pieceId of $broadcastId:" +
>               s" $sum != ${checksums(pid)}")
>           }
>         }
>         // We found the block from remote executors/driver's BlockManager, so put the block
>         // in this executor's BlockManager.
>         if (!bm.putBytes(pieceId, b, StorageLevel.MEMORY_AND_DISK_SER, tellMaster = true)) {
>           throw new SparkException(
>             s"Failed to store $pieceId of $broadcastId in local BlockManager")
>         }
>         blocks(pid) = b
>       case None =>
>         throw new SparkException(s"Failed to get $pieceId of $broadcastId")
>     }
> }
> {code}
> What these block manager methods have in common is that they bypass the encryption code. As a result, broadcast data is stored unencrypted in the block manager, and it is written to disk unencrypted if those blocks need to be evicted from memory.
> The correct fix here is actually not to change {{TorrentBroadcast}}, but to fix the block manager so that:
> - data stored in memory is not encrypted
> - data written to disk is encrypted
> This would simplify the code paths that use the BlockManager / SerializerManager APIs (e.g. see SPARK-19520), but it requires some tricky changes inside the BlockManager: it must still be able to use file channels, so that whole blocks do not have to be read back into memory just to be decrypted.
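
The proposed split (plaintext in memory, ciphertext on disk) can be illustrated with a minimal sketch. This is not Spark's BlockManager and not the actual fix: ToyBlockStore, its method names, the AES/CTR transformation, and the IV-prefix file layout are all assumptions made for illustration. It only shows that encryption can be applied at the disk boundary, and that a block on disk can be decrypted as a stream instead of being loaded whole first.

```scala
import java.io.{ByteArrayInputStream, DataInputStream, File, FileInputStream, FileOutputStream, InputStream}
import java.nio.file.Files
import java.security.SecureRandom
import javax.crypto.{Cipher, CipherInputStream, CipherOutputStream}
import javax.crypto.spec.{IvParameterSpec, SecretKeySpec}
import scala.collection.mutable

// Hypothetical toy store for illustration only; not Spark's BlockManager.
class ToyBlockStore(key: Array[Byte], dir: File) {
  private val transformation = "AES/CTR/NoPadding" // assumed cipher choice
  private val ivLength = 16
  private val memory = mutable.Map.empty[String, Array[Byte]]

  private def fileFor(id: String): File = new File(dir, id)

  private def newCipher(mode: Int, iv: Array[Byte]): Cipher = {
    val cipher = Cipher.getInstance(transformation)
    cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv))
    cipher
  }

  // Blocks held in memory stay as plaintext.
  def putBytes(id: String, bytes: Array[Byte]): Unit = {
    memory(id) = bytes.clone()
  }

  // Eviction is the only place where data is encrypted: a fresh IV is
  // written first, followed by the ciphertext.
  def evictToDisk(id: String): Unit = {
    val bytes = memory.remove(id).getOrElse(throw new NoSuchElementException(id))
    val iv = new Array[Byte](ivLength)
    new SecureRandom().nextBytes(iv)
    val fileOut = new FileOutputStream(fileFor(id))
    try {
      fileOut.write(iv)
      val out = new CipherOutputStream(fileOut, newCipher(Cipher.ENCRYPT_MODE, iv))
      out.write(bytes)
      out.close() // flushes the cipher and closes the file
    } finally {
      fileOut.close()
    }
  }

  // Callers always get a plaintext stream. For on-disk blocks the data is
  // decrypted incrementally as it is read, not loaded into memory up front.
  def open(id: String): InputStream = memory.get(id) match {
    case Some(bytes) => new ByteArrayInputStream(bytes)
    case None =>
      val fileIn = new FileInputStream(fileFor(id))
      val iv = new Array[Byte](ivLength)
      new DataInputStream(fileIn).readFully(iv)
      new CipherInputStream(fileIn, newCipher(Cipher.DECRYPT_MODE, iv))
  }
}
```

With this shape, a caller such as the broadcast code would not need to know whether encryption is enabled: it stores and reads plaintext, and only the store's disk path deals with the cipher.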