Posted to issues@spark.apache.org by "Imran Rashid (JIRA)" <ji...@apache.org> on 2018/10/31 20:53:00 UTC

[jira] [Updated] (SPARK-25827) Replicating a block > 2gb with encryption fails

     [ https://issues.apache.org/jira/browse/SPARK-25827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Imran Rashid updated SPARK-25827:
---------------------------------
    Description: 
There are a couple of issues with replication and remote reads of large encrypted blocks, both of which try to create buffers where they shouldn't.  Part of this is a general matter of properly limiting array sizes, which is tracked under SPARK-25904; the remaining problems are specific to encryption and to converting an EncryptedBlockData into a regular ByteBuffer.

*EDIT*: moved general array size stuff under SPARK-25904.
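
For illustration only, here is a minimal, self-contained sketch of the kind of bounded, chunked copy that avoids materializing a whole block as one ByteBuffer. The object and method names ({{ChunkedCopySketch}}, {{copyInChunks}}) are hypothetical, not Spark APIs, and the 16-byte headroom below Int.MaxValue is an assumption:

{code}
import java.io.ByteArrayInputStream
import java.nio.ByteBuffer
import java.nio.channels.{Channels, ReadableByteChannel}
import scala.collection.mutable.ListBuffer

// Hypothetical sketch, not Spark code: copy a block of known size into several
// bounded ByteBuffers instead of one buffer of up to Int.MaxValue bytes.
object ChunkedCopySketch {
  // Stay below the JVM's maximum array size; the 16-byte headroom is an
  // assumption for illustration.
  private val MaxChunkSize: Long = Int.MaxValue.toLong - 16

  def copyInChunks(channel: ReadableByteChannel, blockSize: Long): Seq[ByteBuffer] = {
    val chunks = new ListBuffer[ByteBuffer]()
    var remaining = blockSize
    while (remaining > 0) {
      val chunkSize = math.min(remaining, MaxChunkSize).toInt
      val chunk = ByteBuffer.allocate(chunkSize)
      // Fill the chunk; assumes the channel really holds blockSize bytes.
      while (chunk.hasRemaining && channel.read(chunk) >= 0) {}
      chunk.flip()
      chunks += chunk
      remaining -= chunkSize
    }
    chunks.toList
  }

  def main(args: Array[String]): Unit = {
    val data = Array.fill[Byte](4 * 1024 * 1024)(1)
    val channel = Channels.newChannel(new ByteArrayInputStream(data))
    val chunks = copyInChunks(channel, data.length.toLong)
    println(s"copied ${chunks.map(_.remaining()).sum} bytes in ${chunks.size} chunk(s)")
  }
}
{code}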

  was:
When replicating large blocks with encryption, we try to allocate an array of size {{Int.MaxValue}}, which is just over the JVM's maximum array size.  This is essentially the same problem as SPARK-25704, just hit from another code path.

In DiskStore:
{code}
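// In EncryptedBlockData.toChunkedByteBuffer (DiskStore.scala, per the stack trace below):
// when at least Int.MaxValue bytes remain, chunkSize becomes Int.MaxValue, and the
// subsequent ByteBuffer.allocate for that chunk fails with "Requested array size exceeds VM limit"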
val chunkSize = math.min(remaining, Int.MaxValue)
{code}
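
A hedged sketch of one possible fix: cap each chunk slightly below Int.MaxValue so the allocation stays within the JVM's array size limit. The 16-byte headroom is illustrative, not necessarily the constant an actual fix would use:

{code}
// Keep chunk sizes safely below the JVM's maximum array size so that
// allocating the chunk cannot throw "Requested array size exceeds VM limit".
// The exact headroom (16 bytes here) is an assumption for illustration.
val maxSafeChunkSize: Long = Int.MaxValue.toLong - 16
val chunkSize = math.min(remaining, maxSafeChunkSize)
{code}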

{noformat}
18/10/22 17:04:06 WARN storage.BlockManager: Failed to replicate rdd_1_1 to ..., failure #0
org.apache.spark.SparkException: Exception thrown in awaitResult: 
	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
	at org.apache.spark.network.BlockTransferService.uploadBlockSync(BlockTransferService.scala:133)
	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$replicate(BlockManager.scala:1421)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1230)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
...
Caused by: java.lang.RuntimeException: java.io.IOException: Destination failed while reading stream
...
Caused by: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
	at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
	at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
	at org.apache.spark.storage.BlockManager$$anon$1$$anonfun$7.apply(BlockManager.scala:446)
	at org.apache.spark.storage.BlockManager$$anon$1$$anonfun$7.apply(BlockManager.scala:446)
	at org.apache.spark.storage.EncryptedBlockData.toChunkedByteBuffer(DiskStore.scala:221)
	at org.apache.spark.storage.BlockManager$$anon$1.onComplete(BlockManager.scala:449)
...
{noformat}


> Replicating a block > 2gb with encryption fails
> -----------------------------------------------
>
>                 Key: SPARK-25827
>                 URL: https://issues.apache.org/jira/browse/SPARK-25827
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Imran Rashid
>            Priority: Major
>
> There are a couple of issues with replication and remote reads of large encrypted blocks, both of which try to create buffers where they shouldn't.  Part of this is a general matter of properly limiting array sizes, which is tracked under SPARK-25904; the remaining problems are specific to encryption and to converting an EncryptedBlockData into a regular ByteBuffer.
> *EDIT*: moved general array size stuff under SPARK-25904.


