Posted to user@spark.apache.org by Alexis Seigneurin <as...@ippon.fr> on 2015/08/05 18:25:37 UTC

Memory allocation error with Spark 1.5

Hi,

I'm receiving a memory allocation error with a recent build of Spark 1.5:

java.io.IOException: Unable to acquire 67108864 bytes of memory
    at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:348)
    at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:398)
    at org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:92)
    at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:174)
    at org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:146)
    at org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:126)


The issue appears when joining 2 datasets: one with 6084 records, the other
with 200 records. I'm expecting to receive 200 records in the result.

I'm using a homemade build prepared from "branch-1.5" with commit ID
"eedb996". I have run "mvn -DskipTests clean install" to generate that
build.

Apart from that, I'm using Java 1.7.0_51 and Maven 3.3.3.

I've prepared a test case that can be built and executed very easily (data
files are included in the repo):
https://github.com/aseigneurin/spark-testcase

One thing to note is that the issue arises when the master is set to
"local[*]" but not when it is set to "local". Both options work fine with
Spark 1.4, though.
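
For reference, here is a rough, hypothetical sketch of the kind of code
involved (it is not the actual code from the repo above; the class name,
column names and generated data are only stand-ins):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object JoinTestCase {
  def main(args: Array[String]): Unit = {
    // "local[*]" is where the error shows up on this build; "local" is fine
    val conf = new SparkConf().setAppName("spark-testcase").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Stand-ins for the two datasets (6084 and 200 records)
    val large = sc.parallelize(1 to 6084).map(i => (i % 200 + 1, s"large-$i")).toDF("key", "value")
    val small = sc.parallelize(1 to 200).map(i => (i, s"small-$i")).toDF("key", "name")

    // The join whose sort phase fails with "Unable to acquire 67108864 bytes of memory"
    val result = small.join(large.groupBy("key").count(), "key")
    result.show(200)  // one row per key on the small side, i.e. 200 rows
  }
}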

Any help will be greatly appreciated!

Many thanks,
Alexis

Re: Memory allocation error with Spark 1.5

Posted by Alexis Seigneurin <as...@ippon.fr>.
Works like a charm. Thanks Reynold for the quick and efficient response!

Alexis

Re: Memory allocation error with Spark 1.5

Posted by Reynold Xin <rx...@databricks.com>.
In Spark 1.5, we have a new way to manage memory (part of Project
Tungsten). The default unit of memory allocation is 64 MB (the 67108864
bytes your stack trace fails to acquire), which is way too high when you
have only 1 GB of memory allocated in total and more than 4 threads.

We will reduce the default page size before releasing 1.5. For now, you
can simply set spark.buffer.pageSize to a lower value (e.g. 16m).

https://github.com/apache/spark/blob/702aa9d7fb16c98a50e046edfd76b8a7861d0391/sql/core/src/main/scala/org/apache/spark/sql/execution/sort.scala#L125
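
For example, something along these lines when building the context (just a
sketch; set the property wherever you already configure your SparkConf):

import org.apache.spark.{SparkConf, SparkContext}

// Lower the Tungsten page size from the 64m default to 16m
val conf = new SparkConf()
  .setAppName("spark-testcase")   // placeholder app name
  .setMaster("local[*]")
  .set("spark.buffer.pageSize", "16m")
val sc = new SparkContext(conf)

Passing --conf spark.buffer.pageSize=16m to spark-submit should work as well.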
