Posted to user@spark.apache.org by Anders Arpteg <ar...@spotify.com> on 2015/08/31 21:34:16 UTC

Problems with Tungsten in Spark 1.5.0-rc2

I was trying out 1.5 rc2 and noticed some issues with the Tungsten shuffle
manager. One problem occurred when using the com.databricks.spark.avro
reader: error (1) was received, see the stack trace below. The problem does
not occur with the "sort" shuffle manager.
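
For reference, a stripped-down sketch of the kind of setup that triggers
it (a sketch only: it assumes the spark-avro package is on the classpath,
and the path and column name are placeholders, not from the actual job):

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.SQLContext

  // Switching "tungsten-sort" to "sort" below avoids the EOFException.
  val conf = new SparkConf()
    .setAppName("avro-tungsten-repro")
    .set("spark.shuffle.manager", "tungsten-sort")
  val sc = new SparkContext(conf)
  val sqlContext = new SQLContext(sc)

  // Placeholder path; any Avro dataset read through spark-avro should do.
  val df = sqlContext.read.format("com.databricks.spark.avro")
    .load("hdfs:///placeholder/path/events.avro")

  // An aggregation forces a shuffle plus a TungstenAggregate, matching the
  // frames in the stack trace below.
  df.groupBy("someColumn").count().collect()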

Another problem was in a large, complex job with lots of transformations
occurring simultaneously, i.e. 50 or more maps each shuffling data. I
received error (2) about being unable to acquire memory, which also seems
to be related to Tungsten. Is there perhaps a setting available to increase
that memory? There's plenty of heap memory available.

I'm running on YARN 2.2 with about 400 executors. Hoping this will give
some hints for improving the upcoming release, or that I will get some
hints to fix the problems.
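
For context, a rough sketch of the configuration knobs involved (the
values below are illustrative, not our actual settings). The 67108864
bytes in error (2) is 64 MB, which looks like a single memory page that
the unsafe shuffle sorter tries to allocate:

  import org.apache.spark.SparkConf

  val conf = new SparkConf()
    .setMaster("yarn-client")                       // Spark on YARN, client mode
    .set("spark.executor.instances", "400")
    .set("spark.executor.memory", "8g")             // illustrative; plenty of heap either way
    .set("spark.executor.cores", "4")
    .set("spark.shuffle.manager", "tungsten-sort")
    // The shuffle sorters draw from the shuffle memory pool; raising this
    // fraction is one knob that might help with the "Unable to acquire" error.
    .set("spark.shuffle.memoryFraction", "0.3")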

Thanks,
Anders

Error (1)

15/08/31 18:30:57 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID
3387, lon4-hadoopslave-c245.lon4.spotify.net): java.io.EOFException

       at java.io.DataInputStream.readInt(DataInputStream.java:392)

       at
org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:121)

       at
org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:109)

       at scala.collection.Iterator$$anon$13.next(Iterator.scala:372)

       at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)

       at
org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:30)

       at
org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)

       at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)

       at
org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:366)

       at
org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.start(TungstenAggregationIterator.scala:622)

       at
org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.org$apache$spark$sql$execution$aggregate$TungstenAggregate$$anonfun$$executePartition$1(TungstenAggregate.scala:110)

       at
org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119)

       at
org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119)

       at
org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:47)

       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)

       at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)

       at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)

       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)

       at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)

       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)

       at org.apache.spark.scheduler.Task.run(Task.scala:88)

       at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)

       at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

       at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

       at java.lang.Thread.run(Thread.java:745)

Error (2)

15/08/31 18:41:25 WARN TaskSetManager: Lost task 16.1 in stage 316.0 (TID
32686, lon4-hadoopslave-b925.lon4.spotify.net): java.io.IOException: Unable
to acquire 67108864 bytes of memory

       at
org.apache.spark.shuffle.unsafe.UnsafeShuffleExternalSorter.acquireNewPageIfNecessary(UnsafeShuffleExternalSorter.java:385)

       at
org.apache.spark.shuffle.unsafe.UnsafeShuffleExternalSorter.insertRecord(UnsafeShuffleExternalSorter.java:435)

       at
org.apache.spark.shuffle.unsafe.UnsafeShuffleWriter.insertRecordIntoSorter(UnsafeShuffleWriter.java:246)

       at
org.apache.spark.shuffle.unsafe.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:174)

       at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)

       at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)

       at org.apache.spark.scheduler.Task.run(Task.scala:88)

       at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)

       at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

       at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       at java.lang.Thread.run(Thread.java:745)

Re: Problems with Tungsten in Spark 1.5.0-rc2

Posted by Anders Arpteg <ar...@spotify.com>.
Ok, thanks Reynold. When I tested dynamic allocation with Spark 1.4, it
complained that it was not Tungsten compliant. Let's hope it works with
1.5 then!

On Tue, Sep 8, 2015 at 5:49 AM Reynold Xin <rx...@databricks.com> wrote:

>
> On Wed, Sep 2, 2015 at 12:03 AM, Anders Arpteg <ar...@spotify.com> wrote:
>
>>
>> BTW, is it possible (or will it be) to use Tungsten with dynamic
>> allocation and the external shuffle manager?
>>
>>
> Yes - I think this already works. There isn't anything specific here
> related to Tungsten.
>
>

Re: Problems with Tungsten in Spark 1.5.0-rc2

Posted by Reynold Xin <rx...@databricks.com>.
On Wed, Sep 2, 2015 at 12:03 AM, Anders Arpteg <ar...@spotify.com> wrote:

>
> BTW, is it possible (or will it be) to use Tungsten with dynamic
> allocation and the external shuffle manager?
>
>
Yes - I think this already works. There isn't anything specific here
related to Tungsten.
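
For reference, a minimal sketch of what enabling both looks like
(illustrative values; it assumes the Spark YARN shuffle service is
already registered as an aux-service on the NodeManagers):

  import org.apache.spark.SparkConf

  val conf = new SparkConf()
    .set("spark.dynamicAllocation.enabled", "true")
    .set("spark.shuffle.service.enabled", "true")       // external shuffle service
    .set("spark.dynamicAllocation.minExecutors", "10")  // illustrative bounds
    .set("spark.dynamicAllocation.maxExecutors", "400")
    .set("spark.shuffle.manager", "tungsten-sort")      // orthogonal to the above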

Re: Problems with Tungsten in Spark 1.5.0-rc2

Posted by Anders Arpteg <ar...@spotify.com>.
I haven't done a comparative benchmark between the two, and it would
involve some work to do so. A single run with each shuffler would probably
not say that much, since we have a rather busy cluster and performance
depends heavily on what else is currently running on it. I have seen fewer
problems/exceptions though, and it has been possible to decrease memory
requirements (or increase cores), which is of great help.

BTW, is it possible (or will it be) to use Tungsten with dynamic allocation
and the external shuffle manager?

Best,
Anders

On Tue, Sep 1, 2015 at 7:07 PM Davies Liu <da...@databricks.com> wrote:

> Thanks for the confirmation. The tungsten-sort is not the default
> ShuffleManager, this fix will not block 1.5 release, it may be in
> 1.5.1.
>
> BTW, How is the difference between sort and tungsten-sort
> ShuffleManager for this large job?
>
> On Tue, Sep 1, 2015 at 8:03 AM, Anders Arpteg <ar...@spotify.com> wrote:
> > A fix submitted less than one hour after my mail, very impressive Davies!
> > I've compiled your PR and tested it with the large job that failed
> before,
> > and it seems to work fine now without any exceptions. Awesome, thanks!
> >
> > Best,
> > Anders
> >
> > On Tue, Sep 1, 2015 at 1:38 AM Davies Liu <da...@databricks.com> wrote:
> >>
> >> I had sent out a PR [1] to fix 2), could you help to test that?
> >>
> >> [1]  https://github.com/apache/spark/pull/8543
> >>
> >> On Mon, Aug 31, 2015 at 12:34 PM, Anders Arpteg <ar...@spotify.com>
> >> wrote:
> >> > Was trying out 1.5 rc2 and noticed some issues with the Tungsten
> shuffle
> >> > manager. One problem was when using the com.databricks.spark.avro
> reader
> >> > and
> >> > the error(1) was received, see stack trace below. The problem does not
> >> > occur
> >> > with the "sort" shuffle manager.
> >> >
> >> > Another problem was in a large complex job with lots of
> transformations
> >> > occurring simultaneously, i.e. 50+ or more maps each shuffling data.
> >> > Received error(2) about inability to acquire memory which seems to
> also
> >> > have
> >> > to do with Tungsten. Possibly some setting available to increase that
> >> > memory, because there's lots of heap memory available.
> >> >
> >> > Am running on Yarn 2.2 with about 400 executors. Hoping this will give
> >> > some
> >> > hints for improving the upcoming release, or for me to get some hints
> to
> >> > fix
> >> > the problems.
> >> >
> >> > Thanks,
> >> > Anders

Re: Problems with Tungsten in Spark 1.5.0-rc2

Posted by Davies Liu <da...@databricks.com>.
Thanks for the confirmation. Since tungsten-sort is not the default
ShuffleManager, this fix will not block the 1.5 release; it may be in
1.5.1.

BTW, how big is the difference between the sort and tungsten-sort
ShuffleManagers for this large job?
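
Even a crude back-to-back run would be informative, e.g. something along
these lines (runTheLargeJob is just a placeholder for the real pipeline,
and timings on a busy shared cluster will of course be noisy):

  import org.apache.spark.{SparkConf, SparkContext}

  // Time the same job once per shuffle manager; the contexts are created
  // and stopped sequentially so each run picks up its own setting.
  def timeJob(manager: String)(job: SparkContext => Unit): Long = {
    val sc = new SparkContext(
      new SparkConf()
        .setAppName(s"shuffle-compare-$manager")
        .set("spark.shuffle.manager", manager))
    val start = System.nanoTime()
    try job(sc) finally sc.stop()
    (System.nanoTime() - start) / 1000000L
  }

  val sortMs     = timeJob("sort")(runTheLargeJob)          // placeholder job
  val tungstenMs = timeJob("tungsten-sort")(runTheLargeJob)
  println(s"sort: $sortMs ms, tungsten-sort: $tungstenMs ms")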

On Tue, Sep 1, 2015 at 8:03 AM, Anders Arpteg <ar...@spotify.com> wrote:
> A fix submitted less than one hour after my mail, very impressive Davies!
> I've compiled your PR and tested it with the large job that failed before,
> and it seems to work fine now without any exceptions. Awesome, thanks!
>
> Best,
> Anders
>
> On Tue, Sep 1, 2015 at 1:38 AM Davies Liu <da...@databricks.com> wrote:
>>
>> I had sent out a PR [1] to fix 2), could you help to test that?
>>
>> [1]  https://github.com/apache/spark/pull/8543
>>
>> On Mon, Aug 31, 2015 at 12:34 PM, Anders Arpteg <ar...@spotify.com>
>> wrote:
>> > Was trying out 1.5 rc2 and noticed some issues with the Tungsten shuffle
>> > manager. One problem was when using the com.databricks.spark.avro reader
>> > and
>> > the error(1) was received, see stack trace below. The problem does not
>> > occur
>> > with the "sort" shuffle manager.
>> >
>> > Another problem was in a large complex job with lots of transformations
>> > occurring simultaneously, i.e. 50+ or more maps each shuffling data.
>> > Received error(2) about inability to acquire memory which seems to also
>> > have
>> > to do with Tungsten. Possibly some setting available to increase that
>> > memory, because there's lots of heap memory available.
>> >
>> > Am running on Yarn 2.2 with about 400 executors. Hoping this will give
>> > some
>> > hints for improving the upcoming release, or for me to get some hints to
>> > fix
>> > the problems.
>> >
>> > Thanks,
>> > Anders

Re: Problems with Tungsten in Spark 1.5.0-rc2

Posted by Anders Arpteg <ar...@spotify.com>.
A fix submitted less than one hour after my mail, very impressive Davies!
I've compiled your PR and tested it with the large job that failed before,
and it seems to work fine now without any exceptions. Awesome, thanks!

Best,
Anders

On Tue, Sep 1, 2015 at 1:38 AM Davies Liu <da...@databricks.com> wrote:

> I had sent out a PR [1] to fix 2), could you help to test that?
>
> [1]  https://github.com/apache/spark/pull/8543
>
> On Mon, Aug 31, 2015 at 12:34 PM, Anders Arpteg <ar...@spotify.com>
> wrote:
> > Was trying out 1.5 rc2 and noticed some issues with the Tungsten shuffle
> > manager. One problem was when using the com.databricks.spark.avro reader
> and
> > the error(1) was received, see stack trace below. The problem does not
> occur
> > with the "sort" shuffle manager.
> >
> > Another problem was in a large complex job with lots of transformations
> > occurring simultaneously, i.e. 50+ or more maps each shuffling data.
> > Received error(2) about inability to acquire memory which seems to also
> have
> > to do with Tungsten. Possibly some setting available to increase that
> > memory, because there's lots of heap memory available.
> >
> > Am running on Yarn 2.2 with about 400 executors. Hoping this will give
> some
> > hints for improving the upcoming release, or for me to get some hints to
> fix
> > the problems.
> >
> > Thanks,
> > Anders

Re: Problems with Tungsten in Spark 1.5.0-rc2

Posted by Davies Liu <da...@databricks.com>.
I have sent out a PR [1] to fix (2); could you help to test it?

[1]  https://github.com/apache/spark/pull/8543

On Mon, Aug 31, 2015 at 12:34 PM, Anders Arpteg <ar...@spotify.com> wrote:
> Was trying out 1.5 rc2 and noticed some issues with the Tungsten shuffle
> manager. One problem was when using the com.databricks.spark.avro reader and
> the error(1) was received, see stack trace below. The problem does not occur
> with the "sort" shuffle manager.
>
> Another problem was in a large complex job with lots of transformations
> occurring simultaneously, i.e. 50+ or more maps each shuffling data.
> Received error(2) about inability to acquire memory which seems to also have
> to do with Tungsten. Possibly some setting available to increase that
> memory, because there's lots of heap memory available.
>
> Am running on Yarn 2.2 with about 400 executors. Hoping this will give some
> hints for improving the upcoming release, or for me to get some hints to fix
> the problems.
>
> Thanks,
> Anders
