Posted to dev@spark.apache.org by "Long, Andrew" <lo...@amazon.com.INVALID> on 2019/04/23 20:56:39 UTC

FW: Stage 152 contains a task of very large size (12747 KB). The maximum recommended task size is 100 KB

Hey Friends,

Is there an easy way of figuring out what's being pulled into the task context?  I’ve been getting the following message, which I suspect means I’ve unintentionally caught some large objects, but figuring out what those objects are is stumping me.

19/04/23 13:52:13 WARN org.apache.spark.internal.Logging$class TaskSetManager: Stage 152 contains a task of very large size (12747 KB). The maximum recommended task size is 100 KB

Cheers Andrew

Re: FW: Stage 152 contains a task of very large size (12747 KB). The maximum recommended task size is 100 KB

Posted by Russell Spitzer <ru...@gmail.com>.
I usually only see that when folks parallelize very large
objects. From what I know, it's really just the data inside the "Partition"
class of the RDD that is being sent back and forth, so usually something
like spark.parallelize(Seq(reallyBigMap)). The
parallelize function jams all that data into the RDD's Partition metadata,
which can easily overwhelm the task size.
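
A rough, untested sketch of what I mean -- the parallelize pattern plus a
quick way to estimate how big a closure gets once it's serialized (the
serializedSizeOf helper is just made up for illustration, not a Spark API):

import java.io.{ByteArrayOutputStream, ObjectOutputStream}

import org.apache.spark.{SparkConf, SparkContext}

object TaskSizeSketch {

  // Rough estimate of how many bytes an object adds to a task, using plain
  // Java serialization (which is what Spark uses for task closures).
  def serializedSizeOf(obj: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    bytes.size()
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("task-size-sketch"))

    // The pattern above: parallelize() copies the whole collection into the
    // RDD's Partition metadata, so every element ships with the task.
    val reallyBigMap = (1 to 200000).map(i => i -> s"value-$i").toMap
    val rdd = sc.parallelize(Seq(reallyBigMap))
    rdd.count() // should reproduce the "very large size" TaskSetManager warning

    // A closure that accidentally captures the big map has the same problem;
    // serializing it by hand shows how much it drags along.
    val closure = (x: Int) => reallyBigMap.getOrElse(x, "missing")
    println(s"closure serializes to roughly ${serializedSizeOf(closure)} bytes")

    sc.stop()
  }
}

Measuring a suspect closure like that (or just commenting out captured fields
one at a time) is usually the fastest way I know to find the heavy object.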

On Tue, Apr 23, 2019 at 3:57 PM Long, Andrew <lo...@amazon.com.invalid>
wrote:

> Hey Friends,
>
>
>
> Is there an easy way of figuring out what's being pulled into the task
> context?  I’ve been getting the following message, which I suspect means
> I’ve unintentionally caught some large objects, but figuring out what those
> objects are is stumping me.
>
>
>
> 19/04/23 13:52:13 WARN org.apache.spark.internal.Logging$class
> TaskSetManager: Stage 152 contains a task of very large size (12747 KB).
> The maximum recommended task size is 100 KB
>
>
>
> Cheers Andrew
>