Posted to dev@spark.apache.org by German Schiavon <gs...@gmail.com> on 2021/04/01 14:23:28 UTC

Re: [Spark Core]: Adding support for size based partition coalescing

Hi!

Have you tried spark.sql.files.maxRecordsPerFile?

As a workaround, you could estimate how many rows add up to roughly 128 MB
and then set that number in that property.
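
For example, here is a rough sketch in Scala (the 1000-row sample, the use of
toJSON to approximate row size, and the input/output paths are just
illustrative assumptions, not a recommended recipe):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().getOrCreate()
  val df = spark.read.parquet("/path/to/input")  // hypothetical input path

  // Rough estimate of the average row size from a small sample
  // (toJSON overestimates the on-disk Parquet size, so treat this
  // as a ballpark figure only).
  val sample = df.limit(1000).toJSON.collect()
  val avgRowBytes =
    sample.map(_.getBytes("UTF-8").length.toLong).sum / math.max(sample.length, 1)
  val rowsPer128MB = (128L * 1024 * 1024) / math.max(avgRowBytes, 1L)

  // Cap the number of records per output file. Note this only splits
  // files that would be too large; it does not merge small partitions.
  spark.conf.set("spark.sql.files.maxRecordsPerFile", rowsPer128MB)
  df.write.mode("overwrite").parquet("/path/to/output")  // hypothetical output path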

Best


On Thu, 1 Apr 2021 at 00:38, mhawes <ha...@gmail.com> wrote:

> Okay, from looking closer at some of the code, I'm not sure that what I'm
> asking for in terms of adaptive execution makes much sense, as it can only
> happen between stages, i.e. optimising future /stages/ based on the results
> of previous stages. Thus an "on-demand" adaptive coalesce doesn't make much
> sense, as it wouldn't necessarily occur at a stage boundary.
>
> However, I think my original question still stands:
> - How to /dynamically/ deal with poorly partitioned data without incurring
> a shuffle or extra computation.
>
> I think the only thing that's changed is that I no longer have any good
> ideas on how to do it :/
>
>
>
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>
>