Posted to user@spark.apache.org by Artemis User <ar...@dtechspace.com> on 2022/02/24 20:24:02 UTC

Non-Partition based Workload Distribution

We have a Spark program that iterates through a while loop over the same
input DataFrame and produces a different result per iteration. I can see
in the Spark UI that the workload is concentrated on a single core of
the same worker.  Is there any way to distribute the workload across
different cores/workers, e.g. per iteration, given that the iterations
are not dependent on each other?
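
For illustration, here is a minimal sketch of the kind of loop in
question; `df`, the `bucket` column, and the paths are placeholders, not
our actual code:

    // Minimal sketch only -- `df`, the `bucket` column and the output paths
    // are placeholders for whatever the real program computes per iteration.
    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions.{col, sum}

    val spark = SparkSession.builder.appName("loop-sketch").getOrCreate()
    val df: DataFrame = spark.read.parquet("/data/input")

    var i = 0
    while (i < 10) {
      // Each iteration derives a different result from the same input
      // DataFrame; the write is the action that launches a Spark job.
      df.filter(col("bucket") === i)
        .agg(sum("value").as("total"))
        .write.parquet(s"/data/output/iter=$i")
      i += 1
    }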

Certainly this type of problem could easily be implemented using
threads, e.g. spawn a child thread for each iteration and wait at the
end of the loop (roughly as sketched below), but threads apparently
don't go beyond the worker boundary.  We also thought about using
MapReduce, but that wouldn't be straightforward, since mapping only
operates on rows, not at the DataFrame level.  Any thoughts/suggestions
are highly appreciated.
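
For reference, this is the kind of thread-per-iteration code we had in
mind, sketched here with driver-side Scala Futures; again, `df` and the
paths are placeholders as in the sketch above:

    // Rough sketch of the thread-per-iteration idea using driver-side
    // Futures; `df` and the paths are placeholders, as in the sketch above.
    import scala.concurrent.{Await, Future}
    import scala.concurrent.duration.Duration
    import scala.concurrent.ExecutionContext.Implicits.global
    import org.apache.spark.sql.functions.{col, sum}

    val jobs = (0 until 10).map { i =>
      Future {
        // Each Future submits its own Spark job on the shared SparkSession.
        df.filter(col("bucket") === i)
          .agg(sum("value").as("total"))
          .write.parquet(s"/data/output/iter=$i")
      }
    }

    // Wait at the end of the loop for all iterations to finish.
    Await.result(Future.sequence(jobs), Duration.Inf)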

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: Non-Partition based Workload Distribution

Posted by Gourav Sengupta <go...@gmail.com>.
Hi,

I am not quite sure here, but can you please share your code?

Regards,
Gourav Sengupta

On Thu, Feb 24, 2022 at 8:25 PM Artemis User <ar...@dtechspace.com> wrote:

> We have a Spark program that iterates through a while loop over the same
> input DataFrame and produces a different result per iteration. I can see
> in the Spark UI that the workload is concentrated on a single core of
> the same worker.  Is there any way to distribute the workload across
> different cores/workers, e.g. per iteration, given that the iterations
> are not dependent on each other?
>
> Certainly this type of problem could easily be implemented using
> threads, e.g. spawn a child thread for each iteration and wait at the
> end of the loop, but threads apparently don't go beyond the worker
> boundary.  We also thought about using MapReduce, but that wouldn't be
> straightforward, since mapping only operates on rows, not at the
> DataFrame level.  Any thoughts/suggestions are highly appreciated.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
>