Posted to user@spark.apache.org by Artemis User <ar...@dtechspace.com> on 2022/02/24 20:24:02 UTC
Non-Partition based Workload Distribution
We have a Spark program that iterates through a while loop over the same
input DataFrame and produces a different result per iteration. I can see
in the Spark UI that the workload is concentrated on a single core of
a single worker. Is there any way to distribute the workload across
different cores/workers, e.g. one iteration per core/worker, since the
iterations are independent of each other?
This type of problem could certainly be implemented with threads, e.g.
spawning a child thread per iteration and joining them at the end of the
loop, but threads apparently don't go beyond the worker boundary. We also
considered MapReduce, but that wouldn't be straightforward, since mapping
operates on individual rows, not on whole DataFrames. Any
thoughts/suggestions are highly appreciated.
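For what it's worth, one common pattern for independent per-iteration jobs is to
submit them from a driver-side thread pool: each thread triggers its own Spark
action, and the scheduler can then run those jobs concurrently across the
cluster (often combined with the FAIR scheduler, spark.scheduler.mode=FAIR).
A minimal sketch below; run_iteration is a hypothetical stand-in for whatever
action each loop iteration performs on the shared DataFrame (e.g. a
filter/aggregate plus count or write), squaring its argument here only so the
sketch runs without a cluster:

```python
# Driver-side thread pool: one task per independent loop iteration.
# In a real Spark program each task would invoke an action on the shared
# DataFrame, submitting an independent Spark job per iteration.
from concurrent.futures import ThreadPoolExecutor


def run_iteration(i):
    # Placeholder for the per-iteration Spark action, e.g.:
    #   return df.filter(df.group == i).agg(...).collect()
    return i * i


with ThreadPoolExecutor(max_workers=4) as pool:
    # map() preserves iteration order; the pool joins all threads on exit,
    # which plays the role of "wait at the end of the loop".
    results = list(pool.map(run_iteration, range(8)))

print(results)
```

The threads themselves stay on the driver, but the work each Spark action
triggers is still distributed over the executors, which may be what matters
here.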
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org
Re: Non-Partition based Workload Distribution
Posted by Gourav Sengupta <go...@gmail.com>.
Hi,
Not quite sure here, but can you please share your code?
Regards,
Gourav Sengupta
On Thu, Feb 24, 2022 at 8:25 PM Artemis User <ar...@dtechspace.com> wrote:
> [quoted original message trimmed]