Posted to user@spark.apache.org by Magnus Nilsson <ma...@gmail.com> on 2019/07/17 11:29:06 UTC

CPUs per task

Hello all,

TL;DR: Can the number of cores used by a task vary, or is it always one core
per task? Is there a UI, a metric, or a log I can check to see how many
cores a task uses?
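
If it helps frame the question: my current understanding (an assumption on
my part, not something I've verified) is that spark.task.cpus controls how
many cores the scheduler reserves per task, defaulting to 1. A minimal
sketch of checking it:

    import org.apache.spark.sql.SparkSession

    // Minimal sketch: print the cores-per-task setting.
    // spark.task.cpus defaults to "1"; the scheduler reserves that many
    // cores per task, regardless of what the task's code does internally.
    val spark = SparkSession.builder()
      .appName("cores-per-task-check")
      .getOrCreate()
    println(spark.conf.get("spark.task.cpus", "1"))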

I have an ETL pipeline where I do some transformations. In one of the
stages, which ought to be quite CPU-heavy, there is only a single task
running for a few minutes. I'm trying to determine whether this means only
one CPU core is in use, or whether a single task could be using many cores
under the hood.
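
To make the single-task symptom concrete, here is a sketch of how I'd check
the partition count feeding that stage (df and the path are hypothetical
stand-ins for my actual input):

    // Hypothetical stand-in for the input to the CPU-heavy stage.
    val df = spark.read.parquet("/path/to/input")

    // One task in a stage usually means one partition; this prints how
    // many partitions (and therefore tasks) the stage would get.
    println(df.rdd.getNumPartitions)

    // Repartitioning would spread the work over more tasks, and so more cores:
    val wider = df.repartition(16)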

When I read data from an Event Hub, the stage includes as many tasks as
there are partitions in the Event Hub, up to the maximum number of cores
available in the cluster. Clearly those tasks use one core each, and their
parallelism is limited by the cluster size.
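
As a toy illustration of that capping (all numbers made up):

    // With 1 core per task (the spark.task.cpus default), the number of
    // tasks running at once is bounded by both partition count and cores.
    val partitions = 32      // hypothetical Event Hub partition count
    val clusterCores = 16    // hypothetical total cores in the cluster
    val taskCpus = 1         // spark.task.cpus default
    val concurrent = math.min(partitions, clusterCores / taskCpus)
    println(s"At most $concurrent tasks run concurrently")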

Regards,

Magnus