Posted to user@spark.apache.org by "eabour@163.com" <ea...@163.com> on 2022/10/26 10:36:48 UTC

Running 30 Spark applications at the same time is slower than one on average

Hi All,

I have a CDH 5.16.2 Hadoop cluster with 1+3 nodes (64C/128G each; 1 NN/RM + 3 DN/NM), and YARN has 192C/240G available. I used the following test scenario:

1. Each Spark app requests 2G driver memory, 2 driver vcores, 1 executor, 2G executor memory, and 2 executor vcores.
2. So one Spark app uses 5G/4C on YARN.
3. First, I ran a single Spark app; it took 40s.
4. Then, I ran 30 of the same Spark app at once, and each app took 80s on average.

So I want to know why the runtime gap is so big, and how to optimize it.
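As a quick sanity check of the numbers above (a back-of-envelope sketch; the 5G/4C per-app figure is from the post, and the script below is plain arithmetic, not a Spark API):

```python
# Back-of-envelope check: do 30 apps at 5G/4C each fit in YARN's 240G/192C?
apps = 30
per_app_mem_gb, per_app_vcores = 5, 4   # 2G/2C driver + 2G/2C executor (overhead not counted)
yarn_mem_gb, yarn_vcores = 240, 192

total_mem = apps * per_app_mem_gb       # 150 GB requested
total_vcores = apps * per_app_vcores    # 120 vcores requested
fits = total_mem <= yarn_mem_gb and total_vcores <= yarn_vcores
print(total_mem, total_vcores, fits)    # 150 120 True
```

If this holds, YARN can admit all 30 apps concurrently, so they all land on the same three worker nodes at once rather than queueing.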

Thanks


Re: Running 30 Spark applications at the same time is slower than one on average

Posted by Sean Owen <sr...@gmail.com>.
That just means G = GB of memory and C = cores. But yes, the driver and executors
are very small, which is possibly related.


Re: Running 30 Spark applications at the same time is slower than one on average

Posted by Artemis User <ar...@dtechspace.com>.
Are these Cloudera-specific acronyms? I'm not sure how Cloudera configures
Spark differently, but obviously the number of nodes is too small,
considering each app only uses a small number of cores and RAM. So you
may consider increasing the number of nodes. When all these apps jam onto
a few nodes, the cluster manager/scheduler and/or the network becomes
overwhelmed...
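If adding nodes isn't an option, another lever is to stop YARN from admitting all 30 apps at once so the remainder queue instead of contending. A sketch for capacity-scheduler.xml, assuming the Capacity Scheduler with the default queue (the values here are illustrative, not recommendations):

```xml
<!-- Illustrative capacity-scheduler.xml fragment: bound concurrency. -->
<property>
  <!-- Cap total pending + running applications in the default queue. -->
  <name>yarn.scheduler.capacity.root.default.maximum-applications</name>
  <value>100</value>
</property>
<property>
  <!-- Limit the fraction of cluster resources usable by ApplicationMasters;
       since every app needs an AM container, this indirectly bounds how
       many apps can run concurrently. -->
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.2</value>
</property>
```

With fewer apps running simultaneously, each one sees less disk/network contention, at the cost of higher queue wait time.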


Re: Running 30 Spark applications at the same time is slower than one on average

Posted by Sean Owen <sr...@gmail.com>.
Resource contention. Now all the apps' CPU and I/O are competing, which
probably slows everything down.
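The observed 40s-to-80s gap is consistent with some shared resource being roughly 2x oversubscribed. A toy model of that effect (purely illustrative; the capacity value is made up to show the shape of the curve, not measured from this cluster):

```python
# Toy contention model: once total demand exceeds a shared resource's
# capacity, runtime scales roughly with the oversubscription factor.
def expected_runtime(base_s, apps, per_app_demand, capacity):
    load = apps * per_app_demand / capacity
    return base_s * max(1.0, load)      # no slowdown until the resource saturates

# One app alone: no contention, runtime stays at the 40s baseline.
print(expected_runtime(40, 1, 4, 60))   # 40.0
# 30 apps: demand 120 vs a hypothetical capacity of 60 -> 2x oversubscribed,
# matching the observed ~80s average.
print(expected_runtime(40, 30, 4, 60))  # 80.0
```

Since the vcore count (120 requested vs 192 available) is not oversubscribed, the 2x bottleneck is more likely disk or network shared across only three DataNodes.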
