Posted to dev@beam.apache.org by Valentyn Tymofieiev <va...@google.com> on 2019/04/16 20:57:43 UTC

Insufficient CPU quota in apache-beam-testing causes test flakes

FYI, I have recently observed a large number of test failures in Beam test
suites where Dataflow jobs failed due to a lack of CPU quota in the
apache-beam-testing project.

We have been adding new suites for Python 3.x versions, which may have
contributed to this problem.

I have not yet investigated what consumes the quota, but usage
remains high.

Possible mitigation options:
- Increase quota.
- Decrease per-suite parallelism [1]. Currently, we may run 1-8 tests from
the same suite concurrently.
- Audit usage, perhaps kill stale jobs or VMs.
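To make the parallelism option concrete: if each concurrently running test launches one Dataflow job, worst-case CPU demand grows with the sum of the per-suite `--processes` values. The sketch below assumes exactly that (one job per test, a fixed worker count and a fixed number of vCPUs per worker); the suite names and all numbers are hypothetical, not measurements from apache-beam-testing.

```python
from typing import Mapping


def peak_test_cpus(processes_per_suite: Mapping[str, int],
                   workers_per_job: int,
                   cpus_per_worker: int) -> int:
    """Worst-case vCPUs requested when every suite runs at full parallelism.

    Hypothetical model: each concurrently running test launches one Dataflow
    job with `workers_per_job` workers of `cpus_per_worker` vCPUs each.
    """
    concurrent_jobs = sum(processes_per_suite.values())
    return concurrent_jobs * workers_per_job * cpus_per_worker


# Hypothetical suites and --processes values, for illustration only.
suites = {"suite_a": 8, "suite_b": 8, "suite_c": 4, "suite_d": 1}

# Same suites with per-suite parallelism capped at 2 concurrent tests.
capped = {name: min(p, 2) for name, p in suites.items()}

print(peak_test_cpus(suites, workers_per_job=2, cpus_per_worker=4))  # 168
print(peak_test_cpus(capped, workers_per_job=2, cpus_per_worker=4))  # 56
```

Under this model, lowering `--processes` cuts worst-case demand linearly, at the likely cost of longer suite run times.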

Ideas/opinions welcome.

I opened https://issues.apache.org/jira/browse/BEAM-7085 to track this.

[1]
https://github.com/apache/beam/search?q=%22--processes%3D%22&unscoped_q=%22--processes%3D%22

Re: Insufficient CPU quota in apache-beam-testing causes test flakes

Posted by Valentyn Tymofieiev <va...@google.com>.
Thanks, Yifan.

1. It appears that there are 32 Jenkins-related instances, 16 cores each
(512 cores in total), which consume over 2/3 of the available CPU quota.
2. Among the old VMs there are six 1-core VMs with names like
"gke-io-datastores-*" and "gke-metrics-*". They don't consume much quota,
but I am curious why we have these VMs up. Does anyone have context?
3. The rest of the VMs currently running seem to be test VMs started today. I
also removed a couple of stray VMs.
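The audit above amounts to grouping instances by name prefix and summing their cores. A minimal sketch, assuming an inventory of (name, vCPU) pairs has already been exported from the project; the `jenkins-agent-*` names and the 3/3 split of the six GKE VMs are hypothetical. It also spells out what "over 2/3 of the quota" implies: if 32 × 16 = 512 cores exceed 2/3 of the quota Q, then Q < 512 / (2/3) = 768 cores.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict, Iterable, List, Tuple


def cores_by_prefix(instances: Iterable[Tuple[str, int]],
                    prefixes: List[str]) -> Dict[str, int]:
    """Sum vCPUs per name prefix; names matching no prefix count as 'other'."""
    totals: Dict[str, int] = defaultdict(int)
    for name, cpus in instances:
        group = next((p for p in prefixes if name.startswith(p)), "other")
        totals[group] += cpus
    return dict(totals)


def quota_upper_bound(used_cores: int, share: Fraction) -> Fraction:
    """If used_cores > share * Q, then the quota Q is below used_cores / share."""
    return Fraction(used_cores) / share


# Hypothetical inventory matching the instance counts reported above.
inventory = (
    [(f"jenkins-agent-{i}", 16) for i in range(32)]
    + [(f"gke-io-datastores-node-{i}", 1) for i in range(3)]
    + [(f"gke-metrics-node-{i}", 1) for i in range(3)]
)

totals = cores_by_prefix(inventory, ["jenkins-", "gke-io-datastores-", "gke-metrics-"])
print(totals)  # {'jenkins-': 512, 'gke-io-datastores-': 3, 'gke-metrics-': 3}
print(quota_upper_bound(totals["jenkins-"], Fraction(2, 3)))  # 768
```

In practice the inventory could be exported with `gcloud compute instances list`; that step is outside this sketch.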

Yifan, I am assigning https://issues.apache.org/jira/browse/BEAM-7085 to
you since Jenkins is the biggest quota consumer right now and you are
actively working on it.

On Tue, Apr 16, 2019 at 2:09 PM Yifan Zou <yi...@google.com> wrote:

> We recently created 16 compute instances for Jenkins. Each of them
> has 16 CPUs, which means they consume 256 CPUs in total. I guess that is why
> CPU usage in us-central1 remains high. We're working on migrating the
> rest of the old Jenkins agents, and the old instances will be removed once
> that is finished. That should relieve the quota pressure.
>
> Yifan

Re: Insufficient CPU quota in apache-beam-testing causes test flakes

Posted by Yifan Zou <yi...@google.com>.
We recently created 16 compute instances for Jenkins. Each of them
has 16 CPUs, which means they consume 256 CPUs in total. I guess that is why
CPU usage in us-central1 remains high. We're working on migrating the
rest of the old Jenkins agents, and the old instances will be removed once
that is finished. That should relieve the quota pressure.

Yifan
