Posted to dev@beam.apache.org by Tomo Suzuki <su...@google.com> on 2020/01/15 03:03:35 UTC

Re: Quota limitation for Java tests

Hi Beam committers,

I ran into a similar problem today with "Run Dataflow ValidatesRunner" on https://github.com/apache/beam/pull/10554 :
  Dataflow quota error for jobs-per-project quota. Project apache-beam-testing is running 303 jobs.
  https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_PR/190/testReport/junit/org.apache.beam.sdk/PipelineTest/testTupleProjectionTransform/

Could somebody with permission check for any unexpectedly long-running jobs?

Regards,
Tomo

On Tue, Dec 10, 2019 at 10:37 AM Łukasz Gajowy <lg...@apache.org> wrote:
>
> Of course, fixing https://issues.apache.org/jira/browse/BEAM-8939 is also crucial to avoid resource exhaustion, but I haven't had time to do it. Anyone is welcome to pick it up.
>
> Thanks!
>
> On Tue, Dec 10, 2019 at 4:25 PM Łukasz Gajowy <lg...@apache.org> wrote:
>>
>> https://github.com/apache/beam/pull/10342 is a PR that skips the tests listed above. Looking for reviewers.
>>
>> Thanks!
>>
>>> On Tue, Dec 10, 2019 at 1:30 PM Łukasz Gajowy <lg...@apache.org> wrote:
>>>
>>> What I invoked in the apache-beam-testing project:
>>>
>>> gcloud dataflow jobs list --created-before=-P5H --status=active --format="value(JOB_ID)" --region=us-central|xargs gcloud dataflow jobs cancel
>>>
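A minimal sketch, in Python, of the selection the command above performs (active jobs created more than five hours ago), written so that the list can be reviewed before anything is cancelled. The function name, the record keys `id`, `state`, and `created`, and the sample records are assumptions made for illustration; they are not the gcloud output format.

```python
from datetime import datetime, timedelta, timezone


def select_stale_job_ids(jobs, now, max_age=timedelta(hours=5)):
    """Return the IDs of active jobs created more than max_age before now.

    Each job is a dict with hypothetical keys: "id", "state", "created".
    """
    cutoff = now - max_age
    return [
        job["id"]
        for job in jobs
        if job["state"] == "active" and job["created"] < cutoff
    ]


# Made-up records, for illustration only.
now = datetime(2019, 12, 10, 12, 0, tzinfo=timezone.utc)
jobs = [
    {"id": "job-a", "state": "active", "created": now - timedelta(days=5)},
    {"id": "job-b", "state": "active", "created": now - timedelta(hours=1)},
    {"id": "job-c", "state": "done", "created": now - timedelta(days=2)},
]

stale_ids = select_stale_job_ids(jobs, now)
print(stale_ids)
```

Printing the IDs first gives a dry run; only the reviewed IDs would then be passed on to a cancel step.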
>>> On Tue, Dec 10, 2019 at 1:28 PM Łukasz Gajowy <lg...@apache.org> wrote:
>>>>
>>>> Hi Kirill,
>>>>
>>>> Michał, Kamil, and I also noticed the problem in the Dataflow ValidatesRunner suites yesterday. I started investigating and found jobs that have been running for 5 days and counting. It seems they are not stopped by the "beam_CancelStaleDataflowJobs" job, which runs at a random time each day. Digging deeper, it seems that many of the stale jobs come from the "https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/" job, which is currently being ABORTED due to timeouts.
>>>>
>>>> Some tests that seem not to be killed and keep consuming our resources (I'm not sure this list is exhaustive, but these appear in the Dataflow console repeatedly):
>>>>  - test_reshuffle_preserves_timestamps (spotted multiple times in the Dataflow console) (Python SDK)
>>>>  - test_flatten_same_pcollections (Python SDK)
>>>>  - testPairWithIndexWindowedTimestampedBounded (Java SDK)
>>>>  - testPairWithIndexBasicBounded
>>>>
>>>> I created https://issues.apache.org/jira/browse/BEAM-8938 to track tests like this. For now, I'm going to kill all the jobs that hang like this and, in a PR for that issue, ignore the tests I tracked down.
>>>>
>>>> I think it's good that job_CancelStaleDataflowJobs didn't catch them - I think that if it did, we would not spot the problem. Is it possible to set up some alerting on Dataflow instead of automatically cleaning the jobs? IMO we should fix the tests rather than cancel them.
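One way the alerting idea above could look, as a rough sketch: rather than cancelling long-running jobs, count them per job name and report the result so the offending tests can be fixed. The function name, the record keys `name` and `created`, the five-hour threshold, and the sample records are assumptions for illustration; in particular, it is assumed that a job's name identifies the test that launched it.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def stale_job_report(jobs, now, max_age=timedelta(hours=5)):
    """Return one report line per job name that has long-running jobs.

    Each job is a dict with hypothetical keys: "name", "created".
    Nothing is cancelled; the result is meant to be sent as an alert.
    """
    cutoff = now - max_age
    counts = Counter(job["name"] for job in jobs if job["created"] < cutoff)
    return [f"{name}: {count} stale job(s)" for name, count in sorted(counts.items())]


# Made-up records, for illustration only.
now = datetime(2019, 12, 10, 12, 0, tzinfo=timezone.utc)
jobs = [
    {"name": "test_reshuffle_preserves_timestamps", "created": now - timedelta(days=5)},
    {"name": "test_reshuffle_preserves_timestamps", "created": now - timedelta(days=3)},
    {"name": "test_flatten_same_pcollections", "created": now - timedelta(days=1)},
    {"name": "recently_started_job", "created": now - timedelta(minutes=10)},
]

for line in stale_job_report(jobs, now):
    print(line)
```

Grouping by name keeps repeat offenders visible, which a silent cleanup job would hide.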
>>>>
>>>> Thanks,
>>>> Łukasz
>>>>
>>>>
>>>> On Tue, Dec 10, 2019 at 12:09 AM Kirill Kozlov <ki...@google.com> wrote:
>>>>>
>>>>> Hello everyone!
>>>>>
>>>>> It looks like JavaPostCommit Jenkins tests [1] are failing due to CPU quota limitations.
>>>>> Could someone please look into this?
>>>>>
>>>>> [1] https://builds.apache.org/job/beam_PostCommit_Java/4838/testReport/junit/org.apache.beam.examples.complete/TrafficMaxLaneFlowIT/testE2ETrafficMaxLaneFlow/
>>>>>
>>>>> --
>>>>> Kirill


