Posted to dev@beam.apache.org by Mikhail Gryzykhin <mi...@google.com> on 2018/08/07 21:14:17 UTC

Dataflow test cluster load grows infinitely due to never ending jobs (Warning big pictures)

Hi everyone,

Pablo found that load on our Dataflow test cluster started to grow a couple
of days ago:
[image: image.png]

I've done some digging, and it seems that we schedule jobs that never end:
[image: image.png]

I didn't manage to find the code that schedules these jobs, but I suspect that
it might be the Nexmark jobs, since we were fixing those recently.

Can someone help me confirm that this is the cause, find the culprit, and fix it?

Thank you,
--Mikhail

Have feedback <http://go/migryz-feedback>?

Re: Dataflow test cluster load grows infinitely due to never ending jobs (Warning big pictures)

Posted by Mikhail Gryzykhin <mi...@google.com>.
Update: I see the requests graph going down as expected.

[image: image.png]
Created BEAM-5104 <https://issues.apache.org/jira/browse/BEAM-5104> to add
the relevant graphs to the common dashboard.

--Mikhail


On Tue, Aug 7, 2018 at 3:28 PM Mikhail Gryzykhin <mi...@google.com> wrote:

> Cool. Thank you for taking care of this.
>
> --Mikhail
>
> Have feedback <http://go/migryz-feedback>?
>
>
> On Tue, Aug 7, 2018 at 2:21 PM Andrew Pilloud <ap...@google.com> wrote:
>
>> Sorry, this is me again. Above some threshold of work, Nexmark Query 7
>> never completes in streaming mode on Dataflow. No idea what the cause is,
>> but I've tuned the test to prevent it from happening again. I also canceled
>> all the leaked jobs. All the Dataflow Nexmark jobs are now completing in
>> under an hour:
>> https://builds.apache.org/job/beam_PostCommit_Java_Nexmark_Dataflow/
>>
>> Andrew
>>
>> On Tue, Aug 7, 2018 at 2:15 PM Mikhail Gryzykhin <mi...@google.com>
>> wrote:
>>
>>> Hi everyone,
>>>
>>> Pablo found that load on our Dataflow test cluster started to grow
>>> a couple of days ago:
>>> [image: image.png]
>>>
>>> I've done some digging, and it seems that we schedule jobs that never end:
>>> [image: image.png]
>>>
>>> I didn't manage to find the code that schedules these jobs, but I suspect
>>> that it might be the Nexmark jobs, since we were fixing those recently.
>>>
>>> Can someone help me confirm that this is the cause, find the culprit,
>>> and fix it?
>>>
>>> Thank you,
>>> --Mikhail
>>>
>>> Have feedback <http://go/migryz-feedback>?
>>>
>>

Re: Dataflow test cluster load grows infinitely due to never ending jobs (Warning big pictures)

Posted by Pablo Estrada <pa...@google.com>.
I believe this affected the stability of other test suites that schedule
jobs on Dataflow. I'll monitor those suites to see if things go back to
normal.
Thanks Andrew and Mikhail for looking into this!

Best
-P.

Got feedback? go/pabloem-feedback

Re: Dataflow test cluster load grows infinitely due to never ending jobs (Warning big pictures)

Posted by Mikhail Gryzykhin <mi...@google.com>.
Cool. Thank you for taking care of this.

--Mikhail

Re: Dataflow test cluster load grows infinitely due to never ending jobs (Warning big pictures)

Posted by Andrew Pilloud <ap...@google.com>.
Sorry, this is me again. Above some threshold of work, Nexmark Query 7 never
completes in streaming mode on Dataflow. No idea what the cause is, but I've
tuned the test to prevent it from happening again. I also canceled all the
leaked jobs. All the Dataflow Nexmark jobs are now completing in under an
hour: https://builds.apache.org/job/beam_PostCommit_Java_Nexmark_Dataflow/

Andrew
