Posted to dev@beam.apache.org by Valentyn Tymofieiev <va...@google.com> on 2022/01/14 19:46:34 UTC

Inventory job stuck on nodes 10-16, disk filling up

It appears that the inventory job has not run on Jenkins nodes 10-16.

The logs say [1]:

 Job triggered without a valid online node, given where:
apache-beam-jenkins-10

Did anyone modify the node labels by chance?

[1] https://ci-beam.apache.org/job/beam_Inventory_apache-beam-jenkins-10/

Re: Inventory job stuck on nodes 10-16, disk filling up

Posted by Valentyn Tymofieiev <va...@google.com>.
Filed https://issues.apache.org/jira/browse/BEAM-13666 and updated the
instructions for Jenkins upgrades at
https://cwiki.apache.org/confluence/display/BEAM/Jenkins+Tips

Re: Inventory job stuck on nodes 10-16, disk filling up

Posted by Valentyn Tymofieiev <va...@google.com>.
I think I know what happened. I briefly marked these nodes offline
around Jan 5 as I was preparing for an image upgrade.

While a worker was offline, an inventory job started. It was not able to
proceed because it needed that specific worker to be available, as each
inventory job maps to a specific worker. After the worker came back online,
a new inventory job run did not trigger, because the previous run had
started but was unable to proceed.

The right thing to do would be to cancel inventory job runs that are unable
to proceed. As a workaround, when we take a node offline and then online
again, we should cancel any pending inventory jobs that were not started. I
did that, and the inventory jobs are running now.
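
For illustration, the workaround could be scripted roughly as below. This is
only a sketch, not what I actually ran: it assumes the stuck runs show up in
the Jenkins build queue under the job name (the "beam_Inventory_" prefix is
taken from the link in the first message), and it uses the Jenkins remote API
endpoints /queue/api/json and /queue/cancelItem. The function names are
hypothetical, and authentication and CSRF crumb handling are left out.

```python
import json
import urllib.parse
import urllib.request

# Assumption: inventory jobs are named like the one linked in the thread,
# e.g. "beam_Inventory_apache-beam-jenkins-10".
INVENTORY_PREFIX = "beam_Inventory_"


def stale_inventory_item_ids(queue, prefix=INVENTORY_PREFIX):
    """Return the ids of queued items whose task name starts with `prefix`.

    `queue` is the parsed JSON from the Jenkins /queue/api/json endpoint.
    """
    ids = []
    for item in queue.get("items", []):
        name = item.get("task", {}).get("name", "")
        if name.startswith(prefix):
            ids.append(item["id"])
    return ids


def cancel_stale_inventory_items(base_url):
    """Fetch the build queue and cancel every queued inventory item.

    Hypothetical helper: a real Jenkins instance will normally require
    credentials (and possibly a CSRF crumb), which are omitted here.
    """
    with urllib.request.urlopen(f"{base_url}/queue/api/json") as resp:
        queue = json.load(resp)

    cancelled = []
    for item_id in stale_inventory_item_ids(queue):
        query = urllib.parse.urlencode({"id": item_id})
        req = urllib.request.Request(
            f"{base_url}/queue/cancelItem?{query}", data=b"", method="POST"
        )
        urllib.request.urlopen(req).close()
        cancelled.append(item_id)
    return cancelled
```

Calling cancel_stale_inventory_items("https://ci-beam.apache.org") after
bringing a node back online would be the intended use. The response handling
for cancelItem may need adjusting depending on the Jenkins version.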

