Posted to dev@ofbiz.apache.org by Daniel Watford <da...@foomoo.co.uk> on 2023/02/11 08:34:35 UTC

Demo sites not response - suspect low RAM following docker container deployment

Hello,

After checking Saturday morning, it looks like the demo sites are not
responding to browser requests and I cannot SSH to ofbiz-vm1.apache.org,
the demo sites host.

This follows some experiments on Friday to deploy a docker container
version of OFBiz. Progress is reported in OFBIZ-12757, "Experiment with
deploying OFBiz containers to the demo sites server":
<https://issues.apache.org/jira/browse/OFBIZ-12757>

On Friday we had a similar lock-up on the host which was thought to be due
to running out of memory on the VM.

I suspect that the nightly rebuilds of trunk, stable and next put a higher
memory load on the server than normal running does, and that this caused us
to exhaust the available RAM on the server again.

If this suspicion is correct, then there is not enough RAM on the host to
support docker experimentation in addition to normal operation of the sites.

A request has been made to INFRA for a RAM increase, and it has been
accepted (INFRA-24185, "Request to increase RAM on ofbiz-vm1.apache.org":
<https://issues.apache.org/jira/browse/INFRA-24185>). Depending on INFRA
availability, hopefully this will be completed early next week.

What can we do now?
When we had the previous lock-up on Friday, the demo host did eventually
recover and start accepting browser requests and SSH connections. I imagine
the OFBiz build process is still ongoing - and would have been running for
5.5 hours at time of writing this message.

If the host recovers and I am available, I will SSH to the host and disable
the docker containers, hopefully avoiding this issue over the next few days
until the RAM uplift is applied.

Alternatively, if we ask INFRA to reboot the VM, perhaps they can apply the
RAM uplift at the same time.

Dan.

-- 
Daniel Watford

Re: Demo sites not response - suspect low RAM following docker container deployment

Posted by Jacques Le Roux <ja...@les7arts.com>.
Hi Daniel,

I have told Infra at INFRA-24185 that I still can't connect after the increase to AWS t2.2xlarge, and asked for a reboot.

Maybe some more time is needed. I guess that if we get at least 12 GB it should be OK.

I'll send a simple message to the user ML. No worries, there is much less activity during the weekend ;)

Jacques


Re: Demo sites not response - suspect low RAM following docker container deployment

Posted by Daniel Watford <da...@foomoo.co.uk>.
Hi Michael,

On Sat, 11 Feb 2023 at 09:43, Michael Brohl <mi...@ecomify.de>
wrote:

> Hi Dan, all,
>
> maybe we can find a way to experiment on this outside the production
> VMs for our demos?
>
>
The alternative to working on ofbiz-vm1.apache.org would have been to work
on our own hosts, or get a second host from Apache. I favoured an Apache
host as it meant others could collaborate more easily, but I believe
justifying a second host would have taken time and may not have been
approved.

I think our current approach was reasonably balanced given the risk to the
demo sites vs benefit of experimenting with new approaches, (perceived)
impact on users and how rapidly we addressed issues as they were discovered.

We hit this RAM issue twice. The first time it occurred we recovered quickly
and implemented some mitigations to constrain resource usage by the
containers.
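
As an illustration only: this thread does not record which mitigations were
applied, so the sketch below simply shows one common way to constrain a
container, using the standard `docker run` flags `--memory`, `--memory-swap`
and `--cpus`. The image name, container name and limit values are
hypothetical.

```python
import shlex


def docker_run_command(image, name, memory="2g", cpus="1.0"):
    """Build a `docker run` argument list that caps container RAM and CPU.

    The default limits are placeholder assumptions, not values from the
    demo host.
    """
    return [
        "docker", "run", "--detach",
        "--name", name,
        "--memory", memory,       # hard limit on container RAM
        "--memory-swap", memory,  # equal to --memory, so no swap on top
        "--cpus", cpus,           # limit on CPU time, in units of cores
        image,
    ]


if __name__ == "__main__":
    # Hypothetical image and container names, for illustration only.
    cmd = docker_run_command("ofbiz-demo:example", "ofbiz-trunk-example")
    print(shlex.join(cmd))
```

Building the command as a list keeps each flag and value separate, so it can
be passed to `subprocess.run` without shell quoting issues.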

What we didn't anticipate was that RAM usage would increase again during the
nightly build, making the VM unresponsive a second time. However, in
response to the first occurrence we had already asked INFRA for a RAM
uplift. That request had been approved and was just waiting for a suitable
time to be applied.

(Note: I am still assuming the issue was a lack of memory. We don't have any
metrics yet to confirm this.)
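
A minimal sketch of how such a metric could be collected, assuming the host
is Linux and exposes /proc/meminfo with the usual MemTotal and MemAvailable
fields. The function names and the 10% threshold are hypothetical choices
for illustration, not something already set up on the demo host.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most entries are reported in kB; the unit suffix is ignored here.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def available_fraction(meminfo):
    """Fraction of total RAM that the kernel estimates is still available."""
    return meminfo["MemAvailable"] / meminfo["MemTotal"]


def is_low_memory(meminfo, threshold=0.10):
    """True when available RAM falls below the (assumed) threshold."""
    return available_fraction(meminfo) < threshold
```

On the host, the text would come from reading /proc/meminfo. Logging the
result every minute or so around the nightly build would show whether
available memory really does collapse at that time.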

I don't have any figures on how much traffic our demo sites get, but I have
a feeling it is low. Given how quickly we addressed the problems, I don't
think there was much impact on users. It would be good to have some metrics
on demo site usage though. It could be something we include in the PMC
report/blog post as an indicator of user engagement with the project.

Dan

Re: Demo sites not response - suspect low RAM following docker container deployment

Posted by Michael Brohl <mi...@ecomify.de>.
That sounds great, thanks!

Michael


On 11.02.23 at 11:56, Daniel Watford wrote:
> 32 GB - Nice! :)
>
> And we have 8 vCPUs too, rather than the 2 we started with. That is 4 times
> the previous CPU and memory capacity.
>
>
> On Sat, 11 Feb 2023 at 10:14, Jacques Le Roux <ja...@les7arts.com>
> wrote:
>
>> Hi Michael,
>>
>> I guess it should be OK now. We got 32GB!
>>
>> The demos are back and should be faster
>>
>> Jacques

Re: Demo sites not response - suspect low RAM following docker container deployment

Posted by Daniel Watford <da...@foomoo.co.uk>.
32 GB - Nice! :)

And we have 8 vCPUs too, rather than the 2 we started with. That is 4 times
the previous CPU and memory capacity.


On Sat, 11 Feb 2023 at 10:14, Jacques Le Roux <ja...@les7arts.com>
wrote:

> Hi Michael,
>
> I guess it should be OK now. We got 32GB!
>
> The demos are back and should be faster
>
> Jacques


-- 
Daniel Watford

Re: Demo sites not response - suspect low RAM following docker container deployment

Posted by Jacques Le Roux <ja...@les7arts.com>.
Hi Michael,

I guess it should be OK now. We got 32GB!

The demos are back and should be faster

Jacques

On 11/02/2023 at 10:42, Michael Brohl wrote:
> Hi Dan, all,
>
> maybe we can find a way to experiment on this outside the production VMs for our demos?
>
> Best regards,
>
> Michael

Re: Demo sites not response - suspect low RAM following docker container deployment

Posted by Michael Brohl <mi...@ecomify.de>.
Hi Dan, all,

maybe we can find a way to experiment on this outside the production
VMs for our demos?

Best regards,

Michael

