Posted to dev@ofbiz.apache.org by Giulio Speri - MpStyle Srl <gi...@mpstyle.it> on 2022/01/20 19:27:53 UTC

JobPoller strange behavior with multiple ofbiz instances in Docker Containers

Hello everyone,

I hope you are all well and healthy.
I am writing because we are facing a weird behavior of the JobPoller in a
Docker environment.
Here is our environment:
- OFBiz version 13.07.03
- Ubuntu 18 server
- Docker container host
- multiple Docker containers running: 1 container -> 1 customer -> 1 full
OFBiz instance
- multi-tenant enabled: 1 container -> 1 tenant
- 1 container running MySQL Server (shared by all the OFBiz containers)
- 1 container with Apache Web Server, acting as a proxy
- each container has its own volumes to persist data; one of the persisted
files is *general.properties*
- each OFBiz container is set up with its own specific unique.instanceId
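
For reference, the per-container id is a single property in each container's
persisted *general.properties*; a minimal sketch (the id values here are
made up for illustration):

```properties
# general.properties persisted in Container A's volume
unique.instanceId=ofbiz-container-a

# Container B's copy would instead contain e.g.:
# unique.instanceId=ofbiz-container-b
```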

*Problem context*
Each OFBiz instance has some scheduled services that run at set times,
mostly for integration with external ERP systems.
In particular, we have one job that reads shipped sales order data and
creates a CSV file at a location inside the ofbiz-home directory; this
location is mounted as a volume so that an external program can read the
generated file from the "physical" server.
The problem is that, often, the service runs with no problems and the
order header records are marked/flagged as processed, but the file is not
generated inside the container where I expect to see it; it appears in
another container instead.
As a result, a customer ends up with another customer's order data
registered in his ERP system; no good.

I know that when multiple ofbiz instances are running, the unique id is
crucial to keep things working, so I double-checked them and I can
confirm that each instance has its own unique id.

I then took a look at the JOB_SANDBOX entity of each tenant, and noticed
that services, when executed by the JobPoller, carry an instanceId
different from the one of "their" container, which means the job was
executed by the "wrong" container.
I should add that the JobPoller does not always pick up the same "wrong"
instanceId; it often differs from one job execution to the next.

This happens only when the service is run by the JobManager; if I execute
the same service by hand in the proper container, all is good.

To summarize, it seems that when the JobManager reads the instanceId from
the general.properties file, it picks up the id belonging to another
(random) container; if we are lucky it picks up its own unique id and the
service runs correctly.

Has anyone ever experienced this or a similar issue?

Thank you very much in advance,

Giulio

-- 
Giulio Speri


*MpStyle Srl*
via Antonio Meucci, 37
41019 Limidi di Soliera (MO)
T 059/684916
M 347/0965506

www.mpstyle.it

Re: JobPoller strange behavior with multiple ofbiz instances in Docker Containers

Posted by Shi Jinghai <hu...@hotmail.com>.
Hi Giulio,

Interesting; do you want to give the JobPoller a role for each instance? If yes, my +1.

And +1 to making a queue and some Snowflake-like support/implementation OOTB.

Good Luck,

Shi Jinghai


Sent from Mail for Windows<https://go.microsoft.com/fwlink/?LinkId=550986>

________________________________
From: Giulio Speri - MpStyle Srl <gi...@mpstyle.it>
Sent: Saturday, February 12, 2022 7:49:04 AM
To: Dev list <de...@ofbiz.apache.org>
Subject: Re: JobPoller strange behavior with multiple ofbiz instances in Docker Containers

Hello Devs,

I'm writing with an update on this.
My colleagues and I (@Nicola Mazzoni is on this ML) found the issue and a
possible solution, which we applied almost a week ago in our production
environment.
We found no issue or bug in the OFBiz JobPoller and JobManager code; it
was more of a "timing" problem.

To explain: each container started an instance of OFBiz, and each
instance also started its JobPoller thread with all the JobManagers
associated with a specific tenant.
All the instances had the poll-enabled parameter set to true and all ran
services from the pool "pool"; since they all have access to the same
MySQL database, what happened was that the "first" (in time) thread among
all the containers to poll the JobSandbox entity of a given tenant ran
the service.
This explains why we saw jobs executed by varying *ofbiz_instanceIds*,
and why output files were written to a container different from the one
we expected.

We were able to prove this behaviour, proceeding step-by-step:
1 - at first we disabled (poll-enabled=false) the JobPoller threads in
all the containers except one; this was the simple case and the JobPoller
behaved as expected: the service ran and the output file was written in
the correct container.
2 - put some debug prints into the JobManager poll() method, outputting
the instanceId, service name and execution time;
3 - rebuilt the container images and enabled a second JobPoller thread in
another instance;
4 - scheduled the same service in the tenants related to each
container/ofbiz instance, repeated 5/10 times every 3 minutes (just to
be sure to catch something);
5 - observed the live tail output of the ofbiz log files on both container
consoles;
6 - visually checked and confirmed that the thread which woke up earlier
(but after the job's scheduled time) executed the job, and that this
happened regardless of the container we scheduled the job from;
7 - repeated this test re-enabling all the JobPollers one after another,
with the same result as step 6.

In short:
Container A - tenant A - ofbiz_instanceId = 1 - JobPoller-A - pool="pool"
Container B - tenant B - ofbiz_instanceId = 2 - JobPoller-B - pool="pool"

Service "MY_SERVICE" scheduled in container/tenant A at time X.
Service "MY_SERVICE" scheduled in container/tenant B at time Y.

JobPoller-A, which has all the tenant JobManagers registered, polls after
schedule time Y of MY_SERVICE for tenant B, but before JobPoller-B does:
MY_SERVICE for tenant B is therefore executed by ofbiz_instanceId = 1.
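
The race above can be sketched as a toy model (this is NOT OFBiz code; the
instance names and wake times are invented to mirror the A/B example):

```python
# Toy model of the race: all pollers read the same shared job table and
# filter only on run time, so the first poller to wake after a job's
# scheduled time claims it, regardless of which tenant scheduled it.

jobs = [
    {"name": "MY_SERVICE", "tenant": "A", "run_time": 10, "run_by": None},
    {"name": "MY_SERVICE", "tenant": "B", "run_time": 12, "run_by": None},
]

# Invented wake times: JobPoller-A (instance-1) happens to wake before
# JobPoller-B (instance-2) after both jobs become due.
poll_events = sorted(
    [(11, "instance-1"), (13, "instance-1"),   # JobPoller-A wake-ups
     (14, "instance-2"), (24, "instance-2")]   # JobPoller-B wake-ups
)

for now, instance in poll_events:
    for job in jobs:
        # Both pollers run pool "pool", so there is no per-container filter:
        if job["run_by"] is None and job["run_time"] <= now:
            job["run_by"] = instance   # first poller to arrive wins

print([(j["tenant"], j["run_by"]) for j in jobs])
# [('A', 'instance-1'), ('B', 'instance-1')] -- tenant B's job too
```

Whichever poller's wake-up lands first after a job becomes due wins, which
is why the "wrong" instanceId varied from run to run.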

Once we were sure about this behavior, we could think about a solution,
and we came up with two possibilities.

*[1]* Start a new OFBiz container with the JobPoller enabled and
meanwhile disable all the other JobPoller threads, so that only one
poller thread across all containers is active.

*[2]* Have the JobManager search/schedule/queue/run jobs based also on
the execution thread-pool: giving each container a different
<thread-pool>..</thread-pool> configuration for the *send-to-pool* and
*run-from-pool* parameters (i.e. for Container A: send-to-pool="poolA"
and run-from-pool name="poolA", and so on for Containers B, C, D, ...)
makes each JobPoller run only the services of its specific pool name. If
I schedule a service in Container A, only the JobPoller in that container
will run the job, since the JobPoller in Container B will search the
tenants for jobs with pool="poolB".
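
As a sketch, Container A's per-container configuration in
*serviceengine.xml* would look roughly like this (only the pool names
matter here; keep whatever values your installation already uses for the
other thread-pool attributes):

```xml
<!-- serviceengine.xml in Container A; Container B would use "poolB", etc. -->
<thread-pool send-to-pool="poolA"
             poll-enabled="true"
             poll-db-millis="30000">
    <run-from-pool name="poolA"/>
</thread-pool>
```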

We decided to proceed with solution [2], because we think it is correct
in this scenario for each container to run its own (per-tenant) jobs, in
order to keep each container/customer logically separated from the
others.
The last step to complete this configuration was to update the JobSandbox
entities and assign the proper, specific execution thread pool to the
already scheduled services.
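
For the already scheduled jobs, that update amounts to something like the
following (illustrative only: the column names follow the OFBiz JobSandbox
entity, where POOL_ID is the pool a job runs from, and the WHERE clause
guarding not-yet-started jobs may need adjusting for your schema/version):

```sql
-- Re-assign pending jobs in tenant A's database to Container A's pool.
UPDATE JOB_SANDBOX
   SET POOL_ID = 'poolA'
 WHERE POOL_ID = 'pool'
   AND START_DATE_TIME IS NULL   -- not yet picked up by a poller
   AND CANCEL_DATE_TIME IS NULL; -- not cancelled
```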

The checks we've done throughout the week were successful, but we will
keep monitoring the situation for some more days.

I don't know whether other devs/contributors/users run ofbiz in Docker
containers the same way we do, but this problem gave us more than a
headache for a long time, so I hope our experience can help save some
precious time for anyone who faces this kind of situation.

Have a great weekend ahead.
Kind Regards,
Giulio
