Posted to users@tapestry.apache.org by Barry Books <tr...@gmail.com> on 2017/07/10 11:28:52 UTC

Re: tapestry background jobs in a clustered environment

While it does create a separate process, I run my batch jobs with Jenkins.
The jobs are just pages, so you don't have weird lifecycle problems, and
they are easy to test because you can just go to the page. Your load
balancer will distribute the load, you get history, and you don't need to
write any code. Just drop the war file into your server.
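
A minimal sketch of the idea, assuming Tapestry 5 and a hypothetical
BatchService the page delegates to:

import org.apache.tapestry5.ioc.annotations.Inject;
import org.apache.tapestry5.util.TextStreamResponse;

public class RunNightlyBatch
{
    @Inject
    private BatchService batchService; // made-up service that does the work

    Object onActivate()
    {
        int processed = batchService.processPending();
        // Plain text response, so Jenkins (or curl) can log the result.
        return new TextStreamResponse("text/plain", "processed " + processed);
    }
}

Jenkins then only needs an HTTP step that requests the page URL on some node
behind the load balancer.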

On Thu, Jun 29, 2017 at 1:18 AM, Ilya Obshadko <il...@gmail.com>
wrote:

> Yes, my case is about the same logical queue. I currently don’t have any
> abstraction for “units of work” or whatever you may call it - the
> application didn’t need it when it was running on a single machine. I’ll
> have to introduce something similar to isolate jobs from actual business
> objects.
>
>
> On Wed, Jun 28, 2017 at 11:24 PM, Dmitry Gusev <dm...@gmail.com>
> wrote:
>
> > Not sure I understand where those pessimistic locks came from.
> >
> > In our case there's no locking at all; every machine in a cluster
> > processes jobs simultaneously, unless, of course, the jobs are from the
> > same logical queue and must be executed in order.
> >
> > By row-level locking I mean PostgreSQL's SELECT ... FOR UPDATE, e.g.:
> >
> > UPDATE units_of_work
> > SET started_at = ?
> > WHERE id = (SELECT id
> >             FROM units_of_work
> >             WHERE started_at IS NULL
> >             LIMIT 1
> >             -- SKIP LOCKED (PostgreSQL 9.5+) makes competing coordinators
> >             -- skip already-claimed rows instead of blocking on them
> >             FOR UPDATE SKIP LOCKED)
> > RETURNING id
> >
> > This is a simplified version of what's actually happening, but it
> > illustrates the idea: different coordinators don't lock each other.
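> >
> > For reference, a minimal JDBC sketch of claiming one unit of work with
> > that query (class and method names are illustrative, not our actual code):
> >
> > import java.sql.*;
> > import java.util.Optional;
> >
> > class UnitOfWorkQueue {
> >     // Claim one pending unit of work; empty means the queue is drained.
> >     static Optional<Long> claimNext(Connection c) throws SQLException {
> >         String sql =
> >             "UPDATE units_of_work SET started_at = now() " +
> >             "WHERE id = (SELECT id FROM units_of_work " +
> >             "            WHERE started_at IS NULL LIMIT 1 " +
> >             "            FOR UPDATE SKIP LOCKED) " +
> >             "RETURNING id";
> >         try (Statement st = c.createStatement();
> >              // pgjdbc exposes RETURNING rows as a ResultSet
> >              ResultSet rs = st.executeQuery(sql)) {
> >             return rs.next() ? Optional.of(rs.getLong(1)) : Optional.empty();
> >         }
> >     }
> > }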
> >
> >
> > On Wed, Jun 28, 2017 at 11:05 PM, Ilya Obshadko <ilya.obshadko@gmail.com>
> > wrote:
> >
> > > I was actually looking at Spring Batch (and a couple of other
> > > solutions). I don’t think Spring Batch could be of much help here.
> > >
> > > My conclusion is similar to what you are saying - implementing a
> > > lightweight job coordinator is much easier.
> > >
> > > Row-level locking works well when you are dealing with a simple queue
> > > table - you take a pessimistic lock on N rows, process them, and give
> > > another host in the cluster a chance. Unfortunately, only one of my
> > > background jobs is suitable for this type of refactoring.
> > >
> > > Other jobs process records that shouldn’t be locked for a considerable
> > > amount of time.
> > >
> > > So currently I’m thinking of the following scenario (a sketch of the
> > > locking step follows the list):
> > >
> > > - pass a deployment ID via the environment to all containers (ECS can do
> > > this quite easily)
> > > - use a simple table with records containing job name, current cluster
> > > deployment ID, and state
> > > - the first background executor that manages to lock the appropriate job
> > > row starts working; the others are cancelled
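> > >
> > > A rough sketch of that locking step, assuming a hypothetical job_locks
> > > table (job_name, deployment_id, state) and a transaction held open for
> > > the duration of the run:
> > >
> > > import java.sql.*;
> > >
> > > class JobLock {
> > >     // True if this executor won the job row. The row lock is held until
> > >     // the surrounding transaction ends, so autocommit must be off; for
> > >     // very long jobs pg_try_advisory_lock may be a better fit.
> > >     static boolean tryAcquire(Connection c, String jobName,
> > >                               String deploymentId) throws SQLException {
> > >         String sql = "SELECT 1 FROM job_locks " +
> > >                      "WHERE job_name = ? AND deployment_id = ? " +
> > >                      "FOR UPDATE SKIP LOCKED";
> > >         try (PreparedStatement ps = c.prepareStatement(sql)) {
> > >             ps.setString(1, jobName);
> > >             ps.setString(2, deploymentId);
> > >             try (ResultSet rs = ps.executeQuery()) {
> > >                 return rs.next();
> > >             }
> > >         }
> > >     }
> > > }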
> > >
> > >
> > >
> > > On Tue, Jun 27, 2017 at 10:16 PM, Dmitry Gusev <dmitry.gusev@gmail.com>
> > > wrote:
> > >
> > > > Hi Ilya,
> > > >
> > > > If you have Spring on your classpath you may look at Spring Batch.
> > > >
> > > > For our projects we've built something similar -- a custom jobs
> > > > framework on top of PostgreSQL.
> > > >
> > > > The idea is that there is a coordinator service (a Tapestry service)
> > > > that runs in a thread pool and constantly polls special DB tables for
> > > > new records. For every new unit of work it creates an instance of a
> > > > worker (using `ObjectLocator.autobuild()`) that's capable of
> > > > processing the job.
> > > >
> > > > The polling can be optimised well for performance using row-level
> > > > locks & DB indexing.
> > > >
> > > > Coordinator runs in the same JVM as the rest of the app, so there's
> > > > no dedicated process. It integrates with Tapestry's EntityManager so
> > > > that you can create a job in a transaction.
> > > >
> > > > When running in a cluster every JVM has its own coordinator -- this
> > > > is how the jobs get distributed.
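> > > >
> > > > A stripped-down sketch of such a coordinator (UnitOfWorkProcessor and
> > > > the UnitOfWorkQueue.claimNext() helper from earlier are illustrative):
> > > >
> > > > import java.sql.*;
> > > > import java.util.concurrent.*;
> > > > import javax.sql.DataSource;
> > > > import org.apache.tapestry5.ioc.ObjectLocator;
> > > >
> > > > public class JobCoordinator {
> > > >     private final ObjectLocator locator;
> > > >     private final DataSource ds;
> > > >     private final ScheduledExecutorService pool =
> > > >             Executors.newScheduledThreadPool(4);
> > > >
> > > >     public JobCoordinator(ObjectLocator locator, DataSource ds) {
> > > >         this.locator = locator;
> > > >         this.ds = ds;
> > > >         // Poll for new units of work once a second.
> > > >         pool.scheduleWithFixedDelay(this::poll, 0, 1, TimeUnit.SECONDS);
> > > >     }
> > > >
> > > >     private void poll() {
> > > >         try (Connection c = ds.getConnection()) {
> > > >             UnitOfWorkQueue.claimNext(c).ifPresent(id -> locator
> > > >                     .autobuild(UnitOfWorkProcessor.class).process(id));
> > > >         } catch (SQLException e) {
> > > >             // log and retry on the next tick
> > > >         }
> > > >     }
> > > > }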
> > > >
> > > > But you're saying that row-level locking doesn't work for some of
> > > > your use cases -- can you be more concrete here?
> > > >
> > > >
> > > > On Tue, Jun 27, 2017 at 9:35 PM, Ilya Obshadko <ilya.obshadko@gmail.com>
> > > > wrote:
> > > >
> > > > > I’ve recently expanded my Tapestry application to run on multiple
> > > > > hosts. While it’s quite OK for the web-facing part (a sticky load
> > > > > balancer does most of the job), it’s not very straightforward with
> > > > > background jobs.
> > > > >
> > > > > Some of them can be quite easily distributed using database row-level
> > > > > locks, but this doesn’t work for every use case I have.
> > > > >
> > > > > Are there any suggestions about this? I’d prefer not to have a
> > > > > dedicated process running background tasks. Ideally, I want to
> > > > > dynamically distribute background jobs between hosts in the cluster,
> > > > > based on current load status.
> > > > >
> > > > >
> > > > > --
> > > > > Ilya Obshadko
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Dmitry Gusev
> > > >
> > > > AnjLab Team
> > > > http://anjlab.com
> > > >
> > >
> > >
> > >
> > > --
> > > Ilya Obshadko
> > >
> >
> >
> >
> > --
> > Dmitry Gusev
> >
> > AnjLab Team
> > http://anjlab.com
> >
>
>
>
> --
> Ilya Obshadko
>

Re: tapestry background jobs in a clustered environment

Posted by Dmitry Gusev <dm...@gmail.com>.
Hey Barry,

you're talking about scheduled jobs. With scheduled jobs an external
trigger may work well.

In our apps we're using embedded Quartz and anjlab-tapestry-quartz [1] for
scheduled jobs, to do the same without an external process like Jenkins.
We've implemented a simple REST interface to query the Quartz runtime and
trigger the jobs, plus a simple admin UI for that -- it works well enough
for our use-cases. This is better than a standalone Jenkins in some
respects, e.g. you can configure your scheduled jobs from Java (using
Tapestry IoC's distributed configuration) and you don't have to maintain
Jenkins's home folder separately.
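
For illustration, here's what a minimal embedded scheduler looks like with
the plain Quartz 2.x API (the anjlab-tapestry-quartz wiring differs in
detail, and NightlyCleanupJob is a made-up Job implementation):

import org.quartz.*;
import org.quartz.impl.StdSchedulerFactory;

public class SchedulerSetup {
    public static void main(String[] args) throws SchedulerException {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        scheduler.start();

        JobDetail job = JobBuilder.newJob(NightlyCleanupJob.class)
                .withIdentity("nightlyCleanup", "batch")
                .build();

        // Fire every night at 03:00.
        Trigger trigger = TriggerBuilder.newTrigger()
                .withIdentity("nightlyCleanupTrigger", "batch")
                .withSchedule(CronScheduleBuilder.cronSchedule("0 0 3 * * ?"))
                .build();

        scheduler.scheduleJob(job, trigger);
    }
}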

But as I understood it, Ilya was asking about messaging systems, where jobs
are triggered not only on a schedule but also by user actions or other
events. In this case the common approach is to have a queue where you put
your jobs and a set of workers that process those jobs somehow. Quartz or
Jenkins can then be just another source/trigger for those jobs, i.e. on a
Jenkins trigger your page creates a job instance and sends it to a queue
where it gets processed later.

[1]
https://github.com/anjlab/anjlab-tapestry-commons/tree/master/anjlab-tapestry-quartz
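
As a sketch of that enqueue step in a Tapestry page (UnitOfWork here is a
hypothetical JPA entity that a coordinator later picks up by polling the
table, and the page is assumed to use tapestry-jpa's @CommitAfter):

import javax.persistence.EntityManager;
import org.apache.tapestry5.ioc.annotations.Inject;
import org.apache.tapestry5.jpa.annotations.CommitAfter;

public class Checkout {
    @Inject
    private EntityManager em;

    @CommitAfter
    void onSuccess() {
        // The job row commits atomically with the user's own changes.
        em.persist(new UnitOfWork("send-invoice"));
    }
}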

-- 
Dmitry Gusev

AnjLab Team
http://anjlab.com