Posted to modperl@perl.apache.org by "Cahill, Earl" <ec...@corp.untd.com> on 2005/05/23 23:27:40 UTC

queue system using mod_perl2/apache2

I am wondering about a couple of things.

First, are there any good, open source, Perl-based queue systems out there?
I think I would like a stable, centralized daemon, which likely rules out
many systems.

Second, is anyone interested in working on a queue system with
mod_perl2/apache2 as its basis, using the mod_echo, arbitrary protocol
stuff?

Basically, you would register servers and jobs with the daemon, and then the
daemon would give jobs out, perhaps wait for them to finish and then return
appropriate output.  The system would also handle job dependencies.

I would like it to be pretty open, and would hope it could easily handle
some basic job types and requirements, such as:

1. for each row returned from a db query, do something with it
2. watch a filesystem for changes to a directory or file and act
   accordingly
3. be fast enough to serve web hits through it, like maybe 1/100 second
   system overhead
4. have a flat file and do something for each line in it
5. be able to rather arbitrarily add little jobs, as in a web crawler
   running in parallel

I think using mod_perl2/apache2 would be ideal.  Apache2 could also give
jobs to non-mod_perl processes if desired.  I think that if apache2 could
manage forking, children and the like, that would save many headaches.

I think I even have a name, PDQ, which could stand for Perl Dynamic Queue,
or yeah, Pretty Darn Quick.

Feedback welcome.

Thanks,

Earl


Re: queue system using mod_perl2/apache2

Posted by Perrin Harkins <pe...@elem.com>.
On Monday 23 May 2005 5:27 pm, Cahill, Earl wrote:
> First, are there any good, open source, Perl-based queue systems out there?

The closest is Spread::Queue.

> Second, is anyone interested in working on a queue system with
> mod_perl2/apache2 as its basis, using the mod_echo, arbitrary protocol
> stuff?

I'm going to be building one this month for a project Plus Three is working 
on, and was hoping to release it as Apache::Queue or something.

I thought about using the protocol stuff, but I can't think of any good reason 
not to use HTTP.  As a result, I would probably write this to work with 
mod_perl 1 first (because that's what the rest of the system is on) and 
eventually support both generations.

The design is not fully baked yet, and it's still possible that I may just use 
a database and polling cron jobs instead of having an HTTP daemon, but the 
general idea goes like this:

- Client sends an HTTP request asking for a job to be added to the queue.
- Server adds the job to a database, sends an OK to the client, and 
disconnects.
- The child process that answered the request is now awake, so it checks to 
see if there are any jobs in the queue which no process has accepted yet, and 
takes one if there are.
- Child process continues to do this in a loop until there are no more waiting 
jobs in the database.
- Client can connect again at any time and ask for the status of the job and 
the result.
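
The steps above can be sketched roughly as follows. This is only an illustration, written in Python with SQLite standing in for the database; it is not the planned Apache::Queue code. The table layout, the status values, and the function names (open_queue, add_job, claim_job, drain, job_status) are all assumptions made for the example. The part that matters is claim_job: the UPDATE only succeeds if the job is still waiting, so two child processes cannot accept the same job.

```python
import sqlite3


def open_queue(path=":memory:"):
    """Open the job database and create the (hypothetical) jobs table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'waiting',"
        " worker TEXT,"
        " result TEXT)"
    )
    conn.commit()
    return conn


def add_job(conn, payload):
    """What the server does for an 'add job' request: store it, return its id."""
    cur = conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    conn.commit()
    return cur.lastrowid


def claim_job(conn, worker):
    """Claim one waiting job for this worker; return (id, payload) or None."""
    while True:
        row = conn.execute(
            "SELECT id, payload FROM jobs"
            " WHERE status = 'waiting' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None  # nothing left in the queue
        # Conditional update: matches no row if another process took the job.
        cur = conn.execute(
            "UPDATE jobs SET status = 'running', worker = ?"
            " WHERE id = ? AND status = 'waiting'",
            (worker, row[0]),
        )
        conn.commit()
        if cur.rowcount == 1:
            return row
        # Lost the race for this job; look for the next waiting one.


def drain(conn, worker, handler):
    """Loop until no waiting jobs remain; return how many were processed."""
    processed = 0
    while True:
        job = claim_job(conn, worker)
        if job is None:
            return processed
        job_id, payload = job
        result = handler(payload)
        conn.execute(
            "UPDATE jobs SET status = 'done', result = ? WHERE id = ?",
            (result, job_id),
        )
        conn.commit()
        processed += 1


def job_status(conn, job_id):
    """What a client's status request would read back: (status, result)."""
    return conn.execute(
        "SELECT status, result FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
```

In this sketch the HTTP layer is left out entirely: add_job and job_status correspond to the two kinds of client request, and drain is what the child process would run after it has sent its OK and disconnected.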

The major wrinkle here is making sure there are always processes listening for 
requests and not handling jobs, so that clients can keep queuing jobs and 
getting results even if there's a backlog.  To do that, we need to be able to 
check the state of the other processes, maybe with Apache::Scoreboard, or 
something similar.

> Basically, you would register servers and jobs with the daemon, and then
> the daemon would give jobs out, perhaps wait for them to finish and then
> return appropriate output.  The system would also handle job dependencies.

I was planning to keep the actual job processing part very separate from the 
queue system, probably using a dispatch table that can be configured to pass 
job types to handler classes.
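
As an illustration of the dispatch table idea, here is a minimal sketch in Python. The job type names, the handler classes, and the run method are hypothetical and chosen only for the example; in a real system the table would presumably come from configuration rather than being hard-coded.

```python
class ResizeImage:
    """Hypothetical handler class for one job type."""

    def run(self, args):
        return "resized " + args["file"]


class SendMail:
    """Hypothetical handler class for another job type."""

    def run(self, args):
        return "mailed " + args["to"]


# The dispatch table: job type name -> handler class.
DISPATCH = {
    "resize_image": ResizeImage,
    "send_mail": SendMail,
}


def dispatch(job_type, args, table=DISPATCH):
    """Look up the handler class for a job type and run the job with it."""
    try:
        handler_class = table[job_type]
    except KeyError:
        raise ValueError(
            "no handler configured for job type %r" % (job_type,)
        ) from None
    return handler_class().run(args)
```

The queue itself only needs to know the job type string and pass the arguments along, so new kinds of work can be added by registering another class without touching the queue code.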

> 1.	for each row returned from a db query, do something with it
> 2.	watch a filesystem for changes to a directory or file and act
> accordingly
> 3.	be fast enough to serve web hits through it, like maybe 1/100 second
> system overhead
> 4.	have a flat file and do something for each line in it
> 5.	be able to rather arbitrarily add little jobs, as in a web crawler
> running in parallel

Built-in job types like this aren't part of my plan.

> I think using mod_perl2/apache2 would be ideal.  Could use apache2 to give
> jobs to non-mod_perl processes if desired.  I think that if apache2 could
> manage forking, children and the like that would save many headaches.

I don't see anything about this that specifically needs Apache 2 features, 
personally.

- Perrin