Posted to rivet-dev@tcl.apache.org by Karl Lehenbauer <ka...@flightaware.com> on 2011/01/19 19:18:55 UTC

Rivet wish list

Greetings, Programs!

Here are a couple-three things I think would make a big difference for using Rivet to build and deploy big websites.

1. Named separate virtual interpreters.

We use separate virtual interpreters for development (for each developer we diddle auto_path to give them source-controlled private copies of all of our packages) and it's really badass.  The problem comes in the all-or-nothing approach of SVI.  We set up a virtual host for each developer, for both port 80 (well really 8080 + varnish on 80) and 443.  This results in 18 interpreters in each httpd process on our development machine, making for very large httpd processes and slow startup time after a graceful.

If we could name the virtual interpreters, we could cut the number of interpreters from 18 to 9 in this case.

2. Virtual interpreter restarts without a graceful

Right now if one of our developers changes a package, private to them, they still have to do an apachectl graceful to pick up the change.  This restarts all of the httpd processes and reinitializes all of the interpreters.  Our interpreter initialization is intense.  Each FlightAware httpd process loads 468 packages.

I'd like to be able to cause only one vhost's Tcl interpreters to be reloaded by a Tcl_DeleteInterp / Tcl_CreateInterp / Rivet initialization process.  Instead of a graceful, you'd be able to specify something like a trigger file for each vhost.  Every time a vhost (with separate virtual interpreters) serves a page, it gets the mtime of the trigger file.  If the mtime of the trigger file has changed since the last time the interpreter served a page, Rivet deletes the virtual host's interpreter, creates and initializes a new one, and then handles the page.  [I tried to write this but kind of lost control of it and was not successful.]

This way, developers could totally reload all their libraries without any httpd processes being stopped or started.  Also this will lower overall overhead, because a lot of the time an httpd process never handles a page in its lifetime for many to most of the virtual interpreters.

An additional improvement would be to create the ability to not even initialize a vhost's separate virtual interpreter until the first time it is needed.
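
A minimal sketch of the trigger-file check described above, in plain C with no Rivet or Apache calls. The vhost_reload_state struct and vhost_needs_reload() are hypothetical names, not part of Rivet; the actual Tcl_DeleteInterp / Tcl_CreateInterp / Rivet initialization work would go wherever the function returns true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical per-vhost bookkeeping; not a Rivet structure. */
typedef struct {
    const char *trigger_path;  /* file a developer touches to request a reload */
    time_t      last_mtime;    /* mtime seen when the interpreter was last built */
    bool        initialized;   /* false until the interpreter was created once */
} vhost_reload_state;

/*
 * Returns true when the vhost's interpreter should be (re)created before
 * serving the page: either it was never created (lazy init) or the trigger
 * file's mtime changed since the last check. A missing trigger file is
 * treated as "no reload requested".
 */
bool vhost_needs_reload(vhost_reload_state *st)
{
    struct stat sb;
    bool have_mtime;

    assert(st != NULL && st->trigger_path != NULL);
    have_mtime = (stat(st->trigger_path, &sb) == 0);

    if (!st->initialized) {
        st->initialized = true;
        st->last_mtime = have_mtime ? sb.st_mtime : 0;
        return true;
    }
    if (have_mtime && sb.st_mtime != st->last_mtime) {
        st->last_mtime = sb.st_mtime;
        return true;
    }
    return false;
}
```

Treating "never initialized" as a reload also covers the lazy-creation idea: the interpreter is only built the first time the vhost serves a page. Since st_mtime counts whole seconds, two touches of the trigger file within the same second would look like one.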

3. Apache children inheriting a preloaded interpreter from the parent Apache process

This is unrelated to separate virtual interpreters -- this is for the production website.  When we graceful a production webserver we take it out of the load balancer first.  When we start 200 httpd processes and each one loads 468 packages, it's not pretty.  There is a lot of lock contention in FreeBSD while these processes all bang on the same package directories, something we don't totally understand.  (This is much faster with ZFS, once tuned, btw.)  It takes minutes to settle down before our stuff can put the server back into the load balancer.  What would be incredibly cool would be to be able to load up the 468 packages in the parent httpd process one time and then have each child process use the same Tcl interpreter already loaded with the packages.  This would be a way more than 200-fold improvement in Apache startup time for us (because it would eliminate all the contention.)  [I don't even know if this is possible.]

I offer this as a possible direction to take Rivet development.  I can definitely help, and provide production load :-).  If anyone is interested in taking any of these ideas further into design, code, etc., please let me know.

Karl

Re: Rivet wish list

Posted by Michael Schlenker <sc...@uni-oldenburg.de>.

On 19.01.2011 at 19:18, Karl Lehenbauer wrote:

> 	3. Apache children inheriting a preloaded interpreter from the parent Apache process
> 
> This is unrelated to separate virtual interpreters -- this is for the production website.  When we graceful a production webserver we take it out of the load balancer first.  When we start 200 httpd processes and each one loads 468 packages, it's not pretty.  There is a lot of lock contention in FreeBSD while these processes all bang on the same package directories, something we don't totally understand.  (This is much faster with ZFS, once tuned, btw.)  It takes minutes to settle down before our stuff can put the server back into the load balancer.  What would be incredibly cool would be to be able to load up the 468 packages in the parent httpd process one time and then have each child process use the same Tcl interpreter already loaded with the packages.  This would be a way more than 200-fold improvement in Apache startup time for us (because it would eliminate all the contention.)  [I don't even know if this is possible.]

Doesn't AOLserver do something like that for its cloned interpreter threads (like: https://bitbucket.org/aolserver/aolserver/src/2aa0f24395ae/nsd/tclinit.c) ? 
So for threaded Apache workers it should be doable.  
And at least for Java there were some experiments applying it to cloning initialized JVM processes:
http://portal.acm.org/citation.cfm?id=1254812

Michael
---------------------------------------------------------------------
To unsubscribe, e-mail: rivet-dev-unsubscribe@tcl.apache.org
For additional commands, e-mail: rivet-dev-help@tcl.apache.org

Re: Rivet wish list

Posted by Karl Lehenbauer <ka...@gmail.com>.

On Jan 20, 2011, at 4:41 PM, Damon Courtney wrote:

>>> so slave interpreters don't inherit packages loaded in their parent
>> interp? Getting the answer should be easy...
> 
> 
> I don't even think this is possible.  You're talking about creating an interp in the parent httpd process and then somehow handing that off to each child as a copy as it's created?  I think that's a great idea, but I don't see how to implement it.  AOLServer has this idea of cloning interpreters, but they're always working within the same process not multiple children.  And I don't even know if THEY clone the entire interpreter, packages and all. I would think they do though if it's a true interp clone.
> 
> I would love to be proven wrong though. 0-]  I don't have near the kind of load you guys are using, but the idea of cloning a full interpreter has been an idea I've wanted for a long time.  Cloning within the same process is possible.  Cloning in a child? *shrug*

Early in the development of the interp command I asked for "interp clone", which would drive through one interpreter and copy all the procs, arrays, namespaces, etc, from one interpreter into another.  It would still be badass, but the capability doesn't exist and isn't needed for what I'm describing.

Definitely if you fork a process that has a Tcl interpreter, the child will have a copy of the Tcl interpreter in the same state as the parent.  The only problem with that is that certain initializations can't be done before the fork.  For instance if I have an open connection to a postgresql server and I fork, the parent and the child will both have the same connection and that won't work.  So the developer needs to make sure that they get the packages loaded that all children need but that they don't initialize any stuff that can't or shouldn't happen until after the fork.
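
The shared-connection problem can be seen with an ordinary file descriptor standing in for the database socket. This is a minimal sketch assuming POSIX; parent_offset_after_child_read() is a made-up name for illustration, and a temporary file is used only because its read offset makes the sharing visible.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Opens a descriptor BEFORE fork(), lets the child read 4 bytes from it,
 * and returns the parent's file offset afterwards (-1 on error). Parent
 * and child share one open file description, so the child's read moves
 * the parent's position too.
 */
long parent_offset_after_child_read(void)
{
    FILE *tmp = tmpfile();
    if (tmp == NULL)
        return -1;
    fputs("0123456789", tmp);
    fflush(tmp);

    int fd = fileno(tmp);
    if (lseek(fd, 0, SEEK_SET) < 0) {
        fclose(tmp);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        fclose(tmp);
        return -1;
    }
    if (pid == 0) {
        /* Child: consume 4 bytes from the inherited descriptor. */
        char buf[4];
        ssize_t n = read(fd, buf, sizeof buf);
        _exit(n == (ssize_t)sizeof buf ? 0 : 1);
    }

    /* Parent: wait for the child, then look at our own offset. */
    int status = 0;
    long pos = -1;
    if (waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0)
        pos = (long)lseek(fd, 0, SEEK_CUR);

    fclose(tmp);
    return pos;
}
```

The parent never reads, yet its offset ends up at 4. A database socket inherited the same way would have both processes pulling bytes out of one protocol stream, which is why connections belong after the fork.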

So the idea is that when the parent Apache process does Rivet module initialization, it does most or all of the stuff in Rivet_InitTclStuff on the master interpreter, which in FlightAware's case means loading 400+ packages.  Then the Apache server forks.  If separate virtual interpreters are configured, each child has to do all the creation and initialization of each interpreter as before.  But if it's not configured for separate virtual interpreters, the parent performs all of the GlobalInitScript stuff, which the child inherits, and then the child invokes the Rivet ChildInitScript scripts.  This is where you establish your database connections or whatever other stuff can't be done until after the fork.
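
The two-phase scheme can be sketched without Tcl or Apache at all. Everything here is a stand-in under stated assumptions: fake_interp, global_init(), child_init() and packages_seen_by_child() are hypothetical names, the package list is illustrative, and db_owner plays the role of a per-process resource such as a database connection.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_PACKAGES 8

/* Stand-in for an interpreter: only the state this sketch cares about. */
typedef struct {
    const char *packages[MAX_PACKAGES];
    int         npackages;
    pid_t       db_owner;   /* pid that opened the per-process resource, 0 = none */
} fake_interp;

static fake_interp server_interp;   /* lives in the parent, copied by fork() */

/* Parent-side phase: expensive but fork-safe work, i.e. loading packages. */
void global_init(void)
{
    static const char *wanted[] = { "Tclx", "json", "http" };
    for (size_t i = 0; i < sizeof wanted / sizeof wanted[0]; i++) {
        assert(server_interp.npackages < MAX_PACKAGES);
        server_interp.packages[server_interp.npackages++] = wanted[i];
    }
}

/* Child-side phase: things that must not be shared across fork(). */
void child_init(void)
{
    server_interp.db_owner = getpid();
}

/*
 * Forks one child, runs child_init() there, and returns how many packages
 * the child found already loaded (reported via its exit status), or -1.
 */
int packages_seen_by_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        child_init();
        /* The child owns its resource and still sees the parent's packages. */
        int ok = (server_interp.db_owner == getpid());
        _exit(ok ? server_interp.npackages : 255);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    int code = WEXITSTATUS(status);
    return code == 255 ? -1 : code;
}
```

Calling global_init() once and then packages_seen_by_child() returns 3 with the list above, while db_owner in the parent stays 0: the child got the preloaded state for free, and what it set up after the fork never leaked back.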

OK, after all that, I don't know if it can be done either.

Oh also if separate virtual interpreters are defined, I think it still creates a root interpreter even though it doesn't use it.

Re: Rivet wish list

Posted by Massimo Manghi <ma...@unipr.it>.

As far as the fork call is concerned, this seems reasonable. I can try to merge
this goal with my experiments on creating a server interp that could be used
to init IPC mechanisms. My first attempt created a standalone interp without
the usual initialization interpreters go through, but perhaps I can merge the
two goals.

I've just created a 'master-interp' development branch to experiment with it.
I will commit there any improvement on this goal.

 -- Massimo

On Thu, 20 Jan 2011 18:28:43 -0600, Karl Lehenbauer wrote
> 
> I've looked at this a little further.  I think we would create the 
> parent interpreter in the httpd parent process in Rivet_InitHandler()
> , which is invoked during http initialization because it was passed 
> to ap_hook_post_config() by rivet_register_hooks().
> 
> At this point, the interpreter in the parent httpd process, we'd 
> initialize it with the normal Tcl_Init-type stuff and then execute 
> any global init scripts.  Again, this all happens within 
> Rivet_InitHandler when there is only the one httpd process running.
> 
> As the parent process forks off all the httpd children, each child 
> has Rivet_ChildInit() called, as before, because Rivet_ChildInit was 
> passed to ap_hook_child_init() by rivet_register_hooks().
> 
> Rivet_ChildInit() would be modified to use the interpreter created 
> by Rivet_InitHandler(), unless separate virtual interpreters was 
> defined, and then instead of doing all that interpreter creation 
> stuff in Rivet_InitTclStuff(), like it does now, it would do the 
> Apache stuff it needs to do and also execute the child init scripts 
> on the interpreter it inherited from the parent when the parent 
> forked it off.
> 
> Per-page processing would continue unmodified.

Re: Rivet wish list

Posted by Massimo Manghi <ma...@unipr.it>.

A preliminary implementation of this scheme was successful with minimal
effort. I haven't yet tested what happens when SVI is turned on, and I'm
still using the rivet_server_init_script variable I had created.

I put in the conf the line

RivetServerConf ServerInitScript "package require Tclx"

which is run in Rivet_InitHandler, and then I requested the execution of a
.rvt file, where the presence of package Tclx in the interpreter was tested
using 'package present Tclx': the package was in the interpreter (naturally
the server hadn't been run with the '-X' switch ;-))

On Thu, 20 Jan 2011 18:28:43 -0600, Karl Lehenbauer wrote
> I've looked at this a little further.  I think we would create the 
> parent interpreter in the httpd parent process in Rivet_InitHandler()
> , which is invoked during http initialization because it was passed 
> to ap_hook_post_config() by rivet_register_hooks().
> 
> At this point, the interpreter in the parent httpd process, we'd 
> initialize it with the normal Tcl_Init-type stuff and then execute 
> any global init scripts.  Again, this all happens within 
> Rivet_InitHandler when there is only the one httpd process running.
> 
> As the parent process forks off all the httpd children, each child 
> has Rivet_ChildInit() called, as before, because Rivet_ChildInit was 
> passed to ap_hook_child_init() by rivet_register_hooks().
> 
> Rivet_ChildInit() would be modified to use the interpreter created 
> by Rivet_InitHandler(), unless separate virtual interpreters was 

The interpreter pointer is kept in the rivet_server_conf structure, which is
copied by 'fork' into the address space of the newly created child process, so
Rivet_ChildInit actually did not need to be modified.

> defined, and then instead of doing all that interpreter creation 
> stuff in Rivet_InitTclStuff(), like it does now, it would do the 
> Apache stuff it needs to do and also execute the child init scripts 
> on the interpreter it inherited from the parent when the parent 
> forked it off.
> 


  -- Massimo



Re: Rivet wish list

Posted by Karl Lehenbauer <ka...@gmail.com>.

On Jan 20, 2011, at 5:42 PM, Massimo Manghi wrote:
> A fork call copies data, stack and heap from the parent to the child process, 
> so in principle an interpreter state could be copied into the new process 
> (text pages are shared). What is missing in this picture for having a real 
> cloning?

I've looked at this a little further.  I think we would create the parent interpreter in the httpd parent process in Rivet_InitHandler(), which is invoked during http initialization because it was passed to ap_hook_post_config() by rivet_register_hooks().

At this point we'd initialize the interpreter in the parent httpd process with the normal Tcl_Init-type stuff and then execute any global init scripts.  Again, this all happens within Rivet_InitHandler when there is only the one httpd process running.

As the parent process forks off all the httpd children, each child has Rivet_ChildInit() called, as before, because Rivet_ChildInit was passed to ap_hook_child_init() by rivet_register_hooks().

Rivet_ChildInit() would be modified to use the interpreter created by Rivet_InitHandler(), unless separate virtual interpreters were defined.  Instead of doing all that interpreter creation stuff in Rivet_InitTclStuff(), like it does now, it would do the Apache stuff it needs to do and also execute the child init scripts on the interpreter it inherited from the parent when the parent forked it off.

Per-page processing would continue unmodified.

Re: Rivet wish list

Posted by Massimo Manghi <ma...@unipr.it>.

On Thu, 20 Jan 2011 16:41:34 -0600, Damon Courtney wrote

> > 
> > so slave interpreters don't inherit packages loaded in their parent
> > interp? Getting the answer should be easy...
> 
> I don't even think this is possible.  You're talking about creating 
> an interp in the parent httpd process and then somehow handing that 
> off to each child as a copy as it's created?  I think that's a great 
> idea, but I don't see how to implement it.  AOLServer has this idea 
> of cloning interpreters, but they're always working within the same 
> process not multiple children.  And I don't even know if THEY clone 
> the entire interpreter, packages and all. I would think they do 
> though if it's a true interp clone.
> 
> I would love to be proven wrong though. 0-]  I don't have near the 
> kind of load you guys are using, but the idea of cloning a full 
> interpreter has been an idea I've wanted for a long time.  Cloning 
> within the same process is possible.  Cloning in a child? *shrug*
> 
> D

Sorry, when I answered it was late at night and I mixed up problem 2 and
problem 3.

A fork call copies data, stack and heap from the parent to the child process,
so in principle an interpreter state could be copied into the new process
(text pages are shared). What is missing from this picture to get real
cloning?

-- Massimo


Re: Rivet wish list

Posted by Damon Courtney <da...@tclhome.com>.

>>  It takes minutes to settle down before our stuff can put the
>> server back into the load balancer.  What would be incredibly cool would
>> be to be able to load up the 468 packages in the parent httpd process 
>> one time and then have each child process use the same Tcl interpreter 
>> already loaded with the packages.  This would be a way more than 
>> 200-fold improvement in Apache startup time for us (because it 
>> would eliminate all the contention.)  [I don't even know if this 
>> is possible.] 
> 
> so slave interpreters don't inherit packages loaded in their parent
> interp? Getting the answer should be easy...


I don't even think this is possible.  You're talking about creating an interp in the parent httpd process and then somehow handing that off to each child as a copy as it's created?  I think that's a great idea, but I don't see how to implement it.  AOLServer has this idea of cloning interpreters, but they're always working within the same process not multiple children.  And I don't even know if THEY clone the entire interpreter, packages and all. I would think they do though if it's a true interp clone.

I would love to be proven wrong though. 0-]  I don't have anywhere near the kind of load you guys are handling, but cloning a full interpreter is something I've wanted for a long time.  Cloning within the same process is possible.  Cloning in a child? *shrug*

D

Re: Rivet wish list

Posted by Massimo Manghi <ma...@unipr.it>.

I'm re-sending this to the list as my first answer went only to Karl.

On Wed, 19 Jan 2011 18:18:55 +0000, Karl Lehenbauer wrote 
> Greetings, Programs! 
> 
> Here are a couple-three things I think would make a big difference for 
> using Rivet to build and deploy big websites.

>  * Named separate virtual interpreters.

> We use separate virtual interpreters for development (for each developer 
> we diddle auto_path to give them source-controlled private copies of all 
> of our packages)  and it's really badass.  The problem comes in the 
> all-or-nothing approach of SVI.  We set up a virtual host for each
> developer, for both port 80 (well really 8080 + varnish on 80) and 443. 
> This results in 18 interpreters in each httpd process on our development 
> machine, making for very large httpd processes and slow startup time 
> after a graceful. 
> 
> If we could name the virtual interpreters, we could cut the number of 
> interpreters from 18 to 9 in this case. 
>

If I understand correctly, you would keep a vhost for each developer, but
ports 80 and 443 would share the same interpreter, obtained by changing the
naming scheme in Rivet_InitTclStuff:

    for (sr = s; sr; sr = sr->next)
    {
        ....
        if (sr != s) /* not the first one  */
        {
            if (rsc->separate_virtual_interps != 0) {
                char *slavename = (char*) apr_psprintf (p, "%s_%d_%d", 
                        sr->server_hostname, 
                        sr->port,
                        interpCount++);

                /* Separate virtual interps. */
                myrsc->server_interp = Tcl_CreateSlave(interp, slavename, 0);
                if (myrsc->server_interp == NULL) {
                    ap_log_error( APLOG_MARK, APLOG_ERR, APR_EGENERAL, s,
                                    "slave interp create failed: %s",
                                    Tcl_GetStringResult(interp) );
                    exit(1);
                }
                Rivet_PerInterpInit(s, myrsc, p);
            } else {
                myrsc->server_interp = rsc->server_interp;
            }
            ......
        }
        ...
    }


It doesn't seem impossible: it probably just takes a temporary hash table
mapping server_hostname to a slave interpreter, and reassigning the same
interpreter further down the chain of virtual host records.
Did I get it right?
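
A rough sketch of that reassignment, kept free of Tcl and APR so it stands alone. The vhost struct and assign_interps() are hypothetical, an integer id stands in for the slave interpreter pointer, and a linear scan over a small array is used in place of a real hash table. Hostnames are compared with strcmp, which is an assumption; a real implementation might want case-insensitive matching.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_INTERPS 64

/* Hypothetical stand-in for a virtual host record. */
typedef struct vhost {
    const char   *hostname;
    int           port;
    int           interp_id;   /* filled in by assign_interps() */
    struct vhost *next;
} vhost;

/*
 * Walks the vhost chain and gives every record with the same hostname the
 * same interpreter id, regardless of port. Returns the number of distinct
 * interpreters that would have to be created.
 */
int assign_interps(vhost *head)
{
    const char *names[MAX_INTERPS];   /* temporary hostname -> id table */
    int count = 0;

    for (vhost *v = head; v != NULL; v = v->next) {
        int id = -1;
        for (int i = 0; i < count; i++) {
            if (strcmp(names[i], v->hostname) == 0) {
                id = i;   /* hostname already has an interpreter */
                break;
            }
        }
        if (id < 0) {
            assert(count < MAX_INTERPS);
            names[count] = v->hostname;
            id = count++;
            /* A real implementation would create the slave interp here. */
        }
        v->interp_id = id;
    }
    return count;
}
```

With one record per developer for each of the two ports, the count comes out at half the number of records, which is the 18-to-9 reduction Karl describes.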
  
>  2. Virtual interpreter restarts without a graceful 
> 
> Right now if one of our developers changes a package, private to them, 
> they still have to do an apachectl graceful to pick up the change.  
> This restarts all of the httpd processes and reinitializes all of the 
> interpreters.  Our interpreter initialization is intense.  
> Each FlightAware httpd process loads 468 packages. 
> 
> I'd like to be able to cause only one vhost's Tcl interpreters to be 
> reloaded by a Tcl_DeleteInterp / Tcl_CreateInterp / Rivet initialization
> process.  Instead of a graceful, you'd be able to specify something 
> like a trigger file for each vhost.  Every time a vhost (with separate 
> virtual interpreters) serves a page, it gets the mtime of the trigger 
> file.  If the mtime of the trigger file has changed since the last time 
> the interpreter served a page, Rivet deletes the virtual host's 
> interpreter, creates and initializes a new one, and then handles the 
> page.  [I tried to write this but kind of lost control of it and
> was not successful.] 
> 
> This way, developers could totally reload all their libraries 
> without any httpd processes being stopped or started.  Also this 
> will lower overall overhead because a lot of times a httpd process 
> won't have ever handled a page
> in its lifetime for many to most of the virtual interpreters. 
> 
> An additional improvement would be to create the ability to not even
> initialize a vhost's separate virtual interpreter until the first time 
> it is needed. 

I think this is doable and in a way overlaps with what I'm doing
with my ServerInitScript. I was trying to isolate the interpreter
initialization function in order to make it reusable in different
contexts. It would be nice if we could come up with an IPC scheme
that could also help in the application context. Apache people
say that stuff like shared memory is something one has to stay
away from when doing web programming, but I always liked system
programming....

> 
>  3. Apache children inheriting a preloaded interpreter from the 
>  parent Apache process 
> 
> This is unrelated to separate virtual interpreters -- this is for the
> production website.  When we graceful a production webserver we take 
> it out of the load balancer first.  When we start 200 httpd processes 
> and each one loads
> 468 packages, it's not pretty.  There is a lot of lock contention in 
> FreeBSD while these processes all bang on the same package directories, 
> something we don't totally understand.  (This is much faster with ZFS, 
> once tuned, btw.)
> It takes minutes to settle down before our stuff can put the
> server back into the load balancer.  What would be incredibly cool would
> be to be able to load up the 468 packages in the parent httpd process 
> one time and then have each child process use the same Tcl interpreter 
> already loaded with the packages.  This would be a way more than 
> 200-fold improvement in Apache startup time for us (because it 
> would eliminate all the contention.)  [I don't even know if this 
> is possible.] 
> 
> I offer this because it's like a possible direction to take Rivet
> development.  I can help, definitely, and provide production load :-), 
> if anyone is interested in taking any of these ideas to further 
> design, code, etc, please let me know. 
> 

so slave interpreters don't inherit packages loaded in their parent
interp? Getting the answer should be easy...

> Karl

-- Massimo

