Posted to mod_python-dev@quetz.apache.org by Sterling Hughes <st...@bumblebury.com> on 2003/04/09 23:19:25 UTC

Talking about PSP: Internals

Hi,

With the integration of PSP and mod_python, and some of the future
directions for PSP to take, I wanted to bring up some of my ideas on
list, to hear what people think.  I'll start with some of the internal
stuff first, and then move onto the API stuff.

One of the first things I want to do with PSP is make sure it works
under Apache's threaded model.  Currently PSP uses a C lexer based on
flex.  Therefore, global variables are used while interfacing with flex
itself (extern FILE *yyin, for example), and they are also used when
parsing the document (rewriting the PSP document to a global variable).

The "easy" solution (and the one I know) is simply to rewrite the lexer
in C++.  I can define a class that extends the yyFlexLexer class and
keeps its state in local storage.  This class is then used for all
parsing; it's a relatively elegant solution.  The main problem with
this, however, is that mod_python is a C project, and I'm wondering if
it's OK to mix the two (most systems *should* have a C++ compiler).
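
To illustrate keeping scanner state in an object instead of in globals,
here is a minimal sketch.  It is written in Python for brevity rather
than as a yyFlexLexer subclass, and the PspScanner class, its fields,
and the simplified <% ... %> handling are all hypothetical, not PSP's
actual lexer.

```python
class PspScanner:
    """Hypothetical scanner: all state lives on the instance, not in globals."""

    def __init__(self, source):
        self.source = source  # input buffer (the role yyin plays in flex)
        self.pos = 0          # current read position
        self.tokens = []      # output accumulated for this document only

    def scan(self):
        """Split the source into ("text", ...) and ("code", ...) tokens."""
        while self.pos < len(self.source):
            start = self.source.find("<%", self.pos)
            if start == -1:
                # No more code blocks: the rest is literal text.
                self.tokens.append(("text", self.source[self.pos:]))
                break
            if start > self.pos:
                self.tokens.append(("text", self.source[self.pos:start]))
            end = self.source.find("%>", start + 2)
            if end == -1:
                raise ValueError("unterminated <% block")
            self.tokens.append(("code", self.source[start + 2:end].strip()))
            self.pos = end + 2
        return self.tokens
```

Because each thread would create its own PspScanner, two documents can
be scanned at the same time without sharing a buffer or position.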

I'm open to other solutions.  I know re2c supports developing
thread-safe scanners (although I'm not really familiar with it), so I
guess that's an option (re2c scanners are a bit faster than flex
scanners in most cases).

The second, less desirable option is to use mutexes.  I would prefer
just a big old mutex around the parse loop, as PSP scripts will rarely
be reparsed and will mostly be served from a hashtable cache.  Either
that, or we could use thread-local storage (TLS), which is slow, and
store the context there.

That leads me to the second area where thread safety is an issue.  I
currently maintain a global cache of compiled PSP files (PyObject*'s).
This needs to either be mutexed or, again, stored in TLS.  Does Apache
2 currently have an API for storing global data (perhaps a table
allocated from r->server->pool)?
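
The "big old mutex" option, covering both the parse step and the global
cache, might look roughly like the sketch below.  It uses Python's
threading module to stand in for whatever mutex the C code would use;
get_compiled, _parse, and the module-level names are hypothetical, and
Python's compile() is only a placeholder for the non-reentrant
flex-based parser.

```python
import threading

_parse_lock = threading.Lock()  # one lock guarding both parser and cache
_cache = {}                     # filename -> compiled code object


def _parse(source):
    # Placeholder for the real, non-reentrant PSP parser.
    return compile(source, "<psp>", "exec")


def get_compiled(filename, source):
    """Return the cached code object, parsing at most once per filename."""
    with _parse_lock:
        code = _cache.get(filename)
        if code is None:
            code = _parse(source)
            _cache[filename] = code
        return code
```

Holding a single lock for the lookup as well as the parse is the
simplest arrangement; it serializes cache hits too, which is the price
of not having a finer-grained scheme.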

My second internal issue with PSP is the hashtable of precompiled
files.  This idea was taken from the PHP world, where we have external
accelerators, like the Zend Cache, or ionCube encoder.  These caches
deep-copy objects into shared memory, and then access them from a
(size-configurable) pool.

PSP is still young, and therefore for quickness, I decided to just store
the data in process memory.  Storing the data in process memory is
faster, both to implement and at run time.  However, the current
implementation has a drawback: if you use large files, all of that gets
tacked on to the process's local memory.  Therefore, with larger scripts
it's possible to get enormous httpd processes lying around.

The first step is adding a quick option to httpd.conf that says "max
hash size, then purge."  This is something I'll definitely implement in
the short run, but I'm interested in what people think regarding shared
memory storage.  Do you think it's worth it to add a SHM cache, and does
httpd provide an API for cleanly storing data in shared memory across
multiple systems?
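
As a rough illustration of "max hash size, then purge", here is a small
sketch.  The PurgingCache class and its max_entries parameter are
hypothetical, and it assumes the limit counts entries rather than
bytes, which the eventual httpd.conf option may not do.

```python
class PurgingCache:
    """Hypothetical in-process cache that empties itself when full."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._entries = {}

    def get(self, key):
        # Returns None on a miss, so the caller knows to reparse.
        return self._entries.get(key)

    def put(self, key, value):
        # Purge everything once a new key would exceed the limit.
        if key not in self._entries and len(self._entries) >= self.max_entries:
            self._entries.clear()
        self._entries[key] = value

    def __len__(self):
        return len(self._entries)
```

Purging the whole table is crude compared with evicting the least
recently used entry, but it keeps the bookkeeping to a minimum, which
fits a short-run fix.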

Ok, well, that's what I'm thinking about at the moment regarding PSP
internals.  I'm interested to hear all your thoughts on this. :)

-Sterling
-- 
"A business that makes nothing but money is a poor kind of business." 
    - Henry Ford


Thread-safe lexer

Posted by "Gregory (Grisha) Trubetskoy" <gr...@apache.org>.
On 9 Apr 2003, Sterling Hughes wrote:

> One of the first things I want to do with PSP is make sure it works
> under Apache's threaded model.  Currently PSP uses a C lexer based on
> flex.  Therefore, global variables are used while interfacing with flex
> itself (extern FILE *yyin, for example), and they are also used when
> parsing the document (rewriting the PSP document to a global variable).
>
> The "easy" solution (and the one I know) is simply to rewrite the lexer
> to use C++.  I can define a class that extends the yyFlexLexer class,
> and has local storage.  This class is then used for all parsing, it's a
> relatively elegant solution.  The main problem with this however is that
> mod_python is a C project, and I'm wondering if it's ok to mix the two
> (most systems *should* have a C++ compiler.)

I was researching this option a little bit tonight, and another reason why
using C++ will be a problem is loading a C++ shared library into a C
program (you'd have to somehow make libstdc++ symbols available). Some of
these issues are described here:

http://wolfgang.groogroo.com/apache-cplusplus/

I also googled for an example of a thread-safe use of flex, but couldn't
find much other than C++ examples... Still, I feel this wheel has got to
have been invented already...

Grisha


Re: Talking about PSP: Internals

Posted by "Gregory (Grisha) Trubetskoy" <gr...@apache.org>.
On 9 Apr 2003, Sterling Hughes wrote:

> One of the first things I want to do with PSP is make sure it works
> under Apache's threaded model.

> The "easy" solution (and the one I know) is simply to rewrite the lexer
> to use C++.

There must have been some reason why you put "easy" in quotes above... ;-)

> My second internal issue with PSP is the hashtable of precompiled
> files.  This idea was taken from the PHP world, where we have external
> accelerators, like the Zend Cache, or ionCube encoder.  These caches
> deep-copy objects into shared memory, and then access them from a
> (size-configurable) pool.
>
> PSP is still young, and therefore for quickness, I decided to just store
> the data in process memory.  Storing the data in process memory is
> faster (implementation wise, and speed wise).  However, the current
> implementation has a drawback - if you use large files, all that gets
> tacked on to the local memory.  Therefore on larger scripts it's possible
> to get enormous httpd processes lying around.

I was thinking that Python's way of addressing this issue is the *.pyc
files. I'm not sure how big of a performance gain you get from reading
already parsed bytecode from disk, but it should be pretty good,
especially considering that most modern filesystems cache things in memory.

Perhaps we could use a cache memory limit, but also use something similar
to Python's pyc: if the filesystem is read-only (which is going to be the
case most of the time), then the page is parsed every time (unless it's
cached in memory), but if the filesystem is writable, then a compiled file
is created on the fly.

In addition to this, we could offer a tool for users to compile their
pages manually from the command line to work around the read-only
filesystem limitations.
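
A minimal sketch of the pyc-like scheme, using Python's marshal module
to write the compiled code object next to the source.  The function
name load_compiled and the "c" suffix are hypothetical, the PSP-to-Python
translation step is left out (the page is treated as plain Python
source), and unlike a real .pyc there is no version header, so the
cached file is only valid for the Python version that wrote it.

```python
import marshal
import os


def load_compiled(source_path):
    """Return a code object, reusing an on-disk cache when it is fresh."""
    cache_path = source_path + "c"  # hypothetical, mirrors .py -> .pyc
    try:
        if os.path.getmtime(cache_path) >= os.path.getmtime(source_path):
            with open(cache_path, "rb") as f:
                return marshal.load(f)
    except (OSError, EOFError, ValueError, TypeError):
        pass  # missing or unreadable cache: fall through and recompile

    with open(source_path) as f:
        code = compile(f.read(), source_path, "exec")

    try:
        with open(cache_path, "wb") as f:
            marshal.dump(code, f)
    except OSError:
        pass  # read-only filesystem: the page is simply reparsed next time
    return code
```

The same function could back a command-line tool: running it once over
each page as a user with write access would leave the compiled files in
place for a server that cannot write them itself.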

> The first step is adding a quick option to httpd.conf, that says "max
> hash size, then purge."  This is something I'll definitely implement in
> the short-run, but I'm interested in what people think regarding shared
> memory storage.  Do you think it's worth it to add a SHM cache, and, does
> httpd provide an API for cleanly storing data in shared memory across
> multiple systems?

I think APR supports shared memory. On Apache 1.3 this is a more difficult
issue.

Grisha