Posted to mod_python-dev@quetz.apache.org by Graham Dumpleton <gr...@dscpl.com.au> on 2005/08/10 13:00:31 UTC

Few issues with new mod_python.publisher.

Have started to have a look through the latest mod_python.publisher
code, got a few comments to make about it.

First is that it has been changed to allow "HEAD" requests whereas
before it only supported "GET" and "POST". A "HEAD" request is handled
by the mod_python.publisher code saying:

         if req.method!='HEAD':
             req.write(result)

That is, it is supposed to work by detecting the "HEAD" request and
not actually writing back the result in that case.

This solution though ignores the fact that even with mod_python.publisher
one can explicitly use the "req" object to write back data. Ie.,

   def method(req):
     req.content_type = "text/plain"
     req.write("data-1\n")
     return "data-2\n"

In this case, for a "HEAD" request, one will still get "data-1" sent back
but "data-2" will not.
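
To make the leak concrete, here is a minimal sketch that runs without
Apache. FakeRequest and publish() are hypothetical stand-ins, not
mod_python code; publish() only mimics the req.method check quoted above.

```python
class FakeRequest:
    """Hypothetical stand-in for the mod_python request object."""

    def __init__(self, method):
        self.method = method
        self.content_type = None
        self.sent = []  # everything that would reach the client

    def write(self, data):
        self.sent.append(data)


def method(req):
    req.content_type = "text/plain"
    req.write("data-1\n")  # written directly, so the HEAD check never sees it
    return "data-2\n"


def publish(req, func):
    """Mimic only the HEAD check quoted above (assumption, not real code)."""
    result = func(req)
    if req.method != 'HEAD':
        req.write(result)
    return req.sent
```

With a 'HEAD' request only the returned string is held back; the string
written through req.write() still ends up in the output list.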

I guess the question I have is whether it is worthwhile supporting "HEAD"
in the first place with mod_python.publisher? What does one actually gain,
especially when it will not strictly work in all situations?

Second issue is the fact that new mod_python.publisher code uses the
req.finfo attribute. This may be convenient, but it prevents some cute
stuff being done in the future.

It is all about a vision I have of being able to implement middleware
like handler components for mod_python much as WSGI is attempting. In
this case we can use the full power of Apache and mod_python whereas
WSGI is restricted to its minimal server API.

The first thing that really needs to be done to achieve a similar thing
in mod_python is to make req.path_info writable. When this is added to
the fact that req.filename is already writable, it would allow a middleware
component to set up both req.filename and req.path_info and then trigger
existing mod_python handlers such as mod_python.publisher.handler() to
serve up the request based on these values.
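
A rough sketch of what such a middleware component might look like,
assuming req.path_info were writable. The names split_request() and
delegate(), the base URL and the external directory are all hypothetical,
and the handler is passed in as a parameter so the example runs without
mod_python; in a real setup it would be something like
mod_python.publisher.handler and the 404 would be an apache status constant.

```python
import posixpath


def split_request(uri, base_url, external_root):
    """Map a URI under base_url to (filename, path_info), or None.

    Assumption for illustration: the first path segment after base_url
    names a module in external_root, and the rest becomes path_info.
    """
    prefix = base_url.rstrip("/") + "/"
    if not uri.startswith(prefix):
        return None
    parts = [p for p in uri[len(prefix):].split("/") if p]
    # Refuse empty paths and anything that tries to climb out of the root.
    if not parts or any(p in (".", "..") for p in parts):
        return None
    filename = posixpath.join(external_root, parts[0] + ".py")
    path_info = "/" + "/".join(parts[1:]) if parts[1:] else ""
    return filename, path_info


def delegate(req, base_url, external_root, handler):
    """Set req.filename and req.path_info, then hand off to handler."""
    target = split_request(req.uri, base_url, external_root)
    if target is None:
        return 404  # stand-in for a "not found" status
    req.filename, req.path_info = target
    return handler(req)
```

The point of keeping the mapping in a small pure function is that only
delegate() needs to live inside the document tree.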

A big benefit of this is that you can put all your web application Python
code outside of your document tree with just a little middleware stack
within the document tree defining the delegation to the external code.
If someone screws up your Apache configuration and exposes your files,
all that is exposed is the minimal bit of code which performs the
delegation.

Another benefit of modelling stuff as middleware, including wrappers
around existing handlers such as publisher and PSP, is that it becomes
easier to mix use of different mechanisms in a more flexible way without
having to resort to gymnastics in the Apache configuration.

For this sort of thing to work, the middleware component will be what
maps the URL against the external directory and sets up req.filename and
req.path_info as appropriate. Ie., this would no longer be done by
Apache and thus the value of req.finfo is never set to a usable value.
That mod_python.publisher relies on req.finfo means that this sort of
thing may not be practical to do, which would be disappointing. :-(

I realise that people may not understand where I am going with this and
it may be a personal crusade of mine, but I have been real busy of late
and haven't had a chance to get my code suitable for general consumption.

A final issue is that the latest mod_python.publisher no longer pays
attention to the result of req.get_addhandler_exts(). This means that where
you used to be able to say:

   # .htaccess

   AddHandler mod_python .html
   PythonHandler mod_python.publisher

   # page.py

   def index():
     return "<html><body><p>XXX</p></body></html>"

with URL of "/page.html", this no longer appears to work for me. In
short it appears that one can only use ".py" as an extension or no
extension at all. Defining another extension to AddHandler or
PythonHandler seems to have no effect.
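
For illustration, a handler could honour the AddHandler extensions along
the following lines. This is a sketch only, not the publisher code:
strip_handler_ext() is a hypothetical helper, and it assumes the
extensions arrive as a whitespace-separated string, which is how I
understand req.get_addhandler_exts() to report them.

```python
import re


def strip_handler_ext(module_name, exts):
    """Remove a trailing extension if it is one of the configured ones.

    exts is assumed to be a whitespace-separated string such as "html py";
    a leading dot on each entry is tolerated.
    """
    suffixes = [re.escape(s.lstrip(".")) for s in exts.split()]
    if not suffixes:
        return module_name
    pattern = r"\.(%s)$" % "|".join(suffixes)
    return re.sub(pattern, "", module_name)
```

With "html" configured, "page.html" resolves to the module name "page",
so page.py would be found; an unconfigured extension is left alone.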

I'll post about anything else I find another time.

Graham

Making req.finfo writable.

Posted by Graham Dumpleton <gr...@dscpl.com.au>.

On 10/08/2005, at 10:36 PM, Nicolas Lehuen wrote:

>
> Second issue is the fact that new mod_python.publisher code uses the
> req.finfo attribute. This may be convenient, but it prevents some cute
> stuff being done in the future.
>
> Actually, I used req.finfo as it allowed me to save a call to os.stat() to
> check whether the requested file existed or not. I can revert back to using
> os.stat(req.filename) if you prefer.

I note that you aren't using req.finfo any more. Appreciate that. I also
note your comment about whether mod_python.publisher should just
modify req.path_info in place. I'll have a think about that one. :-)

Have had a bit of a think about req.finfo in general and have come to the
conclusion that since req.filename is writable, that req.finfo must also be
writable.

The reason for this is that if req.filename is changed but req.finfo isn't at
least set to None, then they will be referring to different files. Ie., the
following will not be the same if someone changes req.filename.

   req.filename
   req.finfo[apache.FINFO_FNAME]

and all the other information that req.finfo refers to would be for a
different file/directory to that referred to by req.filename. This could be
an issue where some code relied on these both referring to the same file.

In the context of what I am working on where the intent is that req.filename
and req.path_info will be modified, I thus should be setting req.finfo to None
whenever the value of req.filename is changed, however mod_python does
not currently allow that to be done.
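
A small sketch of the behaviour being asked for, using a hypothetical
FakeRequest stand-in and a hypothetical retarget() helper. It assumes
req.finfo is writable, which is exactly what mod_python does not yet
permit.

```python
class FakeRequest:
    """Hypothetical stand-in; the real request object is provided by mod_python."""

    def __init__(self, filename, finfo):
        self.filename = filename
        self.finfo = finfo  # cached stat data for self.filename


def retarget(req, new_filename):
    """Point req at another file and drop the now-stale stat data."""
    req.filename = new_filename
    # Assumption: req.finfo can be assigned. Clearing it keeps code that
    # trusts req.finfo from reading details of the previous file.
    req.finfo = None
    return req
```

After retarget(), code that needs file details would have to stat
req.filename itself rather than rely on req.finfo.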

I'll post a JIRA enhancement item about making req.finfo writable with
a patch when I get a chance in the next few days.

Graham

Re: Few issues with new mod_python.publisher.

Posted by Nicolas Lehuen <ni...@gmail.com>.

2005/8/10, Graham Dumpleton <gr...@dscpl.com.au>:
> 
> Have started to have a look through the latest mod_python.publisher
> code, got a few comments to make about it.
> 
> First is that it has been changed to allow "HEAD" requests whereas
> before it only supported "GET" and "POST". A "HEAD" request is handled
> by the mod_python.publisher code saying:
> 
> if req.method!='HEAD':
>     req.write(result)
> 
> That is, it is supposed to work by detecting the "HEAD" request and
> not actually writing back the result in that case.
> 
> This solution though ignores the fact that even with mod_python.publisher
> one can explicitly use the "req" object to write back data. Ie.,
> 
> def method(req):
>     req.content_type = "text/plain"
>     req.write("data-1\n")
>     return "data-2\n"
> 
> In this case, for a "HEAD" request, one will still get "data-1" sent back
> but "data-2" will not.
> 
> I guess the question I have is whether it is worthwhile supporting "HEAD"
> in the first place with mod_python.publisher? What does one actually gain,
> especially when it will not strictly work in all situations?


The new code behaves as the old one. I don't have many thoughts about this,
except that maybe the HEAD / GET method management should be handled at the
application level, not at the mod_python level. However, it's quite a big
thing to expect the application writers to support all the HTTP intricacies
(HEAD, ETags & Last-Modified, using the correct response code for redirects,
etc.)

> Second issue is the fact that new mod_python.publisher code uses the
> req.finfo attribute. This may be convenient, but it prevents some cute
> stuff being done in the future.

Actually, I used req.finfo as it allowed me to save a call to os.stat() to
check whether the requested file existed or not. I can revert back to using
os.stat(req.filename) if you prefer.

> It is all about a vision I have of being able to implement middleware
> like handler components for mod_python much as WSGI is attempting. In
> this case we can use the full power of Apache and mod_python whereas
> WSGI is restricted to its minimal server API.


> The first thing that really needs to be done to achieve a similar thing
> in mod_python is to make req.path_info writable. When this is added to
> the fact that req.filename is already writable, it would allow a middleware
> component to setup both req.filename and req.path_info and then trigger
> existing mod_python handlers such as mod_python.publisher.handler() to
> serve up the request based on these values.
> 
> A big benefit of this is that you can put all your web application Python
> code outside of your document tree with just a little middleware stack
> within the document tree defining the delegation to the external code.
> If someone screws up your Apache configuration and exposes your files,
> all that is exposed is the minimal bit of code which performs the
> delegation.
> 
> Another benefit of modelling stuff as middleware, including wrappers
> around existing handlers such as publisher and PSP, is that it becomes
> easier to mix use of different mechanisms in a more flexible way without
> having to resort to gymnastics in the Apache configuration.
> 
> For this sort of thing to work, the middleware component will be what
> maps the URL against the external directory and sets up req.filename and
> req.path_info as appropriate. Ie., this would no longer be done by
> Apache and thus the value of req.finfo is never set to a usable value.
> That mod_python.publisher relies on req.finfo means that this sort of
> thing may not be practical to do, which would be disappointing. :-(
> 
> I realise that people may not understand where I am going with this and
> it may be a personal crusade of mine, but have been real busy of late
> and haven't had a chance to get my code suitable for general consumption.
> 
> A final issue, is that latest mod_python.publisher no longer pays
> attention to result of req.get_addhandler_exts(). This means that where
> you used to be able to say:
> 
> # .htaccess
> 
> AddHandler mod_python .html
> PythonHandler mod_python.publisher
> 
> # page.py
> 
> def index():
>     return "<html><body><p>XXX</p></body></html>"
> 
> with URL of "/page.html", this no longer appears to work for me. In
> short it appears that one can only use ".py" as an extension or no
> extension at all. Defining another extension to AddHandler or
> PythonHandler seems to have no effect.


OK this is a real problem that I'll have to fix soon. 

> I'll post about anything else I find another time.
> 
> Graham
> 
>

Re: mod_python package maintainers - are you out there?

Posted by Jorey Bump <li...@joreybump.com>.

Juha-Matti Tapio wrote:
> On Wed, Aug 10, 2005 at 09:04:04AM -0400, Jim Gallacher wrote:
> 
>>Nicolas Lehuen wrote:
>>
>>Having some contact with them directly is probably a good idea anyway.
>>Subscribing to a bunch of mailing lists could result in a lot of
>>uninteresting mail. ;)
> 
> I think it would be unfair to expect that upstream developers track all the
> major distributions.

But it's easy enough to download their source packages and inspect them 
for interesting patches every once in a while. I think package 
maintainers will respond favorably if their work was acknowledged. Maybe 
they tried to submit a bug in the past, but couldn't, for some reason.

Re: mod_python package maintainers - are you out there?

Posted by Juha-Matti Tapio <jm...@verkkotelakka.net>.

On Wed, Aug 10, 2005 at 09:04:04AM -0400, Jim Gallacher wrote:
> Nicolas Lehuen wrote:
> >I don't understand why those guys 
> >aren't posting their patches on the mod_python mailing list.
> I was wondering the same thing. What would be better for us, subscribing 
> to a bunch of mailing lists or contacting the maintainers directly and 
> telling them we are interested in having them forward their bug reports 
> and patches (and then hoping they do)?

With Debian it is generally understood that the Debian users should report
bugs to the Debian Bug Tracking System and the package maintainer forwards
them to the upstream if the issue is not Debian-specific. Since the
mentioned bug is 162 days old in Debian, I should have monitored the bug and
as an original submitter I should have taken the issue upstream myself
bypassing the package maintainer.

Probably the most common reasons why this communication does not happen are
that people forget or do not have time (and plan to do it later).

> Having some contact with them directly is probably a good idea anyway.
> Subscribing to a bunch of mailing lists could result in a lot of
> uninteresting mail. ;)

I think it would be unfair to expect that upstream developers track all the
major distributions.

> But first let's have a show of hands. How many people monitoring either 
> of the mod_python or python-dev mailing lists are package maintainers 
> for a distribution?

I do monitor and I do know Debian packaging but I am not an official
developer nor do I maintain official Debian packages.

mod_python package maintainers - are you out there?

Posted by Jim Gallacher <jg...@sympatico.ca>.

Nicolas Lehuen wrote:
> Another remark: has anyone subscribed to the redhat, debian etc. mailing 
> lists to watch for such patches?

Not me.

> I don't understand why those guys 
> aren't posting their patches on the mod_python mailing list.

I was wondering the same thing. What would be better for us, subscribing 
to a bunch of mailing lists or contacting the maintainers directly and 
telling them we are interested in having them forward their bug reports 
and patches (and then hoping they do)? Having some contact with them 
directly is probably a good idea anyway. Subscribing to a bunch of 
mailing lists could result in a lot of uninteresting mail. ;)

But first let's have a show of hands. How many people monitoring either 
of the mod_python or python-dev mailing lists are package maintainers 
for a distribution?

Regards,
Jim

Re: Few issues with new mod_python.publisher.

Posted by Nicolas Lehuen <ni...@gmail.com>.

Another remark: has anyone subscribed to the redhat, debian etc. mailing lists
to watch for such patches? I don't understand why those guys aren't posting
their patches on the mod_python mailing list.

Regards,
Nicolas

2005/8/10, Nicolas Lehuen <ni...@gmail.com>:
> 
> This is particularly tricky. I don't know how many app servers prevent 
> content from being returned on a HEAD request, but I guess there aren't too 
> many. However, it's a MUST NOT, so it should be prevented.
> 
> Maybe mod_python could replace req.write with a NOP function whenever the 
> request method is HEAD ?
> 
> Regards,
> Nicolas
> 
> 2005/8/10, Juha-Matti Tapio <jm...@verkkotelakka.net>:
> > 
> > On Wed, Aug 10, 2005 at 10:08:08PM +1000, Graham Dumpleton wrote:
> > > Okay. What though are the implications of content still being returned
> > > for a HEAD request. What does the standard say about that?
> > 
> > RFC 2616, 9.4 HEAD
> > 
> > The HEAD method is identical to GET except that the server MUST NOT return
> > a message-body in the response. The metainformation contained in the HTTP
> > headers in response to a HEAD request SHOULD be identical to the
> > information sent in response to a GET request.
> > 
> > Ok, now that I read the standard again, it seems that the preferable way to
> > implement this is to somehow prevent the bytes written from being sent.
> > 
> > Though I have no idea how the user agents would react if they did receive
> > a message body on HEAD.

Re: Few issues with new mod_python.publisher.

Posted by Nicolas Lehuen <ni...@gmail.com>.

This is particularly tricky. I don't know how many app servers prevent 
content from being returned on a HEAD request, but I guess there aren't too 
many. However, it's a MUST NOT, so it should be prevented.

Maybe mod_python could replace req.write with a NOP function whenever the 
request method is HEAD ?
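
Something along these lines, as a sketch only. FakeRequest, discard() and
guard_head() are hypothetical names used so the example runs without
Apache, and whether the real request object allows its write method to
be rebound like this is not something the sketch establishes.

```python
class FakeRequest:
    """Hypothetical stand-in for the mod_python request object."""

    def __init__(self, method):
        self.method = method
        self.sent = []  # everything that would reach the client

    def write(self, data):
        self.sent.append(data)


def discard(data):
    """No-op replacement for req.write: accept the data and drop it."""
    return None


def guard_head(req):
    """Swap req.write for a no-op when the request method is HEAD."""
    if req.method == 'HEAD':
        req.write = discard  # instance attribute shadows the class method
    return req
```

Unlike the check in the publisher, this would also swallow data that the
published function writes directly through req.write().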

Regards,
Nicolas

2005/8/10, Juha-Matti Tapio <jm...@verkkotelakka.net>:
> 
> On Wed, Aug 10, 2005 at 10:08:08PM +1000, Graham Dumpleton wrote:
> > Okay. What though are the implications of content still being returned
> > for a HEAD request. What does the standard say about that?
> 
> RFC 2616, 9.4 HEAD
> 
> The HEAD method is identical to GET except that the server MUST NOT return
> a message-body in the response. The metainformation contained in the HTTP
> headers in response to a HEAD request SHOULD be identical to the
> information sent in response to a GET request.
> 
> Ok, now that I read the standard again, it seems that the preferable way to
> implement this is to somehow prevent the bytes written from being sent.
> 
> Though I have no idea how the user agents would react if they did receive
> a message body on HEAD.

Re: Few issues with new mod_python.publisher.

Posted by Jorey Bump <li...@joreybump.com>.

Jim Gallacher wrote:

> Interestingly, section 5.1.1 says that "The methods GET and HEAD MUST be 
> supported by all general-purpose servers.", so it would seem that 
> mod_python has not been compliant to the RFC.

FWIW, the Debian Woody package of mod_python (libapache-mod-python 
2.7.8-0.0woody5) running under apache 1.3.26 displays HEAD properly, 
while a compiled mod_python 3.1.4/apache 2.1.3-beta system does not.

Perhaps mod_python underwent a change or Debian already includes a patch 
for HEAD.

Re: Few issues with new mod_python.publisher.

Posted by Jim Gallacher <jg...@sympatico.ca>.

Juha-Matti Tapio wrote:
> On Wed, Aug 10, 2005 at 10:08:08PM +1000, Graham Dumpleton wrote:
> 
>>Okay. What though are the implications of content still being returned
>>for a HEAD request. What does the standard say about that?
> 
> 
> RFC 2616, 9.4 HEAD
> 
>   The HEAD method is identical to GET except that the server MUST NOT return
>   a message-body in the response. The metainformation contained in the HTTP
>   headers in response to a HEAD request SHOULD be identical to the
>   information sent in response to a GET request.
> 
> Ok, now that I read the standard again, it seems that the preferable way to
> implement this is to somehow prevent the bytes written from being sent.
> 
> Though I have no idea how the user agents would react if they did receive
> a message body on HEAD.

It defeats the purpose of HEAD for starters, i.e. it wastes bandwidth. If the 
user agent is making use of HEAD and gets stuffed with a large download 
each time - well that can't be good for performance. ;)

To save everyone looking it up here is the full section 9.4

http://www.rfc.net/rfc2616.html#s9.4
9.4 HEAD


    The HEAD method is identical to GET except that the server MUST NOT
    return a message-body in the response. The metainformation contained
    in the HTTP headers in response to a HEAD request SHOULD be identical
    to the information sent in response to a GET request. This method can
    be used for obtaining metainformation about the entity implied by the
    request without transferring the entity-body itself. This method is
    often used for testing hypertext links for validity, accessibility,
    and recent modification.

    The response to a HEAD request MAY be cacheable in the sense that the
    information contained in the response MAY be used to update a
    previously cached entity from that resource. If the new field values
    indicate that the cached entity differs from the current entity (as
    would be indicated by a change in Content-Length, Content-MD5, ETag
    or Last-Modified), then the cache MUST treat the cache entry as
    stale.



Interestingly, section 5.1.1 says that "The methods GET and HEAD MUST be 
supported by all general-purpose servers.", so it would seem that 
mod_python has not been compliant to the RFC.

http://www.rfc.net/rfc2616.html#s5.1.1
5.1.1 Method


    The Method  token indicates the method to be performed on the
    resource identified by the Request-URI. The method is case-sensitive.

        Method         = "OPTIONS"                ; Section 9.2
                       | "GET"                    ; Section 9.3
                       | "HEAD"                   ; Section 9.4
                       | "POST"                   ; Section 9.5
                       | "PUT"                    ; Section 9.6
                       | "DELETE"                 ; Section 9.7
                       | "TRACE"                  ; Section 9.8
                       | "CONNECT"                ; Section 9.9
                       | extension-method
        extension-method = token

    The list of methods allowed by a resource can be specified in an
    Allow header field (section 14.7). The return code of the response
    always notifies the client whether a method is currently allowed on a
    resource, since the set of allowed methods can change dynamically. An
    origin server SHOULD return the status code 405 (Method Not Allowed)
    if the method is known by the origin server but not allowed for the
    requested resource, and 501 (Not Implemented) if the method is
    unrecognized or not implemented by the origin server. The methods GET
    and HEAD MUST be supported by all general-purpose servers. All other
    methods are OPTIONAL; however, if the above methods are implemented,
    they MUST be implemented with the same semantics as those specified
    in section 9.

Regards,
Jim

Re: Few issues with new mod_python.publisher.

Posted by Juha-Matti Tapio <jm...@verkkotelakka.net>.

On Wed, Aug 10, 2005 at 10:08:08PM +1000, Graham Dumpleton wrote:
> Okay. What though are the implications of content still being returned
> for a HEAD request. What does the standard say about that?

RFC 2616, 9.4 HEAD

  The HEAD method is identical to GET except that the server MUST NOT return
  a message-body in the response. The metainformation contained in the HTTP
  headers in response to a HEAD request SHOULD be identical to the
  information sent in response to a GET request.

Ok, now that I read the standard again, it seems that the preferable way to
implement this is to somehow prevent the bytes written from being sent.

Though I have no idea how the user agents would react if they did receive
a message body on HEAD.

-- 
Tmi Juha-Matti Tapio    Puh/Tel. +358-50-5419230
Y-tunnus 1911527-0      Fax      +358-9-34756631

Re: Few issues with new mod_python.publisher.

Posted by Graham Dumpleton <gr...@dscpl.com.au>.

Okay. What though are the implications of content still being returned
for a HEAD request. What does the standard say about that?

On 10/08/2005, at 10:01 PM, Juha-Matti Tapio wrote:

> On Wed, Aug 10, 2005 at 09:00:31PM +1000, Graham Dumpleton wrote:
>> I guess the question I have is whether it is worthwhile supporting
>> "HEAD" in the first place with mod_python.publisher? What does one
>> actually gain, especially when it will not strictly work in all
>> situations?
>
> The RFC specifies that HEAD should return the headers as does GET.
>
> I have summarized the details in:
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=297289
>
> HEAD not working causes problems for example when some versions of Firefox
> use HEAD to check if the file has changed when saving a document that is
> currently open.
>
> I personally would prefer if this worked properly at least on the most
> common cases. It is a really difficult error to trace (by administrators)
> when the browser suddenly claims that the currently open
> mod_python-generated document can not be saved because it does not exist.

Re: Few issues with new mod_python.publisher.

Posted by Juha-Matti Tapio <jm...@verkkotelakka.net>.

On Wed, Aug 10, 2005 at 09:00:31PM +1000, Graham Dumpleton wrote:
> I guess the question I have is whether it is worthwhile supporting 
> "HEAD" in the first place with mod_python.publisher? What does one 
> actually gain, especially when it will not strictly work in all 
> situations?

The RFC specifies that HEAD should return the headers as does GET.

I have summarized the details in:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=297289

HEAD not working causes problems for example when some versions of Firefox
use HEAD to check if the file has changed when saving a document that is
currently open.

I personally would prefer if this worked properly at least on the most
common cases. It is a really difficult error to trace (by administrators)
when the browser suddenly claims that the currently open
mod_python-generated document can not be saved because it does not exist.