Posted to mod_python-dev@quetz.apache.org by Nicolas Lehuen <ni...@gmail.com> on 2005/04/30 10:11:16 UTC

mod_python.publisher : proposal for a few implementation changes

Hi,

I'm trying to solve both MODPYTHON-15
(http://issues.apache.org/jira/browse/MODPYTHON-15) and MODPYTHON-16
(http://issues.apache.org/jira/browse/MODPYTHON-16) in one strike...

What do you think of this approach: the object returned by
resolve_object is passed to this publish_object function:

import re
from types import UnicodeType

from mod_python import apache, util

# This regular expression is used to test for the presence of an opening HTML
# tag, written in upper or lower case.
re_html = re.compile(r"<HTML", re.I)

def publish_object(req, object):
    if callable(object):
        req.form = util.FieldStorage(req, keep_blank_values=1)
        return publish_object(req,util.apply_fs_data(object, req.form, req=req))
    elif hasattr(object,'__iter__'):
        result = False
        # Publish each item in turn; True if at least one item was published.
        for item in object:
            result |= publish_object(req, item)
        return result
    else:
        if object is None:
            return False
        elif isinstance(object,UnicodeType):
            # TODO : this skips all encoding issues, which is VERY BAD
            # I don't even understand how the req.write below can work !
            result = object
        else:
            result = str(object)
            
        if not req._content_type_set:
            # make an attempt to guess content-type
            if re_html.search(result,0,100):
                req.content_type = 'text/html'
            else:
                req.content_type = 'text/plain'
        
        if req.method!='HEAD':
            req.write(result)

        return True


This way, we could support classes, class instances and iterables
(including generators) as possible return values. The boolean
return value tells the handler whether anything was actually
published; the handler uses it as follows:

    if (not publish_object(req, object)) and (req.bytes_sent==0) and (req.next is None):
        req.log_error("mod_python.publisher: nothing to publish.")
        return apache.HTTP_INTERNAL_SERVER_ERROR
    else:
        return apache.OK

See the enclosed file for the whole thing. Comments are of course welcome.

Regards,

Nicolas

Re: mod_python.publisher : proposal for a few implementation changes

Posted by Graham Dumpleton <gr...@dscpl.com.au>.
A few issues, but I will tackle them one at a time in separate emails
as I get to them. :-)

On 30/04/2005, at 6:11 PM, Nicolas Lehuen wrote:

> # This regular expression is used to test for the presence of an HTML header
> # tag, written in upper or lower case.
> re_html = re.compile(r"<HTML",re.I)

This isn't a reliable way of determining whether content is HTML.
Previously the test was:

             if result[:100].strip()[:6].lower() == '<html>' \
                or result.find('</') > 0:

You have dropped the requirement for the closing '>' on the 'html'
element, which is probably a good thing to do, but you also dropped the
check for '</' anywhere in the content. Dropping this latter check means
that content which was detected as HTML before will no longer be
detected as HTML.

As an example, consider the start of the HTML from www.w3.org web site:

   <?xml version="1.0" encoding="utf-8"?>
   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
   <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">
   ...
   </html>

The presence of the XML and DOCTYPE declarations puts the actual start of
the 'html' tag at about character position 150. This is beyond the 100
characters that the code checks for the start of an 'html' element.

This would also have failed the check as it existed in the original code,
but there the check for '</' anywhere in the content would then have
kicked in and matched at some point.

I am not saying that the original code is any better, given the potential
performance cost of scanning the full content in the worst case, but it
did work where the proposed code could fail.

What might be better is to search backwards through the final part (maybe
100 characters) of the content for the string '</html', rather than
searching from the start. This is because all the DOCTYPE declarations
mean there can be a lot of leading crap, whereas I don't think that there
can be anything validly trailing the final closing of the 'html' element.
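
A minimal sketch of the tail-scanning idea, written for a current Python 3
interpreter rather than the Python 2 this thread targets. The name
looks_like_html and the 100-character window are illustrative assumptions,
not publisher code:

```python
import re

# Matches a closing html tag in any case, without requiring the final '>'.
RE_HTML_CLOSE = re.compile(r"</html", re.I)

def looks_like_html(content, window=100):
    """Guess whether content is HTML by scanning only its last `window` characters."""
    tail = content[-window:]
    return RE_HTML_CLOSE.search(tail) is not None

# A long XHTML prologue no longer matters, because only the tail is examined.
page = ('<?xml version="1.0" encoding="utf-8"?>\n'
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
        '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        + "<p>body</p>\n" * 50 +
        '</html>\n')
```

The cost is bounded by the window size whatever the length of the content,
but plain text that happens to mention '</html' near its end would still be
misdetected.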

Graham


Re: mod_python.publisher : proposal for a few implementation changes

Posted by Graham Dumpleton <gr...@dscpl.com.au>.
On 30/04/2005, at 6:37 PM, Nicolas Lehuen wrote:
>         if object is None:
>             return False

I really don't understand why a published function returning None should
result in an HTTP 500 internal server error response if no output content
has been explicitly written back via the request object. At least this is
what currently happens, and I am assuming that the above code continues
that behaviour.

In "vampire::publisher", if a published function returns None or if access
is to a data variable which is set to None, I generate an empty string as
a response. It didn't make sense to me to be generating an HTTP 500 error;
it just seemed to confuse newbies who couldn't work out what they were
doing wrong. I figured they would learn quicker from getting an empty
response instead of a cryptic HTTP 500 error.

And yes, I am aware that you have since added logging of a message to
indicate:

   req.log_error("mod_python.publisher: %s returned nothing." % `object`)

Newbies however don't necessarily go looking in the log. ;-)

Graham


Re: mod_python.publisher : proposal for a few implementation changes

Posted by Graham Dumpleton <gr...@dscpl.com.au>.
On 30/04/2005, at 10:24 PM, Nicolas Lehuen wrote:
> So, all this relies on the default platform encoding. How nice. The
> reason why you don't find sys.setdefaultencoding() is because this
> method is deleted from the module after the module is loaded,
> presumably to prevent developers from changing the default encoding on
> the fly. I remember being mad at Python when I first discovered that (I
> was trying to remove this dumb 'ascii' default encoding).

It would only be removed, though, after the Python site setup file has
been executed. I.e., if I remember correctly, you can change the default
setting by adding a call to sys.setdefaultencoding() in the site file.

To me this suggests even more that adding a feature to mod_python which
would allow a directive in the Apache configuration file to set the
default encoding would be a good idea. As I said, it would have to be
at same scope as PythonImport and done when the interpreter is first
created but before any code is executed within the scope of the
interpreter.

If this was provided, it would certainly be a lot easier than having to
add something to the Python site file. Not even sure where that is.

Python FAQ has the following to say about this sort of stuff:

   It's possible to set a default encoding in a file called
   sitecustomize.py that's part of the Python library. However, this isn't
   recommended because changing the Python-wide default encoding may cause
   third-party extension modules to fail.

How much grief could we cause for third party modules by playing with it?
Remember that because it would be set by mod_python, it would only affect
stuff running under mod_python and wouldn't destroy anything else running
on the system.

Graham


Re: mod_python.publisher : proposal for a few implementation changes

Posted by Nicolas Lehuen <ni...@gmail.com>.
Wow. I'm working on another project right now which involves a C++
core with Python and Java mappings (using SWIG). I've just got
confused and assumed that the Java Native Interface behaviour of
exchanging string data in UTF8 format was also found in Python. Sorry.

So, all this relies on the default platform encoding. How nice. The
reason why you don't find sys.setdefaultencoding() is because this
method is deleted from the module after the module is loaded,
presumably to prevent developers from changing the default encoding on
the fly. I remember being mad at Python when I first discovered that (I
was trying to remove this dumb 'ascii' default encoding).

This is one more reason NOT to let the system handle the writing of
unicode strings on the request output stream. The server's default
encoding could be any encoding, and there is no guarantee that this
encoding is good for the content you want to send. My example about
French accented characters still holds: quite simply, if I want to return
u'café' on my computer, I get this:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in
position 3: ordinal not in range(128)

It's not very useful to be able to return unicode strings if the only
codepoints that are allowed are those that have a mapping in ASCII...

So we might as well drop the Unicode support and tell the developer to
handle the encoding himself, OR extract the desired encoding from the
Content-Type header and handle the encoding in the publisher.
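
A minimal sketch of that second option, written for Python 3 (where text is
always Unicode and encoding it yields bytes). The function name
encode_for_content_type and the UTF-8 fallback are assumptions for
illustration, not an agreed publisher design:

```python
import re

# Captures the charset parameter of a Content-Type value,
# e.g. "text/html; charset=iso-8859-1".
RE_CHARSET = re.compile(r"charset\s*=\s*([^\s;]+)", re.I)

def encode_for_content_type(text, content_type, default="utf-8"):
    """Encode text with the charset named in content_type, else with default."""
    match = RE_CHARSET.search(content_type or "")
    # The parameter value may be quoted, so strip surrounding quotes.
    charset = match.group(1).strip("\"'") if match else default
    return text.encode(charset)
```

With this, "café" is sent as a single byte for the "é" under iso-8859-1 and
as two bytes under the UTF-8 fallback, whereas encoding it as ASCII raises
UnicodeEncodeError as shown above. An unknown charset name would raise
LookupError, which a real implementation would have to handle.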

Regards,

Nicolas



Re: mod_python.publisher : proposal for a few implementation changes

Posted by Graham Dumpleton <gr...@dscpl.com.au>.
On 30/04/2005, at 9:40 PM, Nicolas Lehuen wrote:

> Graham, the encoding used by PyArg_ParseTuple is indeed UTF-8, whereas
> str(unicode_string) uses the default encoding of the platform Python
> is running on, which is unpredictable (for example, for years now
> under win32 it has been ASCII even though there are ways to get the
> default encoding specific to the current setup ; I suspect the
> situation is not better on other platforms).
>
> Thus, if we removed the check for UnicodeType and simply did result =
> str(object) for unicode string, we would have runtime exceptions,
> because if the string contains accents, under win32, the default
> encoder (ascii) will complain that it does not know how to encode
> them.
>
> I'd rather have the developer choose explicitly the encoding he
> wishes to use, with a default to UTF8, through the content-type
> header.

Hmmm, getting confusing. :-(

The code says:

     if (encoding == NULL)
         encoding = PyUnicode_GetDefaultEncoding();

     /* Shortcuts for common default encodings */
     if (errors == NULL) {
         if (strcmp(encoding, "utf-8") == 0)
             return PyUnicode_AsUTF8String(unicode);
         else if (strcmp(encoding, "latin-1") == 0)
             return PyUnicode_AsLatin1String(unicode);
#if defined(MS_WINDOWS) && defined(HAVE_USABLE_WCHAR_T)
         else if (strcmp(encoding, "mbcs") == 0)
             return PyUnicode_AsMBCSString(unicode);
#endif
         else if (strcmp(encoding, "ascii") == 0)
             return PyUnicode_AsASCIIString(unicode);
     }

     /* Encode via the codec registry */
     v = PyCodec_Encode(unicode, encoding, errors);

Thus the default doesn't seem to be UTF-8 but whatever the default
encoding is, as would be used by str().

Maybe mod_python should have an Apache configuration file option which
allows you to set the default encoding. Internally it could call:

   PyUnicode_SetDefaultEncoding()

The option would only be able to be set outside of any <Directory> or
other directives, i.e., at the same level as PythonImport. If the option
is not set, mod_python could forcibly set it to something which makes
more sense in a web environment and would cause fewer problems. For
example, it could set it to "UTF-8" if that works better.

The only thing I am not sure about is in what version of Python this
function was introduced. I am a bit confused that my Python 2.3 on
Mac OS X doesn't have sys.setdefaultencoding(), yet it is present in the
Python 2.3.4 source code I have. I presume that the underlying C
function would still be there though.

Graham


Re: mod_python.publisher : proposal for a few implementation changes

Posted by Nicolas Lehuen <ni...@gmail.com>.
Graham, the encoding used by PyArg_ParseTuple is indeed UTF-8, whereas
str(unicode_string) uses the default encoding of the platform Python
is running on, which is unpredictable (for example, for years now
under win32 it has been ASCII even though there are ways to get the
default encoding specific to the current setup ; I suspect the
situation is not better on other platforms).

Thus, if we removed the check for UnicodeType and simply did result =
str(object) for unicode string, we would have runtime exceptions,
because if the string contains accents, under win32, the default
encoder (ascii) will complain that it does not know how to encode
them.

I'd rather have the developer choose explicitly the encoding he
wishes to use, with a default to UTF8, through the content-type
header.

Regards,
Nicolas

On 4/30/05, Graham Dumpleton <gr...@dscpl.com.au> wrote:
> 
> On 30/04/2005, at 6:37 PM, Nicolas Lehuen wrote:
> >         elif isinstance(object,UnicodeType):
> >             # TODO : this skips all encoding issues, which is VERY BAD
> >             # I don't even understand how the req.write below can work !
> >             result = object
> >         else:
> >             result = str(object)
> 
> What do you see as the issue that required an explicit check for
> UnicodeType and avoidance of converting it with str()?
> 
> As the code is above, req.write() will be called with the Unicode
> object. This will work provided that the Unicode string can be converted
> into a normal string using the default encoding. I.e., in the underlying
> C code PyArg_ParseTuple will use "s", meaning:
> 
> "s" (string or Unicode object) [char *]
>    Convert a Python string or Unicode object to a C pointer to a
>    character string. You must not provide storage for the string itself;
>    a pointer to an existing string is stored into the character pointer
>    variable whose address you pass. The C string is null-terminated. The
>    Python string must not contain embedded null bytes; if it does, a
>    TypeError exception is raised. Unicode objects are converted to C
>    strings using the default encoding. If this conversion fails, an
>    UnicodeError is raised.
> 
> I think though that applying str() in the Python code to the Unicode
> string probably yields the same result. I.e., str(u'123') results in the
> encode() method of the Unicode string object being called.
> 
> S.encode([encoding[,errors]]) -> string
> 
> Return an encoded string version of S. Default encoding is the current
> default string encoding. errors may be given to set a different error
> handling scheme. Default is 'strict' meaning that encoding errors raise
> a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and
> 'xmlcharrefreplace' as well as any other name registered with
> codecs.register_error that can handle UnicodeEncodeErrors.
> 
> In other words, I don't believe there is any difference between
> converting it using str() before the call to req.write() and passing the
> Unicode string directly to req.write(). Thus, an explicit check for
> UnicodeType is probably not required.
> 
> Graham
> 
>

Re: mod_python.publisher : proposal for a few implementation changes

Posted by Graham Dumpleton <gr...@dscpl.com.au>.
On 30/04/2005, at 9:48 PM, Nicolas Lehuen wrote:
>
> Mmmh. It seems that __str__() should only return str instances. At
> least that's how I understand it from the fact that there is a
> __unicode__ special method in the object class (from the Python
> documentation) :
>
> __unicode__( self)
> Called to implement unicode() builtin; should return a Unicode object.
> When this method is not defined, string conversion is attempted, and
> the result of string conversion is converted to Unicode using the
> system default encoding.

This is just opening up a bigger can of worms. Where we currently have:

   result = str(object)

should it instead be:

   result = unicode(object)

If one was going to properly make the whole system Unicode capable, this
is probably what you would do. If an object doesn't define __unicode__()
it will call __str__() anyway and then convert that to a Unicode string.

At the moment, if something defines __unicode__() it gets completely
ignored.

Maybe this whole area of Unicode support should also be deferred to the
release after this one, although I might have a play with it in my
"vampire::publisher" code if I get a chance. :-)

Graham



Re: mod_python.publisher : proposal for a few implementation changes

Posted by Nicolas Lehuen <ni...@gmail.com>.
On 4/30/05, Graham Dumpleton <gr...@dscpl.com.au> wrote:
> You must be getting sick of me picking apart everything you put up. :-)

Well, no, it's exactly why I submit ideas to the list :)
 
> Problem I see with this is that it wouldn't be getting applied in all
> cases where Unicode strings would be getting returned. Imagine:
> 
>    class _Object:
>      def __str__(self):
>        return u'123'
> 
>    object = _Object()
> 
> The "object" variable isn't a Unicode string and if "object" is
> accessed, then str() gets applied to it and __str__() will return a
> Unicode string. This therefore bypasses your attempt to convert it using
> the appropriate encoding.

Mmmh. It seems that __str__() should only return str instances. At
least that's how I understand it from the fact that there is a
__unicode__ special method in the object class (from the Python
documentation) :

__unicode__( self) 
Called to implement unicode() builtin; should return a Unicode object.
When this method is not defined, string conversion is attempted, and
the result of string conversion is converted to Unicode using the
system default encoding.

That's a tricky one; I think we should ask the Python community about this.

> Interesting that in this case the Unicode string also gets delivered
> direct to req.write() as well.
> 
> I would suggest that the whole encoding issue be left up to the
> developer to handle rather than trying to be smart about it and make it
> automatic. The developer is going to know what they want, whereas we
> would be making assumptions and could get it wrong.
> 
> Graham

Why not, but we should forbid people from returning Unicode strings,
then, because they rely on an undocumented behaviour of the publisher
that could change later.

Regards,
Nicolas

Re: mod_python.publisher : proposal for a few implementation changes

Posted by Graham Dumpleton <gr...@dscpl.com.au>.
You must be getting sick of me picking apart everything you put up. :-)

Problem I see with this is that it wouldn't be getting applied in all
cases where Unicode strings would be getting returned. Imagine:

   class _Object:
     def __str__(self):
       return u'123'

   object = _Object()

The "object" variable isn't a Unicode string and if "object" is
accessed, then str() gets applied to it and __str__() will return a
Unicode string. This therefore bypasses your attempt to convert it using
the appropriate encoding.

Interesting that in this case the Unicode string also gets delivered
direct to req.write() as well.

I would suggest that the whole encoding issue be left up to the
developer to handle rather than trying to be smart about it and make it
automatic. The developer is going to know what they want, whereas we
would be making assumptions and could get it wrong.

Graham

On 30/04/2005, at 9:28 PM, Nicolas Lehuen wrote:

> I think in this case the default conversion used is UTF8. Ideally, a
> developer returning Unicode strings from functions should have a way
> to decide in what encoding (UTF-8, iso-latin-1, etc.) the string
> should be returned to the client.
>
> One possible way to do that would be to parse the content-type header,
> i.e. if the developer set the content type header to "text/html;
> charset=iso-8859-1", then we know the developer expect the result to
> be encoded in iso-8859-1, so we can do result =
> object.encode('iso-8859-1').
>
> Here is some tentative code for this :
>
> re_charset = re.compile(r"charset\s*=\s*([^\s;]+)");
>
> def publish_object(req, object):
>     if callable(object):
>         req.form = util.FieldStorage(req, keep_blank_values=1)
>         return publish_object(req,util.apply_fs_data(object, req.form, req=req))
>     elif hasattr(object,'__iter__'):
>         result = False
>         for item in object:
>             result |= publish_object(req,item)
>         return result
>     else:
>         if object is None:
>             return False
>         elif isinstance(object,UnicodeType):
>             # We try to detect the character encoding
>             # from the Content-Type header
>             if req._content_type_set:
>                 charset = re_charset.search(req.content_type)
>                 if charset:
>                     charset = charset.group(1)
>                 else:
>                     charset = 'UTF8'
>                     req.content_type += '; charset=UTF8'
>             else:
>                 charset = 'UTF8'
>
>             result = object.encode(charset)
>         else:
>             result = str(object)
>
>     [...]
>
> Regards,
> Nicolas


Re: mod_python.publisher : proposal for a few implementation changes

Posted by Graham Dumpleton <gr...@dscpl.com.au>.
On 01/05/2005, at 9:07 PM, Nicolas Lehuen wrote:

> Graham, have you seen the standard inspect.getargspec() function? It
> has existed since Python 2.1 and may save some code and improve portability:
>
> http://www.python.org/doc/2.2.3/lib/inspect-classes-functions.html

Yes. I probably didn't use it out of habit. Specifically, when I wrote my
other Python project where I needed to interrogate arguments, I had to be
compatible with Python 1.5 and Python 2.0, so initially it didn't exist
and then I couldn't use it for compatibility reasons.

It is possibly about time to start using it, although I am not sure it
will much change the amount of code that needs to be written, as one still
needs special cases to drop off the self parameter of a class instance
method. It also does not eliminate the chain of if statements which
identify the actual object you need to apply the interrogation to. :-)
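
A minimal sketch of argument interrogation with the inspect module, assuming
a current Python 3 interpreter, where inspect.signature plays the role that
inspect.getargspec plays in the Python 2 versions discussed here. The helper
call_with_form is hypothetical and is not mod_python's apply_fs_data:

```python
import inspect

def call_with_form(target, form, **extra):
    """Call target with only those form fields (plus extras) that it accepts."""
    params = inspect.signature(target).parameters
    available = dict(form, **extra)
    # A **kwargs parameter means the callable accepts every field.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return target(**available)
    accepted = {name: value for name, value in available.items() if name in params}
    return target(**accepted)

class Greeter:
    def hello(self, name, req=None):
        return "Hello, %s!" % name

def index(req, **fields):
    return sorted(fields)
```

On Python 3, inspect.signature already omits self for bound methods, so
call_with_form(Greeter().hello, {"name": "Nicolas", "unused": "x"}) needs no
special case. The Python 2 getargspec still reports self, which is the
special case mentioned above.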

Graham


Re: mod_python.publisher : proposal for a few implementation changes

Posted by Nicolas Lehuen <ni...@gmail.com>.
Graham, have you seen the standard inspect.getargspec() function? It
has existed since Python 2.1 and may save some code and improve portability:

http://www.python.org/doc/2.2.3/lib/inspect-classes-functions.html

Regards,
Nicolas


Re: mod_python.publisher : proposal for a few implementation changes

Posted by Graham Dumpleton <gr...@dscpl.com.au>.
On 01/05/2005, at 8:05 PM, Nicolas Lehuen wrote:

> Here is util.py with the modified line. I've tested it and it works.
> The only problem is that it forces the developer to write a __init__
> method, even though it only contains "pass".

You can work around the requirement to have "__init__()"; you just need
to restructure the code a bit and move interrogation of co_flags into the
actual if statement. I actually have a distinct routine for calculating
args and other stuff. See code for it attached.


Re: mod_python.publisher : proposal for a few implementation changes

Posted by Nicolas Lehuen <ni...@gmail.com>.
Here is util.py with the modified line. I've tested it and it works.
The only problem is that it forces the developer to write a __init__
method, even though it only contains "pass".

I've modified the regular expression for the HTML closing tag in the
new branch. BTW, it was a bit scary to make this branch: at one point
I made a mistake and thought I had messed up the entire Apache
repository. Turns out I don't have that much power, fortunately ;).

If you want to check out this branch, here it is :

https://svn.apache.org/repos/asf/httpd/mod_python/branches/3.2.0-experimental-publisher

Regards,

Nicolas


Re: mod_python.publisher : proposal for a few implementation changes

Posted by Graham Dumpleton <gr...@dscpl.com.au>.
On 01/05/2005, at 7:32 PM, Nicolas Lehuen wrote:
> I just had to modify apply_fs_data in util.py to allow new-style
> classes to be called (changing the test type(object) == ClassType to
> type(object) in (TypeType, ClassType)), and to rewrite publisher.py as
> enclosed.

Can you post your modified apply_fs_data. Want to see if what you came
up with is similar to mine or whether one or the other of us forgot some
strange case.

In respect of:

   re_html = re.compile(r"</HTML",re.I)

Was thinking that maybe a better pattern might be:

   re_html = re.compile(r"</HTML\s*>\s*$",re.I)

I.e., explicitly require that </html> is the very last thing in the
content.

If I am right that nothing can come after </html> this would be the most
accurate thing to use and would save wrongly calling something HTML
when it is not, like this message would be if I stick </html> here. :-)
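
A minimal sketch of how the anchored pattern behaves, assuming Python 3; the
helper name ends_with_html_close is illustrative only:

```python
import re

# Pattern proposed above: </html> must be the last non-whitespace thing
# in the content.
re_html = re.compile(r"</HTML\s*>\s*$", re.I)

def ends_with_html_close(content):
    """Return True only if the content ends with a closing html tag."""
    return re_html.search(content) is not None
```

Content that merely mentions </html> somewhere in the middle is rejected,
while trailing whitespace or a newline after the closing tag is tolerated.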

Graham


Re: mod_python.publisher : proposal for a few implementation changes

Posted by Nicolas Lehuen <ni...@gmail.com>.
OK, I now have a proper implementation of my proposals, working with
iterables, generators, new-style classes and old-style classes.
Generators are especially fun, they allow you to write things like :

def index(req):
    yield '<html><body>'
    yield greetings(req,'Nicolas')
    yield '</body></html>'

def greetings(req,first_name):
    yield '<p>Hello, '
    yield first_name
    yield '!</p>'
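
How nested generators like these could be flattened into a single response can be sketched as follows. This is a simplified stand-in, not the enclosed publisher.py: FakeRequest and publish() are hypothetical names, and the sketch is written for Python 3, where strings are themselves iterable and therefore have to be tested before the __iter__ check.

```python
class FakeRequest:
    """Hypothetical stand-in for the mod_python request object."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)


def publish(req, obj):
    """Write obj to req, descending into iterables; True if anything was written."""
    if obj is None:
        return False
    if isinstance(obj, str):
        # Python 3 strings have __iter__, so handle them before the iterable test.
        req.write(obj)
        return True
    if hasattr(obj, "__iter__"):
        published = False
        for item in obj:
            published |= publish(req, item)
        return published
    req.write(str(obj))
    return True


def greetings(req, first_name):
    yield "<p>Hello, "
    yield first_name
    yield "!</p>"


def index(req):
    yield "<html><body>"
    yield greetings(req, "Nicolas")
    yield "</body></html>"


req = FakeRequest()
published = publish(req, index(req))
page = "".join(req.chunks)
```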

I just had to modify apply_fs_data in util.py to allow new-style
classes to be called (changing the test type(object) is ClassType to
type(object) in (TypeType, ClassType)), and to rewrite publisher.py as
enclosed.

All unit tests run OK. The only problem, like you wrote, Graham, is
that we have to change the traversal & publishing rules to allow
old-style classes and new-style classes publishing. This may induce
some problems if some developers thought their classes were safe from
publishing. Ah, well, I guess we could only allow this for power
users, I'll have a look at how we could add a PythonOption for this.

For now, I've kept the current traversal & publishing rules, so this
should not break anything. But as we are trying to make a 3.2 release,
I'm not going to put this on the trunk. I'm building a branch named
"experimental-publisher", and we'll merge it after the 3.2 release.

Regards,
Nicolas

Re: mod_python.publisher : proposal for a few implementation changes

Posted by Nicolas Lehuen <ni...@gmail.com>.
Well, the Content-Encoding header is for HTTP content encoding, as
explained in the RFC :

http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

For example, the server can gzip the result page and set
Content-Encoding to gzip. This is not related to the character
encoding, which is given as the charset parameter of the Content-Type
header.
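
The difference between the two headers can be illustrated with a small sketch. The sample text and the headers dictionary are made up for the example, and Python 3 is assumed.

```python
import gzip

text = "caf\u00e9"

# Character encoding: how text becomes bytes; advertised via the charset
# parameter of the Content-Type header.
body = text.encode("iso-8859-1")

# Content encoding: a transformation applied to those bytes; advertised
# separately in the Content-Encoding header.
compressed = gzip.compress(body)

headers = {
    "Content-Type": "text/plain; charset=iso-8859-1",
    "Content-Encoding": "gzip",
}

# A client undoes the two steps in the opposite order.
decoded = gzip.decompress(compressed).decode("iso-8859-1")
```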

Regards,
Nicolas

On 4/30/05, Graham Dumpleton <gr...@dscpl.com.au> wrote:
> Not to start this up again, but just for the record, wanted to add
> that there is the req.content_encoding member. Not sure how this
> comes into the picture at all. Something to consider later. :-)
> 
> Graham
> 

Re: mod_python.publisher : proposal for a few implementation changes

Posted by Graham Dumpleton <gr...@dscpl.com.au>.
Not to start this up again, but just for the record, I wanted to add that
there is also the req.content_encoding member. Not sure how this comes into
the picture at all. Something to consider later. :-)

Graham

On 30/04/2005, at 9:35 PM, Nicolas Lehuen wrote:

> [Woops, forgot to put the list in the recipients.]
>
> I think in this case the default conversion used is UTF8. Ideally, a
> developer returning Unicode strings from functions should have a way
> to decide in what encoding (UTF-8, iso-latin-1, etc.) the string
> should be returned to the client.
>
> One possible way to do that would be to parse the content-type header,
> i.e. if the developer set the content type header to "text/html;
> charset=iso-8859-1", then we know the developer expects the result to
> be encoded in iso-8859-1, so we can do result =
> object.encode('iso-8859-1').
>
> Here is some tentative code for this :
>
> re_charset = re.compile(r"charset\s*=\s*([^\s;]+)");
>
> def publish_object(req, object):
>     if callable(object):
>         req.form = util.FieldStorage(req, keep_blank_values=1)
>         return publish_object(req,util.apply_fs_data(object, req.form, req=req))
>     elif hasattr(object,'__iter__'):
>         result = False
>         for item in object:
>             result |= publish_object(req,item)
>         return result
>     else:
>         if object is None:
>             return False
>         elif isinstance(object,UnicodeType):
>             # We try to detect the character encoding
>             # from the Content-Type header
>             if req._content_type_set:
>                 charset = re_charset.search(req.content_type)
>                 if charset:
>                     charset = charset.group(1)
>                 else:
>                     charset = 'UTF8'
>                     req.content_type += '; charset=UTF8'
>             else:
>                 charset = 'UTF8'
>
>             result = object.encode(charset)
>         else:
>             result = str(object)
>
>     [...]
>
> Regards,
> Nicolas
>


mod_python.publisher : proposal for a few implementation changes

Posted by Nicolas Lehuen <ni...@gmail.com>.
[Woops, forgot to put the list in the recipients.]

I think in this case the default conversion used is UTF8. Ideally, a
developer returning Unicode strings from functions should have a way
to decide in what encoding (UTF-8, iso-latin-1, etc.) the string
should be returned to the client.

One possible way to do that would be to parse the content-type header,
i.e. if the developer set the content type header to "text/html;
charset=iso-8859-1", then we know the developer expects the result to
be encoded in iso-8859-1, so we can do result =
object.encode('iso-8859-1').

Here is some tentative code for this :

re_charset = re.compile(r"charset\s*=\s*([^\s;]+)")

def publish_object(req, object):
    if callable(object):
        req.form = util.FieldStorage(req, keep_blank_values=1)
        return publish_object(req,util.apply_fs_data(object, req.form, req=req))
    elif hasattr(object,'__iter__'):
        result = False
        for item in object:
            result |= publish_object(req,item)
        return result
    else:
        if object is None:
            return False
        elif isinstance(object,UnicodeType):
            # We try to detect the character encoding
            # from the Content-Type header
            if req._content_type_set:
                charset = re_charset.search(req.content_type)
                if charset:
                    charset = charset.group(1)
                else:
                    charset = 'UTF8'
                    req.content_type += '; charset=UTF8'
            else:
                charset = 'UTF8'

            result = object.encode(charset)
        else:
            result = str(object)

    [...]
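
The charset lookup on its own can be tried out with a small sketch like the one below. The helper encode_for_client() is a hypothetical name, the code is Python 3, and it returns the content type instead of modifying a request object.

```python
import re

re_charset = re.compile(r"charset\s*=\s*([^\s;]+)")


def encode_for_client(text, content_type=None, default="utf-8"):
    """Return (body_bytes, content_type) for a text result.

    The charset named in content_type is used when present; otherwise
    the default is used and, if a content type was given, appended to it.
    """
    if content_type is None:
        return text.encode(default), None
    match = re_charset.search(content_type)
    if match:
        return text.encode(match.group(1)), content_type
    return text.encode(default), content_type + "; charset=" + default
```

A charset value wrapped in double quotes, or one naming an unknown codec, would make encode() raise LookupError, so real code would probably need to handle that case.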

Regards,
Nicolas

On 4/30/05, Graham Dumpleton <gr...@dscpl.com.au> wrote:
>
> On 30/04/2005, at 6:37 PM, Nicolas Lehuen wrote:
> >         elif isinstance(object,UnicodeType):
> >             # TODO : this skips all encoding issues, which is VERY BAD
> >             # I don't even understand how the req.write below can work !
> >             result = object
> >         else:
> >             result = str(object)
>
> What do you see is the issue that required an explicit check for
> UnicodeType and avoidance of converting it with str().
>
> As the code is above, req.write() will be called with the
> UnicodeObject. This will work provided that the Unicode string can be
> converted into a normal string using the default encoding. Ie., in
> underlying C code PyArg_ParseTuple will use "s", meaning:
>
> "s" (string or Unicode object) [char *]
>    Convert a Python string or Unicode object to a C pointer to a
>    character string. You must not provide storage for the string itself;
>    a pointer to an existing string is stored into the character pointer
>    variable whose address you pass. The C string is null-terminated. The
>    Python string must not contain embedded null bytes; if it does, a
>    TypeError exception is raised. Unicode objects are converted to C
>    strings using the default encoding. If this conversion fails, an
>    UnicodeError is raised.
>
> I think though that applying str() in the Python code to the Unicode
> string probably yields the same result. Ie., str(u'123') results in
> encode() method of Unicode string object being called.
>
> S.encode([encoding[,errors]]) -> string
>
> Return an encoded string version of S. Default encoding is the current
> default string encoding. errors may be given to set a different error
> handling scheme. Default is 'strict' meaning that encoding errors raise
> a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and
> 'xmlcharrefreplace' as well as any other name registered with
> codecs.register_error that can handle UnicodeEncodeErrors.
>
> In other words, I don't believe there is any difference between
> converting it using str() before the call to req.write() as there is
> passing Unicode string direct to req.write(). Thus, explicit check for
> UnicodeType probably not required.
>
> Graham
>
>

Re: mod_python.publisher : proposal for a few implementation changes

Posted by Graham Dumpleton <gr...@dscpl.com.au>.
I note you sent something about new and old style classes just as I
was about to send this, so you may find my approach interesting or not.

On 30/04/2005, at 9:34 PM, Nicolas Lehuen wrote:

> BTW, the more I'm looking at this, the more I think we should keep
> this for mod_python 3.3 or something. I think we need to converge on
> mod_python 3.2 to make a release soon.
>
> I'm trying things for now so that we can discuss it, but I'm not going
> to integrate them into the subversion trunk, so that we can keep on
> preparing the 3.2 release. Maybe I'll create a branch for it and merge
> my changes back after the 3.2 release.

FWIW, I have added a new wrapper class into my working version of Vampire
which provides a mechanism to trigger the creation of an instance of a
class and for the class instance to then be called. Ie.,

   import vampire
   import time

   class OldStyleClass:

     def __init__(self):
       self.__time = time.time()

     def __call__(self,req,arg1=None,arg2=None):
       return self.__time,"OldStyleClass.__call__()",arg1,arg2

   old_instance = vampire.Instance(OldStyleClass)

   class NewStyleClass(object):

     def __init__(self):
       self.__time = time.time()

     def __call__(self,req,arg1=None,arg2=None):
       return self.__time,"NewStyleClass.__call__()",arg1,arg2

   new_instance = vampire.Instance(NewStyleClass)

It isn't recursive like you had planned. Ie., if "__call__()" returns an
object which is callable, it does not in turn call that. Mainly don't do
this as can't think of a practical purpose for it.

The actual code for me to implement this class was quite simple:

class Instance:

   def __init__(self,target):
     self.__target = target

   def __call__(self,req):
     if not type(self.__target) in [types.ClassType,types.TypeType]:
       raise apache.SERVER_RETURN, apache.HTTP_INTERNAL_SERVER_ERROR

     instance = _execute(req,self.__target,lazy=True)

     if not callable(instance):
       raise apache.SERVER_RETURN, apache.HTTP_INTERNAL_SERVER_ERROR

     return _execute(req,instance,lazy=True)

Although, did have to also add upgraded knowledge of ClassType and
TypeType to one of my utility methods so as to work out parameters
for constructor. I managed to work out how to get parameters for the
new style class case okay whereas before I had some issue with it.
Don't know why, it wasn't that big a problem, issue may have been
more to do with traversal rules.

My approach means that class types still cannot be instantiated
automatically, thus existing code doesn't get broken. To have a class
instantiated automatically, you have to make the conscious choice to wrap
it.
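
The opt-in idea can be sketched independently of Vampire. The Instance class below is a simplified, hypothetical Python 3 version: it does not use Vampire's private _execute() helper, it raises TypeError instead of apache.SERVER_RETURN, and it passes form fields as plain keyword arguments.

```python
import inspect


class Instance:
    """Hypothetical wrapper: publish a class by instantiating it per request."""

    def __init__(self, target):
        if not inspect.isclass(target):
            raise TypeError("target must be a class")
        self._target = target

    def __call__(self, req, **form_args):
        # A fresh instance is created for every request.
        instance = self._target()
        if not callable(instance):
            raise TypeError("instances of the target must be callable")
        return instance(req, **form_args)


class Greeter:
    def __call__(self, req, name="world"):
        return "Hello, %s!" % name


# Only classes that are explicitly wrapped become callable entry points.
greeter = Instance(Greeter)
```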

This class not only works in my "vampire::publisher" equivalent to
mod_python.publisher, but also in "vampire.Publisher()", my ability to
use published methods from a basic content handler with publisher
return semantics, plus in my "vampire.Handler()" class which is
published methods from a basic content handler but with standard
content handler return semantics.

This approach of using a wrapper class fits in better with how I have
previously implemented wrapper classes "vampire.PathInfo()" and
"vampire.PathArgs()" to allow use of additional path info when using
publisher.

What you have is a basic core which can be optionally extended in
specialised ways just at the place that you need it. At the moment
my method for applying a request to an object is private, but I
could expose it when I am happy, and that opens up the opportunity
for users to easily define their own wrappers to add specialised
features. For example, your idea of iterating over items accessible
from an object and executing each, could be done as a wrapper without
the need to modify the core.

One other wrapper class I have been thinking about for a while is a
pooling mechanism. This would be similar to "vampire.Instance()" but
would actually create a pool of the classes and cache them so they can
be reused on subsequent requests. There could be a maximum limit on
the number of instances allowed to be active with new requests having
to wait until previous requests had completed before being allowed to
proceed. Alternatively, you have a high water mark such that if a
burst of requests come in, you create new instances, but as they finish
you delete excess instances back down to the level of the high water
mark. Obviously, this pool is only pertinent to a threaded MPM.
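
The first variant, a hard limit on the number of active instances, might look roughly like the sketch below. InstancePool is a hypothetical name, nothing here is taken from Vampire, and the code is Python 3 using only the standard library. The high water mark variant is not covered.

```python
import queue
import threading


class InstancePool:
    """Hypothetical bounded pool that reuses instances across requests."""

    def __init__(self, factory, max_active):
        self._factory = factory
        self._idle = queue.LifoQueue()
        self._slots = threading.BoundedSemaphore(max_active)

    def acquire(self, timeout=None):
        # Wait for a free slot; new requests block here once the limit is reached.
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("no instance became available in time")
        try:
            # Prefer an idle instance left behind by an earlier request.
            return self._idle.get_nowait()
        except queue.Empty:
            try:
                return self._factory()
            except Exception:
                # Give the slot back if the instance could not be created.
                self._slots.release()
                raise

    def release(self, instance):
        self._idle.put(instance)
        self._slots.release()
```

A handler would call acquire() at the start of a request and release() in a finally block once the response has been produced.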

Graham


Re: mod_python.publisher : proposal for a few implementation changes

Posted by Nicolas Lehuen <ni...@gmail.com>.
BTW, the more I'm looking at this, the more I think we should keep
this for mod_python 3.3 or something. I think we need to converge on
mod_python 3.2 to make a release soon.

I'm trying things for now so that we can discuss it, but I'm not going
to integrate them into the subversion trunk, so that we can keep on
preparing the 3.2 release. Maybe I'll create a branch for it and merge
my changes back after the 3.2 release.

Regards,

Nicolas

Re: mod_python.publisher : proposal for a few implementation changes

Posted by Graham Dumpleton <gr...@dscpl.com.au>.
On 30/04/2005, at 6:37 PM, Nicolas Lehuen wrote:
>         elif isinstance(object,UnicodeType):
>             # TODO : this skips all encoding issues, which is VERY BAD
>             # I don't even understand how the req.write below can work !
>             result = object
>         else:
>             result = str(object)

What do you see as the issue that required an explicit check for UnicodeType
and avoidance of converting it with str()?

As the code is above, req.write() will be called with the Unicode object.
This will work provided that the Unicode string can be converted into a
normal string using the default encoding. Ie., in the underlying C code
PyArg_ParseTuple will use "s", meaning:

"s" (string or Unicode object) [char *]
   Convert a Python string or Unicode object to a C pointer to a
   character string. You must not provide storage for the string itself;
   a pointer to an existing string is stored into the character pointer
   variable whose address you pass. The C string is null-terminated. The
   Python string must not contain embedded null bytes; if it does, a
   TypeError exception is raised. Unicode objects are converted to C
   strings using the default encoding. If this conversion fails, an
   UnicodeError is raised.

I think though that applying str() in the Python code to the Unicode
string probably yields the same result. Ie., str(u'123') results in the
encode() method of the Unicode string object being called.

S.encode([encoding[,errors]]) -> string

Return an encoded string version of S. Default encoding is the current
default string encoding. errors may be given to set a different error
handling scheme. Default is 'strict' meaning that encoding errors raise
a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and
'xmlcharrefreplace' as well as any other name registered with
codecs.register_error that can handle UnicodeEncodeErrors.

In other words, I don't believe there is any difference between converting
it using str() before the call to req.write() and passing the Unicode
string directly to req.write(). Thus, an explicit check for UnicodeType is
probably not required.
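
The implicit conversion can be imitated in Python 3 with an explicit strict encode. The sketch assumes the default encoding is ASCII, which is the usual Python 2 default unless the site configuration changes it; to_default_bytes() is a made-up helper.

```python
def to_default_bytes(text, default_encoding="ascii"):
    """Mimic the implicit conversion: encode strictly with the default codec."""
    return text.encode(default_encoding, "strict")


# Pure ASCII text converts without trouble.
ascii_ok = to_default_bytes("123")

# Text outside the default codec fails, whichever way the conversion is triggered.
try:
    to_default_bytes("caf\u00e9")
    failed = False
except UnicodeEncodeError:
    failed = True

# Naming a codec explicitly avoids relying on the interpreter default.
explicit = "caf\u00e9".encode("utf-8")
```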

Graham


Re: mod_python.publisher : proposal for a few implementation changes

Posted by Graham Dumpleton <gr...@dscpl.com.au>.
On 30/04/2005, at 8:17 PM, Graham Dumpleton wrote:
> I'll probably have more to say about automatic creation of objects from
> class types later. I personally don't think it is a good idea. For one,
> adding this feature will mean that everyone will have to revisit their
> code and update it as suddenly all these classes which were benign
> before, will suddenly be able to be targeted by requests if the names of
> the classes weren't prefixed with an underscore to hide them. There was
> also one very specific problem that came up with making class types
> publishable from when I looked at implementing it, but which I just
> can't remember at the moment. :-)

Okay, remembered what the problem was now. At the time I didn't solve it,
but having another look at it I perhaps see a solution now, although not
sure why I didn't work it out last time. The biggest stumbling block at
the time was new style classes.

Irrespective of whether or not I think it is a good idea, changes would
be needed to util.apply_fs_data() to get it all to work.

First off, are old style classes. As with any class, it may or may not
provide a constructor, ie., "__init__()".

 >>> class OldStyle:
...   pass
...
 >>> dir(OldStyle)
['__doc__', '__module__']
 >>> type(OldStyle)
<type 'classobj'>
 >>> OldStyle.__init__
Traceback (most recent call last):
   File "<stdin>", line 1, in ?
AttributeError: class OldStyle has no attribute '__init__'
 >>> OldStyle()
<__main__.OldStyle instance at 0x64b20>

 >>> class OldStyleCtor:
...   def __init__(self):
...     pass
...
 >>> dir(OldStyleCtor)
['__doc__', '__init__', '__module__']
 >>> type(OldStyleCtor)
<type 'classobj'>
 >>> OldStyleCtor.__init__
<unbound method OldStyleCtor.__init__>
 >>> OldStyleCtor.__init__.im_func.func_code
<code object __init__ at 0x612a0, file "<stdin>", line 2>
 >>> OldStyleCtor()
<__main__.OldStyleCtor instance at 0x64b20>

The current code in util.apply_fs_data() has:

    elif type(object) is ClassType:
        # class
        fc = object.__init__.im_func.func_code
        expected = fc.co_varnames[1:fc.co_argcount]

This will fail if the old style class doesn't define its own constructor.
The code would need to say:

    elif type(object) is ClassType:
        # class
        if hasattr(object,"__init__"):
            fc = object.__init__.im_func.func_code
            expected = fc.co_varnames[1:fc.co_argcount]
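
For comparison, the same lookup can be sketched in Python 3, where every class is a new style class. The helper expected_args() is hypothetical and is not the actual util.apply_fs_data() code; it treats a class without a Python-level __init__ as expecting no arguments.

```python
import inspect


def expected_args(obj):
    """Return the positional argument names a function or class constructor accepts."""
    if inspect.isclass(obj):
        init = getattr(obj, "__init__", None)
        if not inspect.isfunction(init):
            # No Python-level constructor was defined, so nothing is expected.
            return ()
        code = init.__code__
        return code.co_varnames[1:code.co_argcount]  # skip self
    if inspect.isfunction(obj):
        code = obj.__code__
        return code.co_varnames[:code.co_argcount]
    raise TypeError("unsupported object: %r" % (obj,))


class NoCtor:
    pass


class WithCtor:
    def __init__(self, req, name=None):
        self.name = name


def handler(req, arg1, arg2=None):
    return arg1, arg2
```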

Supporting new style classes is more problematic and is where I had
problems last time.

Hmm, I am going to bail out here, send this email but think more about the
new style class case. I am again starting to see problems in how to support
them, so I better play first and make sure I have an actual solution or not.
The problem lies in knowing whether or not you do actually have a new style
class and then how you determine what arguments a constructor is
wanting to accept. It all gets muddled because of the existing check for
"__call__".
Graham


Re: mod_python.publisher : proposal for a few implementation changes

Posted by Graham Dumpleton <gr...@dscpl.com.au>.
On 30/04/2005, at 6:37 PM, Nicolas Lehuen wrote:
> def publish_object(req, object):
>     if callable(object):
>         req.form = util.FieldStorage(req, keep_blank_values=1)
>         return publish_object(req,util.apply_fs_data(object, req.form, req=req))

This will not work where the object is a class object, where the class
overrides "__call__" and a POST request is used. Even when it is a GET
request for this case, multiple instances of FieldStorage class will be
created which will be inefficient. You also lose a bit in efficiency
in that form will be processed multiple times. As an example:

   class Object:
     def __call__(self,arg1,arg2):
       return "__call__"

When request resolves to Object type, call into publish_object() will
result in it detecting that class type is callable. It will create a
form object and attempt to call the object, which should result in an
instance of it being created. If it was a POST request, creation of the
form object will cause all the content to be consumed.

On the sub call to publish_object(), because of the existence of the
"__call__" method, it will create a form object a second time. Because
all the content related to the POST request has already been consumed, it
will not find any form parameters, and thus when it attempts to call the
instance itself, it will fail due to missing parameters.

What you would at least need to do is check to see if "req.form" already
existed before creating a form object. Thus:

         if not hasattr(req,"form"):
             req.form = util.FieldStorage(req, keep_blank_values=1)
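
The effect of that check can be demonstrated with stand-ins. In the sketch below, FakeRequest and parse_form() are hypothetical replacements for the mod_python request object and util.FieldStorage, written for Python 3; the only point is that the body is read once and the parsed form is reused.

```python
class FakeRequest:
    """Hypothetical stand-in for a request whose body can be read only once."""

    def __init__(self, body):
        self._body = body

    def read(self):
        data, self._body = self._body, ""
        return data


def parse_form(req):
    """Toy parser for 'a=1&b=2' bodies; stands in for util.FieldStorage."""
    pairs = (item.split("=", 1) for item in req.read().split("&") if item)
    return dict(pairs)


def get_form(req):
    """Parse the form on first use and reuse the cached result afterwards."""
    if not hasattr(req, "form"):
        req.form = parse_form(req)
    return req.form


req = FakeRequest("arg1=x&arg2=y")
first = get_form(req)
second = get_form(req)
```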

In Vampire, not only does it cache req.form and always check to make
sure it hasn't already been created, it will cache the data derived
from initially passing req.form.list to avoid having to process it
twice. This may not be too pertinent to mod_python.publisher, but is
used by some of the features in Vampire.

I'll probably have more to say about automatic creation of objects from
class types later. I personally don't think it is a good idea. For one,
adding this feature will mean that everyone will have to revisit their
code and update it, as all these classes which were benign before
will suddenly be able to be targeted by requests if the names of the
classes weren't prefixed with an underscore to hide them. There was also
one very specific problem that came up with making class types publishable
when I looked at implementing it, but which I just can't remember
at the moment. :-)

Graham




Re: mod_python.publisher : proposal for a few implementation changes

Posted by Nicolas Lehuen <ni...@gmail.com>.
Plus, I made a mistake, the proper code is :

def publish_object(req, object):
    if callable(object):
        req.form = util.FieldStorage(req, keep_blank_values=1)
        return publish_object(req,util.apply_fs_data(object, req.form, req=req))
    elif hasattr(object,'__iter__'):
        result = False
        for item in object:
            result |= publish_object(req,item)
        return result
    else:
        if object is None:
            return False
        elif isinstance(object,UnicodeType):
            # TODO : this skips all encoding issues, which is VERY BAD
            # I don't even understand how the req.write below can work !
            result = object
        else:
            result = str(object)
            
        if not req._content_type_set:
            # make an attempt to guess content-type
            if re_html.search(result,0,100):
                req.content_type = 'text/html'
            else:
                req.content_type = 'text/plain'
        
        if req.method!='HEAD':
            req.write(result)

        return True
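
The content type guess at the end of the function can be isolated into a small sketch. The helper guess_content_type() is a hypothetical name and the code is Python 3; it relies on the pos and endpos arguments of the compiled pattern's search() method to look only at the first 100 characters.

```python
import re

re_html = re.compile(r"<HTML", re.I)


def guess_content_type(result):
    """Guess text/html if an <html tag appears within the first 100 characters."""
    if re_html.search(result, 0, 100):
        return "text/html"
    return "text/plain"
```

Output whose opening tag comes after a long preamble would be reported as text/plain under this rule, which is one reason to set the content type explicitly in such cases.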

On 4/30/05, Nicolas Lehuen <ni...@gmail.com> wrote:
> Be aware that the test for __iter__ changes the behaviour of the
> publisher. Before, if you returned a list from a published function,
> it was turned into a string (using __str__) then sent to the client as
> text/plain :
> 
> def test(req):
>     return [1,2,3]
> 
> /test ==> "[1, 2, 3]"
> 
> With this proposal, as the list is iterable, we will get "123"
> instead. If you really want "[1, 2, 3]" then you have to "return
> str([1,2,3])".
> 
> This doesn't break anything in the unit tests, but if this is too much
> of a change, we could test for the presence of a "next" method and
> consider it's an iterator only if it is present. This way, "return
> [1,2,3]" would send "[1, 2, 3]" whereas "return iter([1,2,3])" would
> send "123".
> 
> Regards,
> Nicolas
> 
> On 4/30/05, Nicolas Lehuen <ni...@gmail.com> wrote:
> > Hi,
> >
> > I'm trying to solve both MODPYTHON-15
> > (http://issues.apache.org/jira/browse/MODPYTHON-15) and MODPYTHON-16
> > (http://issues.apache.org/jira/browse/MODPYTHON-16) in one strike...
> >
> > What do you think of this approach : the object returned by
> > resolve_object is then passed to this publish_object function :
> >
> > # This regular expression is used to test for the presence of an HTML header
> > # tag, written in upper or lower case.
> > re_html = re.compile(r"<HTML",re.I)
> >
> > def publish_object(req, object):
> >     if callable(object):
> >         req.form = util.FieldStorage(req, keep_blank_values=1)
> >         return publish_object(req,util.apply_fs_data(object, req.form, req=req))
> >     elif hasattr(object,'__iter__'):
> >         result = False
> >         for result in object:
> >             result |= publish_object(req,object)
> >         return result
> >     else:
> >         if object is None:
> >             return False
> >         elif isinstance(object,UnicodeType):
> >             # TODO : this skips all encoding issues, which is VERY BAD
> >             # I don't even understand how the req.write below can work !
> >             result = object
> >         else:
> >             result = str(object)
> >
> >         if not req._content_type_set:
> >             # make an attempt to guess content-type
> >             if re_html.search(result,0,100):
> >                 req.content_type = 'text/html'
> >             else:
> >                 req.content_type = 'text/plain'
> >
> >         if req.method!='HEAD':
> >             req.write(result)
> >
> >         return True
> >
> > This way, we could support classes, class instances and iterables
> > (amongst which generators) as possible return values. The boolean
> > return value tells the handler whether something was effectively
> > published, it is used by the handler as such :
> >
> > >     if (not publish_object(req, object)) and (req.bytes_sent==0) and (req.next is None):
> >         req.log_error("mod_python.publisher: nothing to publish.")
> >         return apache.HTTP_INTERNAL_SERVER_ERROR
> >     else:
> >         return apache.OK
> >
> > See the enclosed file for the whole thing. Comments are of course welcome.
> >
> > Regards,
> >
> > Nicolas
> >
> >
> >
>

Re: mod_python.publisher : proposal for a few implementation changes

Posted by Nicolas Lehuen <ni...@gmail.com>.
Be aware that the test for __iter__ changes the behaviour of the
publisher. Previously, a list returned from a published function was
converted to a string (using __str__) and then sent to the client as
text/plain:

def test(req):
    return [1,2,3]

/test ==> "[1, 2, 3]"

With this proposal, the list is iterable, so each element is published
in turn and the client gets "123" instead. If you really want
"[1, 2, 3]", you have to "return str([1,2,3])".

This doesn't break anything in the unit tests, but if this is too much
of a change, we could test for the presence of a "next" method and
treat the object as an iterator only if it is present. This way, "return
[1,2,3]" would send "[1, 2, 3]" whereas "return iter([1,2,3])" would
send "123".
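
To make the difference concrete, here is a minimal sketch of the two tests.
It is not the publisher code: render() and its iterators_only flag are
hypothetical names used only for illustration, and it simply builds the
text that would be sent instead of calling req.write. It is written for
Python 3, where the iterator method is named __next__ rather than next,
and where strings define __iter__, so they are excluded by hand.

```python
def render(obj, iterators_only):
    """Return the text a publisher-like function would send for obj.

    Hypothetical helper for illustration only; not part of mod_python.
    """
    if iterators_only:
        # Stricter test: expand only true iterators (generators, iter(...)).
        # Python 2 iterators expose next(); Python 3 renamed it to __next__().
        expand = hasattr(obj, "next") or hasattr(obj, "__next__")
    else:
        # Looser test from the proposal: expand anything with __iter__.
        # Strings are skipped explicitly because Python 3 strings define
        # __iter__ and would otherwise be walked character by character.
        expand = hasattr(obj, "__iter__") and not isinstance(obj, (str, bytes))

    if expand:
        # Publish each item in turn and concatenate the results.
        return "".join(render(item, iterators_only) for item in obj)
    return str(obj)


print(render([1, 2, 3], iterators_only=False))        # list is expanded
print(render([1, 2, 3], iterators_only=True))         # list is stringified
print(render(iter([1, 2, 3]), iterators_only=True))   # iterator is expanded
```

With the looser test a plain list is expanded item by item, while the
stricter test leaves lists to str() and expands only explicit iterators
and generators.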

Regards,
Nicolas

On 4/30/05, Nicolas Lehuen <ni...@gmail.com> wrote:
> Hi,
> 
> I'm trying to solve both MODPYTHON-15
> (http://issues.apache.org/jira/browse/MODPYTHON-15) and MODPYTHON-16
> (http://issues.apache.org/jira/browse/MODPYTHON-16) in one strike...
> 
> What do you think of this approach : the object returned by
> resolve_object is then passed to this publish_object function :
> 
> # This regular expression is used to test for the presence of an HTML header
> # tag, written in upper or lower case.
> re_html = re.compile(r"<HTML",re.I)
> 
> def publish_object(req, object):
>     if callable(object):
>         req.form = util.FieldStorage(req, keep_blank_values=1)
>         return publish_object(req,util.apply_fs_data(object, req.form, req=req))
>     elif hasattr(object,'__iter__'):
>         result = False
>         for item in object:
>             result |= publish_object(req,item)
>         return result
>     else:
>         if object is None:
>             return False
>         elif isinstance(object,UnicodeType):
>             # TODO : this skips all encoding issues, which is VERY BAD
>             # I don't even understand how the req.write below can work !
>             result = object
>         else:
>             result = str(object)
> 
>         if not req._content_type_set:
>             # make an attempt to guess content-type
>             if re_html.search(result,0,100):
>                 req.content_type = 'text/html'
>             else:
>                 req.content_type = 'text/plain'
> 
>         if req.method!='HEAD':
>             req.write(result)
> 
>         return True
> 
> This way, we could support classes, class instances and iterables
> (amongst which generators) as possible return values. The boolean
> return value tells the handler whether something was effectively
> published, it is used by the handler as such :
> 
>     if (not publish_object(req, object)) and (req.bytes_sent==0) and (req.next is None):
>         req.log_error("mod_python.publisher: nothing to publish.")
>         return apache.HTTP_INTERNAL_SERVER_ERROR
>     else:
>         return apache.OK
> 
> See the enclosed file for the whole thing. Comments are of course welcome.
> 
> Regards,
> 
> Nicolas
> 
> 
>