You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mod_python-dev@quetz.apache.org by "Graham Dumpleton (JIRA)" <ji...@apache.org> on 2005/12/08 23:22:07 UTC

[jira] Created: (MODPYTHON-97) mod_python.publisher iterables and content_type broken

mod_python.publisher iterables and content_type broken
------------------------------------------------------

         Key: MODPYTHON-97
         URL: http://issues.apache.org/jira/browse/MODPYTHON-97
     Project: mod_python
        Type: Bug
  Components: publisher  
    Versions: 3.2    
    Reporter: Graham Dumpleton


In 3.2, mod_python.publisher was modified so that if it encountered an interable it would recursively iterate over the items and publish each with the result being concatenated.

FWIW, I personally didn't like this as I saw it potentially changing the behaviour of existing code, although perhaps in contrived cases or for test code only. I saw that this sort of behaviour should have been managed by the user by explicit use of a wrapper class instead, rather than it being magic. End of ramble. :-)

Regardless of my concerns, the behaviour that was added is broken. Specifically, mod_python.publisher is setting the content type based on the content of the first item returned from the iterable. For example, consider the following:

index = [
  '<html><body><p>',
  1000 * "X",
  '</p></body></html>',
]

When published, this is resulting in the content type being set to 'text/plain' and not 'text/html'. In part this is perhaps caused by the fact that the content type check is now performed by looking for a trailing '</html>' in the content whereas previously it would look for a leading '<html>'. This was changed because all the HTML prologue that can appear before '<html>' would often throw out this check with the HTML not being automatically being detected. Thus at the time it was thought that looking for the trailing '</html>' would be more reliable. It ain't going to help to go back to using a leading '<html>' check though as the first item may only contain the prologue and not '<html>'.

These checks are only going to work for iterables if the results of publishing of each item were added to the end of a list of strings, rather than being written back immediately using req.write(). Once all that has been returned by the iterable is obtained, this can all be joined back together and then the HTML check done.

Joining all the separate items returned from the iterable back together defeats the purpose of what this feature was about in the first place and may result in huge in memory objects needing to be created to hold the combined result just so the HTML check can be done.

The only way to avoid the problem is for the content type to be set explicitly by the user before the iterable is processed. This is a bit tricky as it is mod_python.publisher which is automagically doing this. The best you can do is something like:

class SetContentType:
  def __init__(self,content_type):
    self.__content_type = content_type
  def __call__(self,req):
    req.content_type = self.__content_type
    return ""

index = [
  SetContentType('text/html'),
  '<html><body><p>',
  1000 * "X",
  '</p></body></html>',
]

Once you start doing this, the user may as well have provided their own published function in the first place that set the content type and manually iterated over items and wrote them to req.write(). This could also be managed by a user specified wrapper class which is how I saw this as preferably being done in the first place. Ie.,

class PublishIterable:
  def __init__(self,value,content_type):
    self.__value = value
    self.__content_type = content_type
  def __call__(self,req):
    req.content_type = self.__content_type
    for item in self.__value:
      req.write(item)

_values = [
  '<html><body><p>',
  1000 * "X",
  '</p></body></html>',
]

index = PublishIterable(_values,'text/html')

Personally I believe this automagic publishing of iterables should be removed from mod_python.publisher. You might still provide a special function/wrapper that works like PublisherIterable but handles recursive structures and callable objects in the process, but I feel it should be up to the user to make a conscious effort to use it and mod_python.publisher shouldn't assume that it should process any iterable in this way automatically.



-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


Re: [jira] Created: (MODPYTHON-97) mod_python.publisher iterables and content_type broken

Posted by "Gregory (Grisha) Trubetskoy" <gr...@apache.org>.
On Thu, 8 Dec 2005, Gregory (Grisha) Trubetskoy wrote:

>  def index(req)
>      req.content_type = 'text/plain'
>      yield '1\n'
>      yield '2\n'
>      yield '3\n'
>      yield '4\n'
>
>  When published, this module should return a text content with
> '1\n2\n3\n4\n'.
>
> ---
>
> I'd expect this to return '1\n'. Because in my mind one HTTP request 
> corresponds to one call to the published function?

Actually, I take that back. It should return something more along the 
lines of '<generator object at 0x81ba5ec>'.

Does this make sense?

Grisha

Re: [jira] Created: (MODPYTHON-97) mod_python.publisher iterables and content_type broken

Posted by "Gregory (Grisha) Trubetskoy" <gr...@apache.org>.
On Thu, 8 Dec 2005, Jim Gallacher wrote:

> I believe this is the source of this change:
> http://issues.apache.org/jira/browse/MODPYTHON-15

Thanks! So quoting from the description:

---

   Suppose this function in a published module :

   def index(req)
       req.content_type = 'text/plain'
       yield '1\n'
       yield '2\n'
       yield '3\n'
       yield '4\n'

   When published, this module should return a text content with
  '1\n2\n3\n4\n'.

---

I'd expect this to return '1\n'. Because in my mind one HTTP request 
corresponds to one call to the published function?

(Also I guess a subsequent call which happens to hit the same httpd 
process would return '2\n', etc.)

Grisha

Re: [jira] Created: (MODPYTHON-97) mod_python.publisher iterables and content_type broken

Posted by Jim Gallacher <jp...@jgassociates.ca>.
Gregory (Grisha) Trubetskoy wrote:
> 
> On Thu, 8 Dec 2005, Graham Dumpleton (JIRA) wrote:
> 
>> In 3.2, mod_python.publisher was modified so that if it encountered an 
>> interable it would recursively iterate over the items and publish each 
>> with the result being concatenated.
> 
> 
> I didn't know this either. I'm curious what the rational for this was - 
> I'm not sure whether this is good or bad. If I have an object which it 
> iterable, but also has its own idea of how to represent itself as a 
> string as a whole, isn't this going to be slightly unexpected?
> 
> Grisha

I believe this is the source of this change:
http://issues.apache.org/jira/browse/MODPYTHON-15

Jim

Re: [jira] Created: (MODPYTHON-97) mod_python.publisher iterables and content_type broken

Posted by "Gregory (Grisha) Trubetskoy" <gr...@apache.org>.
On Thu, 8 Dec 2005, Graham Dumpleton (JIRA) wrote:

> In 3.2, mod_python.publisher was modified so that if it encountered an 
> interable it would recursively iterate over the items and publish each 
> with the result being concatenated.

I didn't know this either. I'm curious what the rational for this was - 
I'm not sure whether this is good or bad. If I have an object which it 
iterable, but also has its own idea of how to represent itself as a string 
as a whole, isn't this going to be slightly unexpected?

Grisha

[jira] Resolved: (MODPYTHON-97) mod_python.publisher iterables and content_type broken

Posted by "Nicolas Lehuen (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/MODPYTHON-97?page=all ]
     
Nicolas Lehuen resolved MODPYTHON-97:
-------------------------------------

    Fix Version: 3.2
     Resolution: Fixed
      Assign To: Nicolas Lehuen

Fixed this by reverting the changes from MODPYTHON-15.

> mod_python.publisher iterables and content_type broken
> ------------------------------------------------------
>
>          Key: MODPYTHON-97
>          URL: http://issues.apache.org/jira/browse/MODPYTHON-97
>      Project: mod_python
>         Type: Bug
>   Components: publisher
>     Versions: 3.2
>     Reporter: Graham Dumpleton
>     Assignee: Nicolas Lehuen
>      Fix For: 3.2

>
> In 3.2, mod_python.publisher was modified so that if it encountered an interable it would recursively iterate over the items and publish each with the result being concatenated.
> FWIW, I personally didn't like this as I saw it potentially changing the behaviour of existing code, although perhaps in contrived cases or for test code only. I saw that this sort of behaviour should have been managed by the user by explicit use of a wrapper class instead, rather than it being magic. End of ramble. :-)
> Regardless of my concerns, the behaviour that was added is broken. Specifically, mod_python.publisher is setting the content type based on the content of the first item returned from the iterable. For example, consider the following:
> index = [
>   '<html><body><p>',
>   1000 * "X",
>   '</p></body></html>',
> ]
> When published, this is resulting in the content type being set to 'text/plain' and not 'text/html'. In part this is perhaps caused by the fact that the content type check is now performed by looking for a trailing '</html>' in the content whereas previously it would look for a leading '<html>'. This was changed because all the HTML prologue that can appear before '<html>' would often throw out this check with the HTML not being automatically being detected. Thus at the time it was thought that looking for the trailing '</html>' would be more reliable. It ain't going to help to go back to using a leading '<html>' check though as the first item may only contain the prologue and not '<html>'.
> These checks are only going to work for iterables if the results of publishing of each item were added to the end of a list of strings, rather than being written back immediately using req.write(). Once all that has been returned by the iterable is obtained, this can all be joined back together and then the HTML check done.
> Joining all the separate items returned from the iterable back together defeats the purpose of what this feature was about in the first place and may result in huge in memory objects needing to be created to hold the combined result just so the HTML check can be done.
> The only way to avoid the problem is for the content type to be set explicitly by the user before the iterable is processed. This is a bit tricky as it is mod_python.publisher which is automagically doing this. The best you can do is something like:
> class SetContentType:
>   def __init__(self,content_type):
>     self.__content_type = content_type
>   def __call__(self,req):
>     req.content_type = self.__content_type
>     return ""
> index = [
>   SetContentType('text/html'),
>   '<html><body><p>',
>   1000 * "X",
>   '</p></body></html>',
> ]
> Once you start doing this, the user may as well have provided their own published function in the first place that set the content type and manually iterated over items and wrote them to req.write(). This could also be managed by a user specified wrapper class which is how I saw this as preferably being done in the first place. Ie.,
> class PublishIterable:
>   def __init__(self,value,content_type):
>     self.__value = value
>     self.__content_type = content_type
>   def __call__(self,req):
>     req.content_type = self.__content_type
>     for item in self.__value:
>       req.write(item)
> _values = [
>   '<html><body><p>',
>   1000 * "X",
>   '</p></body></html>',
> ]
> index = PublishIterable(_values,'text/html')
> Personally I believe this automagic publishing of iterables should be removed from mod_python.publisher. You might still provide a special function/wrapper that works like PublisherIterable but handles recursive structures and callable objects in the process, but I feel it should be up to the user to make a conscious effort to use it and mod_python.publisher shouldn't assume that it should process any iterable in this way automatically.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Closed: (MODPYTHON-97) mod_python.publisher iterables and content_type broken

Posted by "Graham Dumpleton (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/MODPYTHON-97?page=all ]
     
Graham Dumpleton closed MODPYTHON-97:
-------------------------------------


> mod_python.publisher iterables and content_type broken
> ------------------------------------------------------
>
>          Key: MODPYTHON-97
>          URL: http://issues.apache.org/jira/browse/MODPYTHON-97
>      Project: mod_python
>         Type: Bug
>   Components: publisher
>     Versions: 3.2.7
>     Reporter: Graham Dumpleton
>     Assignee: Nicolas Lehuen
>      Fix For: 3.2.7

>
> In 3.2, mod_python.publisher was modified so that if it encountered an interable it would recursively iterate over the items and publish each with the result being concatenated.
> FWIW, I personally didn't like this as I saw it potentially changing the behaviour of existing code, although perhaps in contrived cases or for test code only. I saw that this sort of behaviour should have been managed by the user by explicit use of a wrapper class instead, rather than it being magic. End of ramble. :-)
> Regardless of my concerns, the behaviour that was added is broken. Specifically, mod_python.publisher is setting the content type based on the content of the first item returned from the iterable. For example, consider the following:
> index = [
>   '<html><body><p>',
>   1000 * "X",
>   '</p></body></html>',
> ]
> When published, this is resulting in the content type being set to 'text/plain' and not 'text/html'. In part this is perhaps caused by the fact that the content type check is now performed by looking for a trailing '</html>' in the content whereas previously it would look for a leading '<html>'. This was changed because all the HTML prologue that can appear before '<html>' would often throw out this check with the HTML not being automatically being detected. Thus at the time it was thought that looking for the trailing '</html>' would be more reliable. It ain't going to help to go back to using a leading '<html>' check though as the first item may only contain the prologue and not '<html>'.
> These checks are only going to work for iterables if the results of publishing of each item were added to the end of a list of strings, rather than being written back immediately using req.write(). Once all that has been returned by the iterable is obtained, this can all be joined back together and then the HTML check done.
> Joining all the separate items returned from the iterable back together defeats the purpose of what this feature was about in the first place and may result in huge in memory objects needing to be created to hold the combined result just so the HTML check can be done.
> The only way to avoid the problem is for the content type to be set explicitly by the user before the iterable is processed. This is a bit tricky as it is mod_python.publisher which is automagically doing this. The best you can do is something like:
> class SetContentType:
>   def __init__(self,content_type):
>     self.__content_type = content_type
>   def __call__(self,req):
>     req.content_type = self.__content_type
>     return ""
> index = [
>   SetContentType('text/html'),
>   '<html><body><p>',
>   1000 * "X",
>   '</p></body></html>',
> ]
> Once you start doing this, the user may as well have provided their own published function in the first place that set the content type and manually iterated over items and wrote them to req.write(). This could also be managed by a user specified wrapper class which is how I saw this as preferably being done in the first place. Ie.,
> class PublishIterable:
>   def __init__(self,value,content_type):
>     self.__value = value
>     self.__content_type = content_type
>   def __call__(self,req):
>     req.content_type = self.__content_type
>     for item in self.__value:
>       req.write(item)
> _values = [
>   '<html><body><p>',
>   1000 * "X",
>   '</p></body></html>',
> ]
> index = PublishIterable(_values,'text/html')
> Personally I believe this automagic publishing of iterables should be removed from mod_python.publisher. You might still provide a special function/wrapper that works like PublisherIterable but handles recursive structures and callable objects in the process, but I feel it should be up to the user to make a conscious effort to use it and mod_python.publisher shouldn't assume that it should process any iterable in this way automatically.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira