You are viewing a plain text version of this content. The canonical link for it is here.

Posted to mod_python-dev@quetz.apache.org by "Jim Gallacher (JIRA)" <ji...@apache.org> on 2005/11/26 22:36:55 UTC

[jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

    [ http://issues.apache.org/jira/browse/MODPYTHON-93?page=comments#action_12358568 ] 

Jim Gallacher commented on MODPYTHON-93:
----------------------------------------

The FieldStorage methods get, getitems, has_key, __len__, getfirst and getlist *all* iterate over the complete list of fields each time they are called and so would all benefit from some kind of indexing scheme. Here is a bit of code that will give you some idea of what I have in mind.

class FieldStorage:

    def __init__():
        self.index = {}  # holds references to the Field objects and can be used as an index.
        self.list = []

        ... blah blah blah ...
        ... create the field and get it's name ...

        self.list.append(field)
        if name in self.index:
            self.index[name].append(field)
        else:
            self.index[name] = [field,]

    def __getitem__(self, key):
        if key not in self.index:
            raise KeyError, key
       found = self.index[key]
       if len(found) == 1:
           return found[0]
       else:
           return found

The other FieldStorage methods would need to be refactored to take advantage of the index.


> Improve util.FieldStorage efficiency
> ------------------------------------
>
>          Key: MODPYTHON-93
>          URL: http://issues.apache.org/jira/browse/MODPYTHON-93
>      Project: mod_python
>         Type: Improvement
>   Components: core
>     Versions: 3.3
>     Reporter: Jim Gallacher
>     Assignee: Jim Gallacher
>     Priority: Minor

>
> Form fields are saved as a list in a FieldStorage class instance. The class implements a __getitem__ method to provide dict-like behaviour.  This method iterates over the complete list for every call to __getitem__. Applications that need to access all the fields when processing the form will show O(n^2) behaviour where n == the number of form fields. This overhead could be avoided by creating a dict (to use as an index) when the FieldStorage instance is created.
> Mike Looijmans has been investigating StringField and Field as well. It is probably reasonable to include information on his work in this issue as well, so that we can consider all of these efficiency issues in toto.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

Posted by Jim Gallacher <jp...@jgassociates.ca>.

Gregory (Grisha) Trubetskoy wrote:
> 
> 
> Having looked at the FieldStorage code, I'm guessing the idea was that 
> you parse fields as they come in and append them to a list. This 
> preserves the original order of fields, in case it is needed.

I assumed that as well, but I'm not sure getting the fields in a 
particular order is the most common use case - not for me anyway. Plus, 
I'm not suggesting getting rid of access to FieldStorage.list. For the 
dict-like such as fs.keys(), I don't think people should expect keys in 
any particular order. If they really want that feature then they can 
subclass write their own ordered dict class.

> I'm not sure that maintaining a dictionary alongside the list is the 
> right thing to do. It might be, but there are some difficult questions 
> to answer -e.g. how costly is a sequential search, and is the code 
> complexity (and fieldstorage code is no picnic to read as it is) worth 
> the speedup?

The current code is a litte hairy - it helps to drink a cup of strong 
coffee before reading it. We can always hide the complexity in a 
separate add_field method. As to the performance tradeoffs, I guess we 
benchmark and see? I love doing benchmarks. The 2 things that bring joy 
to my heart are benchmarks and unit tests. And good documentation. The 
*3* things that bring joy to my heart are... well you get the idea.

> Also while it would speed up retrieval, it will slow down the "write" 
> operation - when a field is added to fieldstorage you now need to append 
> it to the list, AND check whether it exists in the dictionary, then add 
> it there as well.
> 
> How often do developers access form fields via __getitem__?  I noticed 
> the publisher does not use it - it iterates the list, so nothing would 
> be gained there.

For myself, I use it (almost?) exclusively as a dict. As for the use in 
publisher, maybe that implementation needs to be examined as well. ;)

> Also, something else to consider - is there a simple programatic 
> solution that could be documented, e.g. something like
> 
> my_fs = util.FieldStorage(req)
> 
> dict_fs = {}
> dict_fs.update(my_fs)
> 
> [have no idea whether this will work :-)]

Nope. If you have multiple fields with the same name you'll lose all but 
the last field. (eg. The checkbox example example on the mod_python list 
that got me started on this in the first place).

> and voila - you've got a dictionary based fieldstorage?

Except that FieldStorage is already supposed to be dict-like so why 
would I want to duplicate the effort in my code? For example 7 out of 10 
   the fs methods are there to support dict-like behaviour and the other 
3 are initialization helpers which will never be called by user code 
anyway. To me, that makes it a dictionary. I'm not talking about adding 
new functionality to FieldStorage, just examining the current 
implementation wrt to performance.

> Anyway, just a few cents from me.

I don't want you to think I'm hung up on this issue. It just seems to me 
that the goal of mod_python moving forward should be stability, speed, 
efficiency while keeping feature creep to a minumum. I think it's 
worthwhile to examine existing code as we go along to see if we are 
meeting these goals.  We still need to have *some* code to chew on every 
now and then after all. :)

Jim

Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

Posted by Mike Looijmans <nl...@natlab.research.philips.com>.

>> I am still not convinced we need an index. I'd like to see some 
>> concrete proof that we're not engaging in "overoptimization" here - is 
>> this really a bottleneck for anyone?
> 
> Well, I need to get some real work done today, but I will do some 
> benchmarking at some point. It's not like there's a big rush to complete 
> 3.3.0 this week. I do like that Mike's code simplifies a number of the 
> the FS methods. It just looks more elegant.

"More elegant" is the best reason I can think of for refactoring it. 
Heck, even if my implementation can be proved to be slower than the 
original, I still like this one better because of sheer code size and 
readability.

I don't think that you'll notice much of a performance difference, but I 
like the fact that it:
- You get the same object instance:
assert (form['key'] is form['key'])
- Does not create a file object for every field
- Getting form['key'] a second time is cheap.

-- 
Mike Looijmans
Philips Natlab / Topic Automation

Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

Posted by Jim Gallacher <jp...@jgassociates.ca>.

Gregory (Grisha) Trubetskoy wrote:
> 
> On Tue, 29 Nov 2005, Jim Gallacher wrote:
> 
>> I still think the correct place to create the index dictionary is in 
>> the __init__ phase. Using the dictionary-on-demand idea only improves 
>> performance on the second access to a form field. For the first access 
>> you are still iterating through the whole list for each field name.
> 
> 
> I am still not convinced we need an index. I'd like to see some concrete 
> proof that we're not engaging in "overoptimization" here - is this 
> really a bottleneck for anyone?

Well, I need to get some real work done today, but I will do some 
benchmarking at some point. It's not like there's a big rush to complete 
3.3.0 this week. I do like that Mike's code simplifies a number of the 
the FS methods. It just looks more elegant.

> If we're concerned (and I'm not at this point) that FieldStorage is too 
> slow, we should just rewrite the whole thing in C :-)

A worthy goal. Perhaps we should re-write all of mod_python in C.... Any 
takers? No? Maybe not then. :)

Jim

Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

Posted by "Gregory (Grisha) Trubetskoy" <gr...@apache.org>.

On Tue, 29 Nov 2005, Jim Gallacher wrote:

> I still think the correct place to create the index dictionary is in the 
> __init__ phase. Using the dictionary-on-demand idea only improves performance 
> on the second access to a form field. For the first access you are still 
> iterating through the whole list for each field name.

I am still not convinced we need an index. I'd like to see some concrete 
proof that we're not engaging in "overoptimization" here - is this really 
a bottleneck for anyone?

If we're concerned (and I'm not at this point) that FieldStorage is too 
slow, we should just rewrite the whole thing in C :-)

Grisha

Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

Posted by Jim Gallacher <jp...@jgassociates.ca>.

Nick wrote:
> Jim Gallacher wrote:
> 
>> For starters, most of the methods such as keys, has_key and so on will 
>> raise an exception since you don't create the dictionary until after 
>> one of the fields is accessed via __getitem__. Also, has_key will 
>> return false for a field that actually exists if that field has not 
>> yet been accessed.
> 
> 
> Since __getitem__ refers to self.dictionary, it would get created at 
> that point.  I don't think that's an issue.

Oops. Brain fart. When I read the code I s/__getattr__/__getitem__/g.

>> def hander(req):
>>     form = FieldStorage(req)
>>     # this will raise an exception
>>     keys = form.keys()
>>
>> I still think the correct place to create the index dictionary is in 
>> the __init__ phase. Using the dictionary-on-demand idea only improves 
>> performance on the second access to a form field. For the first access 
>> you are still iterating through the whole list for each field name.
> 
> 
> Isn't that what "on-demand" means?  If you never use the dictionary, 
> you'll never have to wait for it to be created.  This is at worst no 
> worse than what is there now (actually it's better the way he's coded it).

You are correct, there is no problem. It works as advertised. Operator 
error... whoot, whoot, whoot. :)

>> I personally think any performance degradation resulting from creating 
>> the dictionary in __init__ will be small compared to the overall 
>> improvement we'll get for accessing the fields. I would advocate 
>> creating the dictionary in the add_field() method.
> 
> 
> Depends on the size of the form, but I can see a performance hit for 
> people (like me) who never use FieldStorage as a dictionary.

Just curious. How do you use it?

>> Since we are messing with these classes anyway, is there any reason we 
>> are not using new-style classes, which is to say subclassing object? 
> 
> 
> Isn't that the default in 2.3+, which is what mod_python now requires?

Exactly. I guess my question would be "is there any reason to *not* 
refactor our code"?

Jim

Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

Posted by Nick <ni...@dd.revealed.net>.

Jim Gallacher wrote:
> For starters, most of the methods such as keys, has_key and so on will 
> raise an exception since you don't create the dictionary until after one 
> of the fields is accessed via __getitem__. Also, has_key will return 
> false for a field that actually exists if that field has not yet been 
> accessed.

Since __getitem__ refers to self.dictionary, it would get created at that 
point.  I don't think that's an issue.

> def hander(req):
>     form = FieldStorage(req)
>     # this will raise an exception
>     keys = form.keys()
> 
> I still think the correct place to create the index dictionary is in the 
> __init__ phase. Using the dictionary-on-demand idea only improves 
> performance on the second access to a form field. For the first access 
> you are still iterating through the whole list for each field name.

Isn't that what "on-demand" means?  If you never use the dictionary, you'll 
never have to wait for it to be created.  This is at worst no worse than 
what is there now (actually it's better the way he's coded it).

> I personally think any performance degradation resulting from creating 
> the dictionary in __init__ will be small compared to the overall 
> improvement we'll get for accessing the fields. I would advocate 
> creating the dictionary in the add_field() method.

Depends on the size of the form, but I can see a performance hit for people 
(like me) who never use FieldStorage as a dictionary.

> Since we are messing with these classes anyway, is there any reason we 
> are not using new-style classes, which is to say subclassing object? 

Isn't that the default in 2.3+, which is what mod_python now requires?

Nick

Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

Posted by Jim Gallacher <jp...@jgassociates.ca>.

Mike Looijmans wrote:
>> How about we make the first call to get or __getitem__ create the 
>> dictionary? We could put code in __getattr__ to create it when it's 
>> referenced.

For starters, most of the methods such as keys, has_key and so on will 
raise an exception since you don't create the dictionary until after one 
of the fields is accessed via __getitem__. Also, has_key will return 
false for a field that actually exists if that field has not yet been 
accessed.

def hander(req):
     form = FieldStorage(req)
     # this will raise an exception
     keys = form.keys()

I still think the correct place to create the index dictionary is in the 
__init__ phase. Using the dictionary-on-demand idea only improves 
performance on the second access to a form field. For the first access 
you are still iterating through the whole list for each field name.

I personally think any performance degradation resulting from creating 
the dictionary in __init__ will be small compared to the overall 
improvement we'll get for accessing the fields. I would advocate 
creating the dictionary in the add_field() method.

Also, I think you should consistently use add_field in 
FieldStorage.__init__. In the last line of that method you are still 
using self.list.append. Seems to me that is a bug waiting to happen. 
Some future maintainer modifies add_field, missing the list.append and 
then can't figure out why things are not quite working.

Since we are messing with these classes anyway, is there any reason we 
are not using new-style classes, which is to say subclassing object? 
Perhaps we should be refactoring our code as we work on mod_python 3.3?

Regards,
Jim

> Here is the patch to util.py to use a dictionary-on-demand.
> Lots of code removed for this one, and a few lines added.
> 
> There's also a brand new items() function, which, contrary to common 
> belief, preserves argument ordering.
> 
> Note: This also includes my previous StringField patch, since that is 
> crucial to make the implementation as simple as it is now.
> 
> Anxiously awaiting your comments and benchmark/profiler results!
> 
> ['Lies', 'Damn lies', 'Statistics', 'Delivery dates', 'Benchmarks']

Benchmarks will have to wait. I can supply the others any time you want. ;)

Jim

Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

Posted by Jim Gallacher <jp...@jgassociates.ca>.

Nick wrote:
> Jim Gallacher wrote:
> 
>> I've only started using properites and I hadn't thought of that. If we 
>> were to use such a feature is that the sort detail that would be 
>> included in the user documentation, or would the coder just be 
>> expected to read the source?
> 
> 
> You CAN read the source if you need implementation details, especially 
> if you're going to override the default behavior.  However, in most 
> cases I think you're going to extend behavior, so you will eventually 
> call the superclass' getter/setter.  A real problem with Python is that 
> there's no standard convention for writing property getters and 
> setters.  I myself use an idiom like this:
> 
> def variable():
>     def fget(self): return None
>     def fset(self, value): pass
>     def fdel(self): pass
>     doc = """The documentation"""
>     return locals()
> variable = property(**variable())
> 
> I like this idiom, because it's really easy to spot in the code, and it 
> keeps everything together for the property.  This forces you to 
> completely override the property in a subclass, but you don't have to 
> know the implementation details of the superclass to do that.

Cool.

> Another way might be (minimally and traditionally):
> 
> def variable_getter(self): return None
> def variable_setter(self): pass
> variable = property(variable_getter, variable_setter)
> 
> which then allows you to override variable_getter or variable_setter in 
> your subclass without overriding the property itself, but now you've got 
> to read the code.  Or else completely override the property, but you've 
> got a choice.


My understanding (confired via experimentation) is that with the 
'traditional' way you can't orerride the getter or setter originally 
bound in by property call. The original methods will be used, not the 
ones in the subclass.

class Stuff(object):
     def getA(self):
         return 'stuff'

     a = property(getA)

class MoreStuff(object):
     def getA(self):
         return 'MORE stuff'


 >>> thing = MoreStuff()
 >>> print thing.a
stuff

But at any rate, these are basic python questions and I don't think 
python-dev is the correct forum for this discussion.

Thanks for giving me a couple of ideas to try out Nick.

Regards,
Jim

Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

Posted by Nick <ni...@dd.revealed.net>.

Jim Gallacher wrote:
> I've only started using properites and I hadn't thought of that. If we 
> were to use such a feature is that the sort detail that would be 
> included in the user documentation, or would the coder just be expected 
> to read the source?

You CAN read the source if you need implementation details, especially if 
you're going to override the default behavior.  However, in most cases I 
think you're going to extend behavior, so you will eventually call the 
superclass' getter/setter.  A real problem with Python is that there's no 
standard convention for writing property getters and setters.  I myself use 
an idiom like this:

def variable():
     def fget(self): return None
     def fset(self, value): pass
     def fdel(self): pass
     doc = """The documentation"""
     return locals()
variable = property(**variable())

I like this idiom, because it's really easy to spot in the code, and it 
keeps everything together for the property.  This forces you to completely 
override the property in a subclass, but you don't have to know the 
implementation details of the superclass to do that.

Another way might be (minimally and traditionally):

def variable_getter(self): return None
def variable_setter(self): pass
variable = property(variable_getter, variable_setter)

which then allows you to override variable_getter or variable_setter in your 
subclass without overriding the property itself, but now you've got to read 
the code.  Or else completely override the property, but you've got a choice.

Nick

Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

Posted by Mike Looijmans <nl...@natlab.research.philips.com>.

>>>> Just one comment.  It seems like it would be better just to make 
>>>> add_method inline, since everything else in __init__ is, and it 
>>>> never gets called from anywhere else.

s/_method/_field/g

The thing I had in mind when I built the add_field method was that I 
could subclass FieldStorage and give the page handler more control over 
what is happening. This was convenient for the thing that got this all 
started: Posting large files. In fact, I intended add_field to replace 
the make_file function.

What I had in mind is that you override the add_field function to get 
context information (e.g. user name, upload directory and such) before 
the file upload comes in, so that you can put the upload where it belongs.

With the two callbacks, you can create the same effect, so it was no 
longer nessecary.

Also, the add_field is referred at least twice and I don't like 
duplicating code.

-- 
Mike Looijmans
Philips Natlab / Topic Automation

Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

Posted by Jim Gallacher <jp...@jgassociates.ca>.

Nick wrote:
> Jim Gallacher wrote:
> 
>> Nick wrote:
>>
>>> Just one comment.  It seems like it would be better just to make 
>>> add_method inline, since everything else in __init__ is, and it never 
>>> gets called from anywhere else.
>>
>>
>> add_method?
> 
> 
> Haha, thanks, I haven't had coffee yet.  The add_item method, that is. :)

By which I assume you mean add_field? Other than that you are perfectly 
all right. :)

>> I also like properties, but doesn't that cause a problem if someone 
>> chooses to subclass FieldStorage?
> 
> 
> It could if you didn't realize it was a property.  But you can always 
> override a property with another property.

I've only started using properites and I hadn't thought of that. If we 
were to use such a feature is that the sort detail that would be 
included in the user documentation, or would the coder just be expected 
to read the source?

Jim

Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

Posted by Nick <ni...@dd.revealed.net>.

Jim Gallacher wrote:
> Nick wrote:
> 
>> Just one comment.  It seems like it would be better just to make 
>> add_method inline, since everything else in __init__ is, and it never 
>> gets called from anywhere else.
> 
> add_method?

Haha, thanks, I haven't had coffee yet.  The add_item method, that is. :)

> I also like properties, but doesn't that cause a problem if someone 
> chooses to subclass FieldStorage?

It could if you didn't realize it was a property.  But you can always 
override a property with another property.

Nick

Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

Posted by Jim Gallacher <jp...@jgassociates.ca>.

Nick wrote:
> Just one comment.  It seems like it would be better just to make 
> add_method inline, since everything else in __init__ is, and it never 
> gets called from anywhere else.

add_method?

> But it looks good and probably fairly optimal.  I personally like to see 
> properties instead of __getattr__ for known attributes, but that's a 
> style preference.

I also like properties, but doesn't that cause a problem if someone 
chooses to subclass FieldStorage?

Jim


> Nick
> 
> Mike Looijmans wrote:
> 
>>> How about we make the first call to get or __getitem__ create the 
>>> dictionary? We could put code in __getattr__ to create it when it's 
>>> referenced.
>>
>>
>>
>> Here is the patch to util.py to use a dictionary-on-demand.
>> Lots of code removed for this one, and a few lines added.
>>
>> There's also a brand new items() function, which, contrary to common 
>> belief, preserves argument ordering.
>>
>> Note: This also includes my previous StringField patch, since that is 
>> crucial to make the implementation as simple as it is now.
>>
>> Anxiously awaiting your comments and benchmark/profiler results!
>>
>> ['Lies', 'Damn lies', 'Statistics', 'Delivery dates', 'Benchmarks']

Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

Posted by Nick <ni...@dd.revealed.net>.

Just one comment.  It seems like it would be better just to make add_method 
inline, since everything else in __init__ is, and it never gets called from 
anywhere else.

But it looks good and probably fairly optimal.  I personally like to see 
properties instead of __getattr__ for known attributes, but that's a style 
preference.

Nick

Mike Looijmans wrote:
>> How about we make the first call to get or __getitem__ create the 
>> dictionary? We could put code in __getattr__ to create it when it's 
>> referenced.
> 
> 
> Here is the patch to util.py to use a dictionary-on-demand.
> Lots of code removed for this one, and a few lines added.
> 
> There's also a brand new items() function, which, contrary to common 
> belief, preserves argument ordering.
> 
> Note: This also includes my previous StringField patch, since that is 
> crucial to make the implementation as simple as it is now.
> 
> Anxiously awaiting your comments and benchmark/profiler results!
> 
> ['Lies', 'Damn lies', 'Statistics', 'Delivery dates', 'Benchmarks']
> 
> 
> ------------------------------------------------------------------------
> 
> Index: util.py
> ===================================================================
> --- util.py	(revision 348746)
> +++ util.py	(working copy)
> @@ -48,19 +48,8 @@
>  """
>  
>  class Field:
> -   def __init__(self, name, file, ctype, type_options,
> -                disp, disp_options, headers = {}):
> +   def __init__(self, name):
>         self.name = name
> -       self.file = file
> -       self.type = ctype
> -       self.type_options = type_options
> -       self.disposition = disp
> -       self.disposition_options = disp_options
> -       if disp_options.has_key("filename"):
> -           self.filename = disp_options["filename"]
> -       else:
> -           self.filename = None
> -       self.headers = headers
>  
>     def __repr__(self):
>         """Return printable representation."""
> @@ -81,13 +70,34 @@
>         self.file.close()
>  
>  class StringField(str):
> -   """ This class is basically a string with
> -   a value attribute for compatibility with std lib cgi.py
> -   """
> +    """ This class is basically a string with
> +    added attributes for compatibility with std lib cgi.py. Basically, this
> +    works the opposite of Field, as it stores its data in a string, but creates
> +    a file on demand. Field creates a value on demand and stores data in a file.
> +    """
> +    filename = None
> +    headers = {}
> +    ctype = "text/plain"
> +    type_options = {}
> +    disposition = None
> +    disp_options = None
> +    
> +    # I wanted __init__(name, value) but that does not work (apparently, you
> +    # cannot subclass str with a constructor that takes >1 argument)
> +    def __init__(self, value):
> +        '''Create StringField instance. You'll have to set name yourself.'''
> +        str.__init__(self, value)
> +        self.value = value
>  
> -   def __init__(self, str=""):
> -       str.__init__(self, str)
> -       self.value = self.__str__()
> +    def __getattr__(self, name):
> +        if name != 'file':
> +            raise AttributeError, name
> +        self.file = cStringIO.StringIO(self.value)
> +        return self.file
> +        
> +    def __repr__(self):
> +       """Return printable representation (to pass unit tests)."""
> +       return "Field(%s, %s)" % (`self.name`, `self.value`)
>  
>  class FieldStorage:
>  
> @@ -103,8 +113,7 @@
>         if req.args:
>             pairs = parse_qsl(req.args, keep_blank_values)
>             for pair in pairs:
> -               file = cStringIO.StringIO(pair[1])
> -               self.list.append(Field(pair[0], file, "text/plain", {}, None, {}))
> +               self.add_field(pair[0], pair[1])
>  
>         if req.method != "POST":
>             return
> @@ -123,9 +132,7 @@
>         if ctype.startswith("application/x-www-form-urlencoded"):
>             pairs = parse_qsl(req.read(clen), keep_blank_values)
>             for pair in pairs:
> -               # TODO : isn't this a bit heavyweight just for form fields ?
> -               file = cStringIO.StringIO(pair[1])
> -               self.list.append(Field(pair[0], file, "text/plain", {}, None, {}))
> +               self.add_field(pair[0], pair[1])
>             return
>  
>         if not ctype.startswith("multipart/"):
> @@ -205,33 +212,44 @@
>             else:
>                 name = None
>  
> +           # create a file object
>             # is this a file?
>             if disp_options.has_key("filename"):
>                 if file_callback and callable(file_callback):
>                     file = file_callback(disp_options["filename"])
>                 else:
> -                   file = self.make_file()
> +                   file = tempfile.TemporaryFile("w+b")
>             else:
>                 if field_callback and callable(field_callback):
>                     file = field_callback()
>                 else:
> -                   file = self.make_field()
> +                   file = cStringIO.StringIO()
>  
>             # read it in
> -           end_of_stream = self.read_to_boundary(req, boundary, file)
> +           self.read_to_boundary(req, boundary, file)
>             file.seek(0)
> -
> + 
>             # make a Field
> -           field = Field(name, file, ctype, type_options, disp, disp_options, headers)
> -
> +           if disp_options.has_key("filename"):
> +               field = Field(name)
> +               field.filename = disp_options["filename"]
> +           else:
> +               field = StringField(file.read())
> +               field.name = name
> +           field.file = file
> +           field.type = ctype
> +           field.type_options = type_options
> +           field.disposition = disp
> +           field.disposition_options = disp_options
> +           field.headers = headers
>             self.list.append(field)
>  
> -   def make_file(self):
> -       return tempfile.TemporaryFile("w+b")
> +   def add_field(self, key, value):
> +       """Insert a field as key/value pair"""
> +       item = StringField(value)
> +       item.name = key
> +       self.list.append(item)
>  
> -   def make_field(self):
> -       return cStringIO.StringIO()
> -
>     def read_to_boundary(self, req, boundary, file):
>          previous_delimiter = None
>          while True:
> @@ -283,18 +301,7 @@
>  
>     def __getitem__(self, key):
>         """Dictionary style indexing."""
> -       if self.list is None:
> -           raise TypeError, "not indexable"
> -       found = []
> -       for item in self.list:
> -           if item.name == key:
> -               if isinstance(item.file, FileType) or \
> -                      isinstance(getattr(item.file, 'file', None), FileType):
> -                   found.append(item)
> -               else:
> -                   found.append(StringField(item.value))
> -       if not found:
> -           raise KeyError, key
> +       found = self.dictionary[key]
>         if len(found) == 1:
>             return found[0]
>         else:
> @@ -303,56 +310,56 @@
>     def get(self, key, default):
>         try:
>             return self.__getitem__(key)
> -       except KeyError:
> +       except (TypeError, KeyError):
>             return default
>  
>     def keys(self):
>         """Dictionary style keys() method."""
> -       if self.list is None:
> -           raise TypeError, "not indexable"
> -       keys = []
> -       for item in self.list:
> -           if item.name not in keys: keys.append(item.name)
> -       return keys
> +       return self.dictionary.keys()
>  
>     def has_key(self, key):
>         """Dictionary style has_key() method."""
> -       if self.list is None:
> -           raise TypeError, "not indexable"
> -       for item in self.list:
> -           if item.name == key: return 1
> -       return 0
> +       return (key in self.dictionary)
>  
>     __contains__ = has_key
>  
>     def __len__(self):
>         """Dictionary style len(x) support."""
> -       return len(self.keys())
> +       return len(self.dictionary.keys())
>  
>     def getfirst(self, key, default=None):
>         """ return the first value received """
> -       for item in self.list:
> -           if item.name == key:
> -               if isinstance(item.file, FileType) or \
> -                      isinstance(getattr(item.file, 'file', None), FileType):
> -                   return item
> -               else:
> -                   return StringField(item.value)
> -       return default
> +       try:
> +           return self.dictionary[key][0]
> +       except KeyError:
> +           return default
>  
>     def getlist(self, key):
>         """ return a list of received values """
> -       if self.list is None:
> +       try:
> +           return self.dictionary[key]
> +       except KeyError:
> +           return []
> +           
> +   def items(self):
> +       """Dictionary-style items(), except that items are returned in the same
> +       order as they were supplied in the form."""
> +       return [(item.name, item) for item in self.list]
> +       
> +   def __getattr__(self, name):
> +       if name != 'dictionary':
> +          raise AttributeError, name
> +       list = self.list
> +       if list is None:
>             raise TypeError, "not indexable"
> -       found = []
> -       for item in self.list:
> -           if item.name == key:
> -               if isinstance(item.file, FileType) or \
> -                      isinstance(getattr(item.file, 'file', None), FileType):
> -                   found.append(item)
> -               else:
> -                   found.append(StringField(item.value))
> -       return found
> +       result = {}
> +       self.dictionary = result
> +       for item in list:
> +           if item.name in result:
> +              result[item.name].append(item)
> +           else:
> +              result[item.name] = [item]
> +       return result
>  
>  def parse_header(line):
>     """Parse a Content-type like header.

Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

Posted by Mike Looijmans <nl...@natlab.research.philips.com>.

> How about we make the first call to get or __getitem__ create the 
> dictionary? We could put code in __getattr__ to create it when it's 
> referenced.

Here is the patch to util.py to use a dictionary-on-demand.
Lots of code removed for this one, and a few lines added.

There's also a brand new items() function, which, contrary to common 
belief, preserves argument ordering.

Note: This also includes my previous StringField patch, since that is 
crucial to make the implementation as simple as it is now.

Anxiously awaiting your comments and benchmark/profiler results!

['Lies', 'Damn lies', 'Statistics', 'Delivery dates', 'Benchmarks']
-- 
Mike Looijmans
Philips Natlab / Topic Automation

Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

Posted by Mike Looijmans <nl...@natlab.research.philips.com>.

Nicolas Lehuen wrote:
> One proposal was to use an OrderedHashtable. As you can see, the reply
> was quite definite :
> 
> "No. There is nothing in the spec that says that these parameters should be
>  in any sort of order. CGI scripts that expect them to be in order are coded
>  incorrectly IMHO."

Standing on a soap box preaching like that may work in the Java world, 
but we Python programmers have a more realistic view of the world. If we 
don't want to be crushed by a stampede of angry web developers, I guess 
we should, at least, provide the means to:
- Iterate through the arguments "as listed".
- Get multi-fields into an array in the order as listed.
- Provide fast, dictionary-like access.

Hey! this is Python. We can do all that, without losing anything. Let 
them poor Java folks preach about hell and doom, while we just get on 
with our lives and build something that meets everybody's expectations.
How about we make the first call to get or __getitem__ create the 
dictionary? We could put code in __getattr__ to create it when it's 
referenced.

Patch is on its way...

-- 
Mike Looijmans
Philips Natlab / Topic Automation

Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

Posted by Nicolas Lehuen <ni...@gmail.com>.

2005/11/29, Nicolas Lehuen <ni...@gmail.com>:
> 2005/11/29, Mike Looijmans <nl...@natlab.research.philips.com>:
> > Nicolas Lehuen wrote:
> > > Why is the ordering so important ? I do understand we need to support
> > > multiple values per field name, but I don't see why ordering is
> > > needed.
> >
> > Because there are applications out there that will break if you change
> > it. Besides that, think about a form with a few text entry boxes (all
> > the same name, e.g. "spouse"). It would be very confusing for a user of
> > that page to see the text re-ordered every time he clicks one of the
> > buttons on the page. (I'm perfectly aware of at least 4 alternatives,
> > but that is not my point here).
> >
> >  From the page code I've written and seen so far, the order of
> > differently named fields is not important. I haven't seen a case where
> > a form expecting a=1&b=2 would fail if you pass it b=2&a=1. But I have
> > seen cases where a=1&x=2&x=3 is not the same as a=1&x=3&x=2. The simple
> > dictionary implementation as proposed would not break that code.
> >
> >
> > --
> > Mike Looijmans
> > Philips Natlab / Topic Automation
> >
> >
>
> Hi Mike,
>
> As Jim pointed out, even if using a simple dict structure would not
> enable us to preserve the *key* ordering, it would still allow us to
> preserve the *value* ordering for fields with the same key, since they
> would be added to lists in the same order the browser would send them.
>
> So my guess is that preserving key order is not required, but
> preserving value order for a given key is. In that case a simple dict
> with list values is sufficient, easy to implement and efficient. Your
> examples are easily handled with this solution.
>
> I think we should check how this problem has been solved in other
> programming environments. I'll check how this was done in the Java
> servlet API.

Well, the Java Servlets 2.4 API doesn't say anything about field order
(see  in javax.servlet.ServletRequest.getParameterName() or page 35 in
servlet-2_4-fr-spec.pdf).

It turns out this problem was raised in the antique JServ project :
http://archive.apache.org/gnats/5211

One proposal was to use an OrderedHashtable. As you can see, the reply
was quite definite :

"No. There is nothing in the spec that says that these parameters should be
 in any sort of order. CGI scripts that expect them to be in order are coded
 incorrectly IMHO."

Regards,
Nicolas

Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

Posted by Nicolas Lehuen <ni...@gmail.com>.

2005/11/29, Mike Looijmans <nl...@natlab.research.philips.com>:
> Nicolas Lehuen wrote:
> > Why is the ordering so important ? I do understand we need to support
> > multiple values per field name, but I don't see why ordering is
> > needed.
>
> Because there are applications out there that will break if you change
> it. Besides that, think about a form with a few text entry boxes (all
> the same name, e.g. "spouse"). It would be very confusing for a user of
> that page to see the text re-ordered every time he clicks one of the
> buttons on the page. (I'm perfectly aware of at least 4 alternatives,
> but that is not my point here).
>
>  From the page code I've written and seen so far, the order of
> differently named fields is not important. I haven't seen a case where
> a form expecting a=1&b=2 would fail if you pass it b=2&a=1. But I have
> seen cases where a=1&x=2&x=3 is not the same as a=1&x=3&x=2. The simple
> dictionary implementation as proposed would not break that code.
>
>
> --
> Mike Looijmans
> Philips Natlab / Topic Automation
>
>

Hi Mike,

As Jim pointed out, even if using a simple dict structure would not
enable us to preserve the *key* ordering, it would still allow us to
preserve the *value* ordering for fields with the same key, since they
would be added to lists in the same order the browser would send them.

So my guess is that preserving key order is not required, but
preserving value order for a given key is. In that case a simple dict
with list values is sufficient, easy to implement and efficient. Your
examples are easily handled with this solution.

I think we should check how this problem has been solved in other
programming environments. I'll check how this was done in the Java
servlet API.

Regards,
Nicolas

Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

Posted by Mike Looijmans <nl...@natlab.research.philips.com>.

Nicolas Lehuen wrote:
> Why is the ordering so important ? I do understand we need to support
> multiple values per field name, but I don't see why ordering is
> needed.

Because there are applications out there that will break if you change 
it. Besides that, think about a form with a few text entry boxes (all 
the same name, e.g. "spouse"). It would be very confusing for a user of 
that page to see the text re-ordered every time he clicks one of the 
buttons on the page. (I'm perfectly aware of at least 4 alternatives, 
but that is not my point here).

 From the page code I've written and seen so far, the order of 
differently named fields is not important. I haven't seen a case where
a form expecting a=1&b=2 would fail if you pass it b=2&a=1. But I have 
seen cases where a=1&x=2&x=3 is not the same as a=1&x=3&x=2. The simple 
dictionary implementation as proposed would not break that code.

-- 
Mike Looijmans
Philips Natlab / Topic Automation

Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

Posted by Jim Gallacher <jp...@jgassociates.ca>.

Nicolas Lehuen wrote:
> According to the W3C, "The control names/values are listed in the
> order they appear in the document."
> 
> http://www.w3.org/TR/REC-html40/interact/forms.html#h-17.13.4.1

Which also says to "Please consult [RFC2388] for additional 
information." without actually saying that it contradicts or supercedes 
2388. It's usually at this point I want to run off an set my hair on fire.

> But that doesn't say anything about the server side...

And both specs are strangely silent on the whole issue of building a 
hash table from the form data and providing a list of the names used as 
keys in creating that table. ;)

This part of the discussion really only affects the FieldStorage.keys() 
method. The mod_python documentation states "The FieldStorage class has 
a mapping object interface i.e. it can be treated like a dictionary", 
with no mention of any special ordering behaviour for the keys. It seems 
to me users would be foolish to expect anything other the standard 
python dict behaviour. We should be free to return any order we want 
from keys().

> Relying on the field order is a mark of sloppy programming IMHO.But
> maybe there are corner cases when handling a series of checkboxes with
> the same name is easier if the values are given in the order they were
> found in the form.

Ahh... but the values *would* be returned in the correct order for a 
given key, since internally the index dictionary values are stored as a 
list of fields. Consider this possible code snippet for use in __init__().

         self.list.append(field)
         if key in self.index:
             self.index[key].append(field)
         else:
             self.index[key] = [field,]

Order is preserved for a particular field name, so we are still 
sloppy-coder friendly. :)

Jim

> 
> Regards,
> Nicolas
> 
> 2005/11/28, Jim Gallacher <jp...@jgassociates.ca>:
> 
>>Gregory (Grisha) Trubetskoy wrote:
>>
>>>On Mon, 28 Nov 2005, Nicolas Lehuen wrote:
>>>
>>>
>>>>Why is the ordering so important ? I do understand we need to support
>>>>multiple values per field name, but I don't see why ordering is
>>>>needed.
>>>
>>>
>>>I think that it may be dictated by some RFC (the stdlib does it this way
>>>too), I'm not sure, but it's a good question though, it'd be great to
>>>have it researched and answered so that we don't have to go over this
>>>point again.
>>>
>>>Grisha
>>>
>>>
>>
>>Ordering is not defined according to my interpretation. But at the same
>>time we shouldn't mess with the ordering. Gotta love those RFCs. :)
>>
>>http://www.ietf.org/rfc/rfc2388.txt?number=2388
>>
>>Returning Values from Forms:  multipart/form-data
>>
>>5.5 Ordered fields and duplicated field names
>>
>>    The relationship of the ordering of fields within a form and the
>>    ordering of returned values within "multipart/form-data" is not
>>    defined by this specification, nor is the handling of the case where
>>    a form has multiple fields with the same name. While HTML-based forms
>>    may send back results in the order received, and intermediaries
>>    should not reorder the results, there are some systems which might
>>    not define a natural order for form fields.
>>
>>Jim
>>
> 
>

Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

Posted by Nicolas Lehuen <ni...@gmail.com>.

According to the W3C, "The control names/values are listed in the
order they appear in the document."

http://www.w3.org/TR/REC-html40/interact/forms.html#h-17.13.4.1

But that doesn't say anything about the server side...

Relying on the field order is a mark of sloppy programming IMHO.But
maybe there are corner cases when handling a series of checkboxes with
the same name is easier if the values are given in the order they were
found in the form.

Regards,
Nicolas

2005/11/28, Jim Gallacher <jp...@jgassociates.ca>:
> Gregory (Grisha) Trubetskoy wrote:
> >
> > On Mon, 28 Nov 2005, Nicolas Lehuen wrote:
> >
> >> Why is the ordering so important ? I do understand we need to support
> >> multiple values per field name, but I don't see why ordering is
> >> needed.
> >
> >
> > I think that it may be dictated by some RFC (the stdlib does it this way
> > too), I'm not sure, but it's a good question though, it'd be great to
> > have it researched and answered so that we don't have to go over this
> > point again.
> >
> > Grisha
> >
> >
>
> Ordering is not defined according to my interpretation. But at the same
> time we shouldn't mess with the ordering. Gotta love those RFCs. :)
>
> http://www.ietf.org/rfc/rfc2388.txt?number=2388
>
> Returning Values from Forms:  multipart/form-data
>
> 5.5 Ordered fields and duplicated field names
>
>     The relationship of the ordering of fields within a form and the
>     ordering of returned values within "multipart/form-data" is not
>     defined by this specification, nor is the handling of the case where
>     a form has multiple fields with the same name. While HTML-based forms
>     may send back results in the order received, and intermediaries
>     should not reorder the results, there are some systems which might
>     not define a natural order for form fields.
>
> Jim
>

Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

Posted by Jim Gallacher <jp...@jgassociates.ca>.

Gregory (Grisha) Trubetskoy wrote:
> 
> On Mon, 28 Nov 2005, Nicolas Lehuen wrote:
> 
>> Why is the ordering so important ? I do understand we need to support
>> multiple values per field name, but I don't see why ordering is
>> needed.
> 
> 
> I think that it may be dictated by some RFC (the stdlib does it this way 
> too), I'm not sure, but it's a good question though, it'd be great to 
> have it researched and answered so that we don't have to go over this 
> point again.
> 
> Grisha
> 
> 

Ordering is not defined according to my interpretation. But at the same 
time we shouldn't mess with the ordering. Gotta love those RFCs. :)

http://www.ietf.org/rfc/rfc2388.txt?number=2388

Returning Values from Forms:  multipart/form-data

5.5 Ordered fields and duplicated field names

    The relationship of the ordering of fields within a form and the
    ordering of returned values within "multipart/form-data" is not
    defined by this specification, nor is the handling of the case where
    a form has multiple fields with the same name. While HTML-based forms
    may send back results in the order received, and intermediaries
    should not reorder the results, there are some systems which might
    not define a natural order for form fields.

Jim

Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

Posted by "Gregory (Grisha) Trubetskoy" <gr...@apache.org>.

On Mon, 28 Nov 2005, Nicolas Lehuen wrote:

> Why is the ordering so important ? I do understand we need to support
> multiple values per field name, but I don't see why ordering is
> needed.

I think that it may be dictated by some RFC (the stdlib does it this way 
too), I'm not sure, but it's a good question though, it'd be great to have 
it researched and answered so that we don't have to go over this point 
again.

Grisha

Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

Posted by Nick <ni...@dd.revealed.net>.

If you provide say FieldStorage.make_dict that returns a dictionary, then I 
don't see why the order of the keys is important when the original list is 
still available.

Nick

Nicolas Lehuen wrote:
> Hi,
> 
> Speaking of ordered dictionary :
> 
> http://www.voidspace.org.uk/python/weblog/arch_d7_2005_11_19.shtml#e140
> 
> Why is the ordering so important ? I do understand we need to support
> multiple values per field name, but I don't see why ordering is
> needed.
> 
> Regards,
> Nicolas
> 
> 2005/11/28, David Fraser <da...@sjsoft.com>:
> 
>>Gregory (Grisha) Trubetskoy wrote:
>>
>>
>>>
>>>Having looked at the FieldStorage code, I'm guessing the idea was that
>>>you parse fields as they come in and append them to a list. This
>>>preserves the original order of fields, in case it is needed.
>>>
>>>I'm not sure that maintaining a dictionary alongside the list is the
>>>right thing to do. It might be, but there are some difficult questions
>>>to answer -e.g. how costly is a sequential search, and is the code
>>>complexity (and fieldstorage code is no picnic to read as it is) worth
>>>the speedup?
>>>
>>>Also while it would speed up retrieval, it will slow down the "write"
>>>operation - when a field is added to fieldstorage you now need to
>>>append it to the list, AND check whether it exists in the dictionary,
>>>then add it there as well.
>>>
>>>How often do developers access form fields via __getitem__?  I noticed
>>>the publisher does not use it - it iterates the list, so nothing would
>>>be gained there.
>>
>>We do it a lot but we copy it into a different dictionary first to get
>>exactly the setup we want. But dictionary-style access is a very
>>obvious, pythonic way to do it.
>>I have a simple 70-line ordereddict implementation which is derived from
>>dict and remembers the keys in the order that they were assigned when
>>iterating through the list, this may be a way to go for this. It just
>>uses a list of keys internally to remember the order, and otherwise is a
>>dictionary...
>>
>>
>>>Also, something else to consider - is there a simple programatic
>>>solution that could be documented, e.g. something like
>>>
>>>my_fs = util.FieldStorage(req)
>>>
>>>dict_fs = {}
>>>dict_fs.update(my_fs)
>>>
>>>[have no idea whether this will work :-)]
>>
>>It may work but still has the potential performance problem since it
>>loops through the keys and then does a getitem on each key which loops
>>through them again. Not likely to cause problems for a small number of
>>arguments but not ideal :-)
>>
>>
>>>and voila - you've got a dictionary based fieldstorage?
>>>
>>>Anyway, just a few cents from me.
>>>
>>>Grisha
>>>
>>
>>

Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

Posted by Nicolas Lehuen <ni...@gmail.com>.

Hi,

Speaking of ordered dictionary :

http://www.voidspace.org.uk/python/weblog/arch_d7_2005_11_19.shtml#e140

Why is the ordering so important ? I do understand we need to support
multiple values per field name, but I don't see why ordering is
needed.

Regards,
Nicolas

2005/11/28, David Fraser <da...@sjsoft.com>:
> Gregory (Grisha) Trubetskoy wrote:
>
> >
> >
> > Having looked at the FieldStorage code, I'm guessing the idea was that
> > you parse fields as they come in and append them to a list. This
> > preserves the original order of fields, in case it is needed.
> >
> > I'm not sure that maintaining a dictionary alongside the list is the
> > right thing to do. It might be, but there are some difficult questions
> > to answer -e.g. how costly is a sequential search, and is the code
> > complexity (and fieldstorage code is no picnic to read as it is) worth
> > the speedup?
> >
> > Also while it would speed up retrieval, it will slow down the "write"
> > operation - when a field is added to fieldstorage you now need to
> > append it to the list, AND check whether it exists in the dictionary,
> > then add it there as well.
> >
> > How often do developers access form fields via __getitem__?  I noticed
> > the publisher does not use it - it iterates the list, so nothing would
> > be gained there.
>
> We do it a lot but we copy it into a different dictionary first to get
> exactly the setup we want. But dictionary-style access is a very
> obvious, pythonic way to do it.
> I have a simple 70-line ordereddict implementation which is derived from
> dict and remembers the keys in the order that they were assigned when
> iterating through the list, this may be a way to go for this. It just
> uses a list of keys internally to remember the order, and otherwise is a
> dictionary...
>
> > Also, something else to consider - is there a simple programatic
> > solution that could be documented, e.g. something like
> >
> > my_fs = util.FieldStorage(req)
> >
> > dict_fs = {}
> > dict_fs.update(my_fs)
> >
> > [have no idea whether this will work :-)]
>
> It may work but still has the potential performance problem since it
> loops through the keys and then does a getitem on each key which loops
> through them again. Not likely to cause problems for a small number of
> arguments but not ideal :-)
>
> > and voila - you've got a dictionary based fieldstorage?
> >
> > Anyway, just a few cents from me.
> >
> > Grisha
> >
>
>

Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

Posted by David Fraser <da...@sjsoft.com>.

Gregory (Grisha) Trubetskoy wrote:

>
>
> Having looked at the FieldStorage code, I'm guessing the idea was that 
> you parse fields as they come in and append them to a list. This 
> preserves the original order of fields, in case it is needed.
>
> I'm not sure that maintaining a dictionary alongside the list is the 
> right thing to do. It might be, but there are some difficult questions 
> to answer -e.g. how costly is a sequential search, and is the code 
> complexity (and fieldstorage code is no picnic to read as it is) worth 
> the speedup?
>
> Also while it would speed up retrieval, it will slow down the "write" 
> operation - when a field is added to fieldstorage you now need to 
> append it to the list, AND check whether it exists in the dictionary, 
> then add it there as well.
>
> How often do developers access form fields via __getitem__?  I noticed 
> the publisher does not use it - it iterates the list, so nothing would 
> be gained there.

We do it a lot but we copy it into a different dictionary first to get 
exactly the setup we want. But dictionary-style access is a very 
obvious, pythonic way to do it.
I have a simple 70-line ordereddict implementation which is derived from 
dict and remembers the keys in the order that they were assigned when 
iterating through the list, this may be a way to go for this. It just 
uses a list of keys internally to remember the order, and otherwise is a 
dictionary...

> Also, something else to consider - is there a simple programatic 
> solution that could be documented, e.g. something like
>
> my_fs = util.FieldStorage(req)
>
> dict_fs = {}
> dict_fs.update(my_fs)
>
> [have no idea whether this will work :-)]

It may work but still has the potential performance problem since it 
loops through the keys and then does a getitem on each key which loops 
through them again. Not likely to cause problems for a small number of 
arguments but not ideal :-)

> and voila - you've got a dictionary based fieldstorage?
>
> Anyway, just a few cents from me.
>
> Grisha
>

Re: [jira] Commented: (MODPYTHON-93) Improve util.FieldStorage efficiency

Posted by "Gregory (Grisha) Trubetskoy" <gr...@apache.org>.


Having looked at the FieldStorage code, I'm guessing the idea was that you 
parse fields as they come in and append them to a list. This preserves 
the original order of fields, in case it is needed.

I'm not sure that maintaining a dictionary alongside the list is the right 
thing to do. It might be, but there are some difficult questions to answer 
-e.g. how costly is a sequential search, and is the code complexity (and 
fieldstorage code is no picnic to read as it is) worth the speedup?

Also while it would speed up retrieval, it will slow down the "write" 
operation - when a field is added to fieldstorage you now need to append 
it to the list, AND check whether it exists in the dictionary, then add it 
there as well.

How often do developers access form fields via __getitem__?  I noticed the 
publisher does not use it - it iterates the list, so nothing would be 
gained there.

Also, something else to consider - is there a simple programatic solution 
that could be documented, e.g. something like

my_fs = util.FieldStorage(req)

dict_fs = {}
dict_fs.update(my_fs)

[have no idea whether this will work :-)]

and voila - you've got a dictionary based fieldstorage?

Anyway, just a few cents from me.

Grisha