You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@httpd.apache.org by Mladen Turk <mt...@mappingsoft.com> on 2001/11/08 21:47:35 UTC

[PATCH] apr_table WAS: What to do about tables?


> -----Original Message-----
> From: Mladen Turk [mailto:mturk@mappingsoft.com]
> Sent: 7. studeni 2001 11:23
> To: dev@httpd.apache.org
> Subject: RE: What to do about tables?
> 
> I'm currently working to extend the apr_table in such way
(apr_htable).
> Iterating through table elements will preserve the element index
> approach, but getting and setting elements to table will have almost
> constant times regardless of table size.
> 

Here is the patch or should I say enhancement to apr_table.
At least I think it's an enhancement :)

I've treated the apr_table as database table, having columns and rows.
When you think of apr_table that way, you can use the same technique
that the databases use to quickly access particular row, using some sort
of indexing.

For indexing I've used the simple hashing scheme and the particular hash
bucket only references the table's ordinal number.

I've made the patch to apr_table itself, and didn't need to change the
single line of code in httpd.
That is one approach. Other would be to use the apr_htable, wich
basically encapsulates the apr_table (leaving it as is), but that would
require serious httpd recoding where applicable (well, mostly function
renaming).

I'm in favor of first solution, and that is modifying apr_table.


MT.

Re: [PATCH] apr_table WAS: What to do about tables?

Posted by "William A. Rowe, Jr." <wr...@covalent.net>.

From: "Mladen Turk" <mt...@mappingsoft.com>
Sent: Monday, November 12, 2001 11:39 AM


> Someone has the idea how to measure those precisely?

Are you testing loopback, or from another machine?  Local testing is
generally faulty, and on Win32, it's doubly so (since there are a number
of framing issues with MS's own loopback device.)

Bill

RE: [PATCH] apr_table WAS: What to do about tables?

Posted by Mladen Turk <mt...@mappingsoft.com>.


> -----Original Message-----
> From: Brian Pane [mailto:bpane@pacbell.net]
> Sent: Sunday, November 11, 2001 8:15 PM
> To: dev@httpd.apache.org
> Subject: Re: [PATCH] apr_table WAS: What to do about tables?
>
> My +1 would be non-binding, but IMHO it makes sense to keep
> optimizing your new version. With the changes to make
> apr_table_elts return const (my patch and Justin's update
> to it), it will be possible to drop in an index-based
> implementation of tables in the future without breaking
> backward compatibility.
>
> Based on this latest set of benchmarks, my suggestion is:
>   * Tune the index code until the sequential set and
>     random get cases can outperform the old code.  (These
>     are both common cases in Apache.)
>   * Then plug the new code into a copy of the httpd to
>     see the effect on overall performance.
>

Here is the drop-in replacement for apr_table that uses hash indexes to
table entries.

I've tested the code to the current httpd from cvs.
Well, the performance gain isn't as high as I expected (or my testing
approach is wrong).

Using ab the total response is aroud 1% or more faster then original, but
the connection, processing and waiting have 16 % less values (If I
calculated that correctly).

Someone has the idea how to measure those precisely?

MT.

Re: [PATCH] apr_table WAS: What to do about tables?

Posted by Brian Pane <bp...@pacbell.net>.

Mladen Turk wrote:
[...]

>I'm posting the new table indexing schema that doesn't require reindexing on
>table_set when duplicate keys found, or when resizing table.
>
>The table implementation is named apr_stable, just to be able to compare
>with original. I've focused on add/set/get so these are only implemented for
>now.
>I'm sending two test programs, one for apr_table, and other for the new one.
>
>Here are the results of benchmarks (useconds)
>		old		new
>seq add		50.072		60.087
>seq set		22.782.760	60.086
>seq get		370.533		10.015
>rnd add		40.058		40.057
>rnd set		28.370.795	3.655.256
>rnd get		26.628.289	40.058
>
>But... It would be nice to benchmark those using some real data.
>
>Do I have some +1 to continue working on that?
>

My +1 would be non-binding, but IMHO it makes sense to keep
optimizing your new version. With the changes to make
apr_table_elts return const (my patch and Justin's update
to it), it will be possible to drop in an index-based
implementation of tables in the future without breaking
backward compatibility.

Based on this latest set of benchmarks, my suggestion is:
  * Tune the index code until the sequential set and
    random get cases can outperform the old code.  (These
    are both common cases in Apache.)
  * Then plug the new code into a copy of the httpd to
    see the effect on overall performance.

--Brian

RE: [PATCH] apr_table WAS: What to do about tables?

Posted by Mladen Turk <mt...@mappingsoft.com>.


> -----Original Message-----
> From: Brian Pane [mailto:bpane@pacbell.net]
> Sent: Friday, November 09, 2001 4:27 PM
> To: dev@httpd.apache.org
> Subject: Re: [PATCH] apr_table WAS: What to do about tables?
>
> Here's the distribution of CPU time to these functions in a recent CVS
> snapshot of 2.0:
>
> apr_table_set/apr_table_setn:  3.0% of usr CPU
> apr_table_get:                 2.0%
> apr_table_add:                 0.24%
>
> Based on these numbers, if you double the cost of apr_table_set and
> reduce the cost of apr_table_get by a factor of 120, the httpd overall
> will get *slower*.
>
> If you can improve the performance of updates in your implementation,
> though, then the table-with-index approach would be a big win.

I'm posting the new table indexing schema that doesn't require reindexing on
table_set when duplicate keys found, or when resizing table.

The table implementation is named apr_stable, just to be able to compare
with original. I've focused on add/set/get so these are only implemented for
now.
I'm sending two test programs, one for apr_table, and other for the new one.

Here are the results of benchmarks (useconds)
		old		new
seq add		50.072		60.087
seq set		22.782.760	60.086
seq get		370.533		10.015
rnd add		40.058		40.057
rnd set		28.370.795	3.655.256
rnd get		26.628.289	40.058

But... It would be nice to benchmark those using some real data.

Do I have some +1 to continue working on that?

MT.

Re: [PATCH] apr_table WAS: What to do about tables?

Posted by Brian Pane <bp...@pacbell.net>.

Mladen Turk wrote:
[...]

>>Based on my really quick reading of your new version of apr_tables.c,
>>the code looks reasonable.  My main worry is that the overhead of
>>the hash table maintenance, including the function call overhead and
>>the need to reindex every time the table is expanded, may offset the
>>performance benefits of hashing.  Do you have any benchmark data
>>to compare the performance of this implementation with the current
>>apr_table_t?
>>
>
>				old apr_table	my implementation
>(microseconds)
>
>add    10000 keys		   40.057			 80.113
>get    10000 keys       7.220.166	      	 60.084
>
>The keys are in the form KEY0...KEY9999
>
>So I agree that you have almost double time to insert the keys, but
>getting values from table shows 120 times performance gain.
>

Thanks for the data.  Does your "add" number represent apr_table_add
or apr_table_set?  I'll assume it's apr_table_set for the following
estimation.

Here's the distribution of CPU time to these functions in a recent CVS
snapshot of 2.0:

apr_table_set/apr_table_setn:  3.0% of usr CPU
apr_table_get:                 2.0%
apr_table_add:                 0.24%

Based on these numbers, if you double the cost of apr_table_set and
reduce the cost of apr_table_get by a factor of 120, the httpd overall
will get *slower*.

If you can improve the performance of updates in your implementation,
though, then the table-with-index approach would be a big win.

--Brian

P.S.: You may have tried this one already, but another benchmark case
that I think would be valuable is to set the same key value 10000 times
with apr_table_addn, to simulate the DoS attack that can affect table
implementations that handle the multiple-values-per-key case too slowly.

RE: [PATCH] apr_table WAS: What to do about tables?

Posted by Mladen Turk <mt...@mappingsoft.com>.


> -----Original Message-----
> From: Brian Pane [mailto:bpane@pacbell.net]
> Sent: 8. studeni 2001 21:38
> To: dev@httpd.apache.org
> Subject: Re: [PATCH] apr_table WAS: What to do about tables?
> 
> Mladen Turk wrote:
> 
> [...]
> 
> >I've treated the apr_table as database table, having columns and
rows.
> >When you think of apr_table that way, you can use the same technique
> >that the databases use to quickly access particular row, using some
sort
> >of indexing.
> >
> Also, why add another hash table implementation?  Apr_hash_t won't
> quite work here, because it doesn't have efficient support for
> case-insensitive string keys, but I'd rather add that support
> into apr_hash_t than add another version of hash tables to APR.
>
The current apr hash table is inappropriate for such a design and too
much overhead. It stores the key/value data pairs by itself.
If you look deeply in my code it's not a true hash table, but rather an
array of indexes that are accessed using hashing scheme.

> Based on my really quick reading of your new version of apr_tables.c,
> the code looks reasonable.  My main worry is that the overhead of
> the hash table maintenance, including the function call overhead and
> the need to reindex every time the table is expanded, may offset the
> performance benefits of hashing.  Do you have any benchmark data
> to compare the performance of this implementation with the current
> apr_table_t?

				old apr_table	my implementation
(microseconds)

add    10000 keys		   40.057			 80.113
get    10000 keys       7.220.166	      	 60.084

The keys are in the form KEY0...KEY9999

So I agree that you have almost double time to insert the keys, but
getting values from table shows 120 times performance gain.

There are number of ways to further optimize my code, for example, the
simple table scan would beet any indexing technique for a small number
of items. Other would be to optimize reindexing, or expressively do an
indexing after substantial table update (for example after reading
headers or mime types).

MT.

Re: [PATCH] apr_table WAS: What to do about tables?

Posted by Brian Pane <bp...@pacbell.net>.

Mladen Turk wrote:

[...]

>I've treated the apr_table as database table, having columns and rows.
>When you think of apr_table that way, you can use the same technique
>that the databases use to quickly access particular row, using some sort
>of indexing.
>

I like the concept of adding a fast index as an internal optimization
to apr_table_t, but I have some concerns about the implementation:

 From apr_table_t.h:

>/** the hash table bucket abstract data type */
>+typedef struct apr_htable_bucket_t apr_htable_bucket_t;
>+
>+struct apr_htable_bucket_t {
>+    apr_htable_bucket_t *next;
>+    int hash;
>+    int elt;
>+};
>

I propose moving the definition of this struct outside of the
public .h file, so nobody can write code that violates the
abstraction.

Also, why add another hash table implementation?  Apr_hash_t won't
quite work here, because it doesn't have efficient support for
case-insensitive string keys, but I'd rather add that support
into apr_hash_t than add another version of hash tables to APR.

With this design, where correctness depends upon the array and
the hash table within the apr_table_t staying synchronized, I'd
feel much more comfortable if the table_elts() macro were modified
to return a const array.  I think all the uses of this macro within
Apache treat the output as const anyway.

Based on my really quick reading of your new version of apr_tables.c,
the code looks reasonable.  My main worry is that the overhead of
the hash table maintenance, including the function call overhead and
the need to reindex every time the table is expanded, may offset the
performance benefits of hashing.  Do you have any benchmark data
to compare the performance of this implementation with the current
apr_table_t?

--Brian