You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@apr.apache.org by Brian Pane <br...@cnet.com> on 2003/06/01 05:57:41 UTC
[PATCH] table patch using mergesort Re: Frankentables
Here's a modification of Joe Schaefer's table patch that uses
a mergesort to do apr_table_compress and apr_table_overlap.
This will ensure a worst-case run time of n*log(n) instead
of n^2. However, I'm not sure whether the extra complexity
of the mergesort will hurt the performance on small data
sets. Joe, if you have time to test this patch, can you
let me know how it performs compared to your patch?
Thanks,
Brian
Re: [PATCH] table patch using mergesort Re: Frankentables
Posted by Cliff Woolley <jw...@virginia.edu>.
On Tue, 24 Jun 2003, Joe Schaefer wrote:
> However, I can see the value in not changing the current semantics
> if people are concerned that the change may cause problems for third
> party modules that rely on the current behavior. Either way, it's good
> to discuss this issue before adopting the callback API, so I can make
> sure the correct behavior, whatever that turns out to be, is both
> documented and tested prior to acceptance.
Thanks for the explanation. :) As long as we're all in agreement that
this needs further discussion, then I'll get out of the way... veto
rescinded.
--Cliff
Re: [PATCH] table patch using mergesort Re: Frankentables
Posted by Joe Schaefer <jo...@sunstarsys.com>.
Cliff Woolley <jw...@virginia.edu> writes:
> On Mon, 23 Jun 2003, Joe Schaefer wrote:
[...]
> > To get the mod_headers tests to pass, here is a patch to
> > httpd-test/perl-framework/t/modules/headers.t
>
> Whoa, hang on... -1 to changing the test framework. If we've had a
> semantic change in APR and the test suite caught it, that's a bug in APR
> and *not* a bug in the test suite, IMO (unless of course the original
> semantics were "wrong" in some way... but even then a semantic change
> in a library is not good).
>
> Why the semantic change?
Here is the section of headers.t (line 76) in question :
## should 'append' work this way?
## currently, if there are multiple headers
## with the same name, 'append' appends the value
## to the FIRST instance of that header.
if (@expected_value) {
The documentation for apr_table_mergen gives no indication that
such behavior is intentional. Moreover, besides being counterintuitive,
IMO the current merge behavior violates both the letter and spirit of
RFC 2616 Section 4.2:
Multiple message-header fields with the same field-name MAY be present
in a message ... It MUST be possible to combine the multiple header
fields into one "field-name: field-value" pair, without changing the
semantics of the message, by appending each subsequent field-value to
the first, each separated by a comma. The order in which header fields
with the same field-name are received is therefore significant to the
interpretation of the combined field value, and thus a proxy MUST NOT
change the order of these field values when a message is forwarded.
The current mergen code takes a subsequent header and effectively
*inserts* it ahead of all previous headers save for the first.
With the callback patch (which Brian hasn't applied yet), mergen
really does append the header to the end, and then combines those
headers into one. The relative ordering of the headers is preserved.
However, I can see the value in not changing the current semantics
if people are concerned that the change may cause problems for third
party modules that rely on the current behavior. Either way, it's good
to discuss this issue before adopting the callback API, so I can make
sure the correct behavior, whatever that turns out to be, is both
documented and tested prior to acceptance.
As far as Brian's current patch goes, it does not change the mergen
semantics. However, it may have an impact on what apr_table_overlap
is "supposed" to do in the case where a multivalued key appears in
the first table, but not the second. The documented behavior for this
case is to do nothing; but the patches under discussion in this thread
no longer have that behavior (apr_table_compress will always result in
a table free of multivalued keys).
--
Joe Schaefer
Re: [PATCH] table patch using mergesort Re: Frankentables
Posted by Cliff Woolley <jw...@virginia.edu>.
On Mon, 23 Jun 2003, Joe Schaefer wrote:
> OK. I've looked into the test failures, and it looks to
> me like the cause is a semantic change in apr_table_mergen.
> With the callback API, apr_table_merge with preexisting headers
> will merge *all* of them into a single header, whereas the
> current code only merges *the first* occurrence.
>
> To get the mod_headers tests to pass, here is a patch to
> httpd-test/perl-framework/t/modules/headers.t
Whoa, hang on... -1 to changing the test framework. If we've had a
semantic change in APR and the test suite caught it, that's a bug in APR
and *not* a bug in the test suite, IMO (unless of course the original
semantics were "wrong" in some way... but even then a semantic change in a
library is not good).
Why the semantic change?
--Cliff
Re: [PATCH] table patch using mergesort Re: Frankentables
Posted by Joe Schaefer <jo...@sunstarsys.com>.
Brian Pane <br...@cnet.com> writes:
> This patch is failing several of the mod_headers tests
> in perl-framework. For now, I'll commit a version
> without the merge and copy callbacks. It should be
> easy to add those back in later, by keeping the
> merge-or-copy flag in apr_table_compress and letting
> it call the callbacks based on which flag value is set.
OK. I've looked into the test failures, and it looks to
me like the cause is a semantic change in apr_table_mergen.
With the callback API, apr_table_merge with preexisting headers
will merge *all* of them into a single header, whereas the
current code only merges *the first* occurrence.
To get the mod_headers tests to pass, here is a patch to
httpd-test/perl-framework/t/modules/headers.t
Index: t/modules/headers.t
===================================================================
RCS file: /home/cvspublic/httpd-test/perl-framework/t/modules/headers.t,v
retrieving revision 1.2
diff -u -r1.2 headers.t
--- t/modules/headers.t 18 Sep 2001 15:41:03 -0000 1.2
+++ t/modules/headers.t 23 Jun 2003 04:38:10 -0000
@@ -79,8 +79,10 @@
## with the same name, 'append' appends the value
## to the FIRST instance of that header.
if (@expected_value) {
- $expected_value[0] .= ", $test_value";
-
+ $expected_value = join ', ', @expected_value,
+ $expected_value, $test_value;
+ @expected_value = ();
+ $expected_exists = 1;
} elsif ($expected_value) {
$expected_value .= ", $test_value";
} else {
--
Joe Schaefer
Re: [PATCH] table patch using mergesort Re: Frankentables
Posted by Brian Pane <br...@cnet.com>.
This patch is failing several of the mod_headers tests
in perl-framework. For now, I'll commit a version
without the merge and copy callbacks. It should be
easy to add those back in later, by keeping the
merge-or-copy flag in apr_table_compress and letting
it call the callbacks based on which flag value is set.
Brian
On Sun, 2003-06-22 at 13:23, Brian Pane wrote:
> On Wed, 2003-06-18 at 08:10, Joe Schaefer wrote:
>
> > I added back the callbacks apreq needs,
> > and moved the table_reindex call inside
> > the dups_found test. Here's the corresponding
> > oprofile data:
>
> Thanks. I'll merge in the version of mergesort that
> has a return value, run some final tests, and commit
> the changes this afternoon.
>
> Brian
>
Re: [PATCH] table patch using mergesort Re: Frankentables
Posted by Brian Pane <br...@cnet.com>.
On Wed, 2003-06-18 at 08:10, Joe Schaefer wrote:
> I added back the callbacks apreq needs,
> and moved the table_reindex call inside
> the dups_found test. Here's the corresponding
> oprofile data:
Thanks. I'll merge in the version of mergesort that
has a return value, run some final tests, and commit
the changes this afternoon.
Brian
Re: [PATCH] table patch using mergesort Re: Frankentables
Posted by Joe Schaefer <jo...@sunstarsys.com>.
Joe Schaefer <jo...@sunstarsys.com> writes:
[...]
> I added back the callbacks apreq needs,
> and moved the table_reindex call inside
> the dups_found test.
Oops- I also introduced a bug into the table_mergesort
function by changing its return value to void. Otherwise
it seems to test ok (all perl-framework tests pass).
--
Joe Schaefer
Re: [PATCH] table patch using mergesort Re: Frankentables
Posted by Joe Schaefer <jo...@sunstarsys.com>.
Joe Schaefer <jo...@sunstarsys.com> writes:
[...]
> BRIAN'S-PATCH
>
> 0000c1e8 559 1.24604 apr_table_get ...
> 0000c4f8 253 0.563952 apr_table_setn ...
> 0000ce94 203 0.452499 table_mergesort ...
> 0000c06c 187 0.416834 apr_table_make ...
> 0000cbd8 179 0.399001 apr_table_addn ...
> 0000d048 138 0.30761 apr_table_compress ...
> 0000ccec 86 0.191699 apr_table_vdo ...
> 0000c9e4 84 0.187241 apr_table_mergen ...
> 0000c160 82 0.182783 table_reindex ...
> 0000c048 71 0.158263 apr_table_elts ...
> 0000c050 62 0.138202 apr_is_empty_table ...
> 0000ccc4 57 0.127056 apr_table_do ...
> 0000c700 54 0.120369 apr_table_unset ...
>
> TOTAL: 2015 4.491549 %
I added back the callbacks apreq needs,
and moved the table_reindex call inside
the dups_found test. Here's the corresponding
oprofile data:
0000c748 561 1.25878 apr_table_get ...
0000ca54 256 0.574416 apr_table_setn ...
0000c2e4 232 0.520565 table_mergesort ...
0000d490 194 0.4353 apr_table_addn ...
0000c134 178 0.399399 apr_table_make ...
0000c4b8 117 0.262526 apr_table_compress ...
0000d0f4 116 0.260282 apr_table_mergen ...
0000d704 85 0.190724 apr_table_vdo ...
0000cc5c 66 0.148092 apr_table_unset ...
0000c0e4 63 0.14136 apr_table_elts ...
0000c0ec 59 0.132385 apr_is_empty_table ...
0000d6dc 53 0.118922 apr_table_do ...
TOTAL: 1980 4.442751 %
apr patch is attached below; the same patch Brian wrote for
httpd-2.0 applies.
Re: [PATCH] table patch using mergesort Re: Frankentables
Posted by Joe Schaefer <jo...@sunstarsys.com>.
Brian Pane <br...@cnet.com> writes:
> Here's a modification of Joe Schaefer's table patch that uses
> a mergesort to do apr_table_compress and apr_table_overlap.
>
> This will ensure a worst-case run time of n*log(n) instead
> of n^2. However, I'm not sure whether the extra complexity
> of the mergesort will hurt the performance on small data
> sets. Joe, if you have time to test this patch, can you
> let me know how it performs compared to your patch?
Sorry for the delay, Brian. After 3 passes of
% ab -H "Accept-Language: en" -H "Accept-Encoding: gzip" \
-H "Accept-Charset: *" -H "Referer: http://localhost/" \
-H "Cookie: foo=bar" -n 50000 -c 50 http://127.0.0.1:8080/foo
I ran
% op_time -rl /home/joe/apache2/bin/httpd | perl -nal \
-e 'if (/table|overlap/) { $ops += $F[1]; $pct += $F[2]; print }' \
-e 'END { printf "\nTOTAL:\t %d \t %.6f %%\n", $ops, $pct }'
Results:
--------------------------------------------------
CVS HEAD
0000c16c 521 1.16334 apr_table_get ...
0000d0dc 285 0.636374 apr_table_overlap ...
0000c47c 260 0.580552 apr_table_setn ...
0000bff0 260 0.580552 apr_table_make ...
0000cea0 229 0.511332 overlap_hash ...
0000cb5c 190 0.424249 apr_table_addn ...
0000c968 149 0.332701 apr_table_mergen ...
0000c0e4 130 0.290276 table_reindex ...
0000bfd4 84 0.187563 apr_is_empty_table ...
0000bfcc 82 0.183097 apr_table_elts ...
0000cc70 76 0.1697 apr_table_vdo ...
0000cc48 57 0.127275 apr_table_do ...
0000c684 41 0.0915485 apr_table_unset ...
TOTAL: 2364 5.278559 %
--------------------------------------------------
JOE'S-PATCH
0000c5fc 524 1.16865 apr_table_get ...
0000c908 325 0.724832 apr_table_setn ...
0000c3bc 290 0.646773 apr_table_compress ...
0000d344 210 0.468353 apr_table_addn ...
0000c134 177 0.394754 apr_table_make ...
0000cfa8 131 0.292163 apr_table_mergen ...
0000d5b8 101 0.225255 apr_table_vdo ...
0000c0ec 76 0.169499 apr_is_empty_table ...
0000c0e4 66 0.147197 apr_table_elts ...
0000d590 56 0.124894 apr_table_do ...
0000cb10 51 0.113743 apr_table_unset ...
TOTAL: 2007 4.476113 %
--------------------------------------------------
BRIAN'S-PATCH
0000c1e8 559 1.24604 apr_table_get ...
0000c4f8 253 0.563952 apr_table_setn ...
0000ce94 203 0.452499 table_mergesort ...
0000c06c 187 0.416834 apr_table_make ...
0000cbd8 179 0.399001 apr_table_addn ...
0000d048 138 0.30761 apr_table_compress ...
0000ccec 86 0.191699 apr_table_vdo ...
0000c9e4 84 0.187241 apr_table_mergen ...
0000c160 82 0.182783 table_reindex ...
0000c048 71 0.158263 apr_table_elts ...
0000c050 62 0.138202 apr_is_empty_table ...
0000ccc4 57 0.127056 apr_table_do ...
0000c700 54 0.120369 apr_table_unset ...
TOTAL: 2015 4.491549 %
Looks like the two patches are virtually identical overall, and your
mergesort patch handles the worst case scenario as well as current cvs
does.
Reagarding the worst-case scenario: both apache-1.3 and httpd-2
allow 100 8KB header lines, and neither one runs any sanity checks on the
header's name length. Conceivably each of the 100 header names could
be 8KB long, perhaps differing only in the last character. AFAICT, the
simplest way to reduce the effectiveness of a potential DoS attack on the
merging algorithm would be to add a length check to httpd-2/protocol.c
around line 780:
if (!(value = strchr(last_field, ':')) /* Find ':' within */
|| value - last_field > 100 ) { /* first 100 chars or */
r->status = HTTP_BAD_REQUEST /* abort bad request */
--
Joe Schaefer
Re: [PATCH] table patch using mergesort Re: Frankentables
Posted by Joe Schaefer <jo...@sunstarsys.com>.
Brian Pane <br...@cnet.com> writes:
[...]
> But if the callback approach works better for apreq-2,
> then I don't have any objections to using callbacks in
> the final patch.
Cool- that gives me time to polish it some more :-).
The simple API I offered may need improvement, so that
modperl-2's APR::Table glue will work well for apreq-2's
perl API (Apache::Request & Apache::Cookie).
FYI: In apreq-2, we always stick metadata right before
the table value's character string. It's important
for us to have some sort of copy/create/merge functions
which do sensible things with the metadata whenever
the table code needs to modify a value.
--
Joe Schaefer
Re: [PATCH] table patch using mergesort Re: Frankentables
Posted by Brian Pane <br...@cnet.com>.
On Sun, 2003-06-01 at 19:41, Joe Schaefer wrote:
> Brian Pane <br...@cnet.com> writes:
>
> > Here's a modification of Joe Schaefer's table patch that uses
> > a mergesort to do apr_table_compress and apr_table_overlap.
> >
> > This will ensure a worst-case run time of n*log(n) instead
> > of n^2. However, I'm not sure whether the extra complexity
> > of the mergesort will hurt the performance on small data
> > sets. Joe, if you have time to test this patch, can you
> > let me know how it performs compared to your patch?
>
> Certainly, Brian! I'll post my oprofile data as soon
> as I get a chance, perhaps in a day or so. OTOH for apreq-2,
> the most important part of my original patch was the
> internal use of the copy/merge callbacks.
Thanks!
> Is there some technical reason you omitted them from this
> patch, or are you leaving out the callback API until there's
> consensus on the apr_table_overlap implementation?
I used the merge/set flag instead of the callbacks mainly
because it seemed like a good idea at the time to use the
same flag convention for apr_table_compress as for
apr_table_overlap. But if the callback approach works
better for apreq-2, then I don't have any objections to
using callbacks in the final patch.
Brian
Re: [PATCH] table patch using mergesort Re: Frankentables
Posted by Joe Schaefer <jo...@sunstarsys.com>.
Brian Pane <br...@cnet.com> writes:
> Here's a modification of Joe Schaefer's table patch that uses
> a mergesort to do apr_table_compress and apr_table_overlap.
>
> This will ensure a worst-case run time of n*log(n) instead
> of n^2. However, I'm not sure whether the extra complexity
> of the mergesort will hurt the performance on small data
> sets. Joe, if you have time to test this patch, can you
> let me know how it performs compared to your patch?
Certainly, Brian! I'll post my oprofile data as soon
as I get a chance, perhaps in a day or so. OTOH for apreq-2,
the most important part of my original patch was the
internal use of the copy/merge callbacks.
Is there some technical reason you omitted them from this
patch, or are you leaving out the callback API until there's
consensus on the apr_table_overlap implementation?
--
Joe Schaefer