You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@apr.apache.org by Brian Pane <br...@cnet.com> on 2003/06/01 05:57:41 UTC

[PATCH] table patch using mergesort Re: Frankentables

Here's a modification of Joe Schaefer's table patch that uses
a mergesort to do apr_table_compress and apr_table_overlap.

This will ensure a worst-case run time of n*log(n) instead
of n^2.  However, I'm not sure whether the extra complexity
of the mergesort will hurt the performance on small data
sets.  Joe, if you have time to test this patch, can you
let me know how it performs compared to your patch?

Thanks,
Brian


Re: [PATCH] table patch using mergesort Re: Frankentables

Posted by Cliff Woolley <jw...@virginia.edu>.
On Tue, 24 Jun 2003, Joe Schaefer wrote:

> However, I can see the value in not changing the current semantics
> if people are concerned that the change may cause problems for third
> party modules that rely on the current behavior.  Either way, it's good
> to discuss this issue before adopting the callback API, so I can make
> sure the correct behavior, whatever that turns out to be, is both
> documented and tested prior to acceptance.

Thanks for the explanation.  :)  As long as we're all in agreement that
this needs further discussion, then I'll get out of the way... veto
rescinded.

--Cliff

Re: [PATCH] table patch using mergesort Re: Frankentables

Posted by Joe Schaefer <jo...@sunstarsys.com>.
Cliff Woolley <jw...@virginia.edu> writes:

> On Mon, 23 Jun 2003, Joe Schaefer wrote:

[...]

> > To get the mod_headers tests to pass, here is a patch to
> > httpd-test/perl-framework/t/modules/headers.t
> 
> Whoa, hang on... -1 to changing the test framework.  If we've had a
> semantic change in APR and the test suite caught it, that's a bug in APR
> and *not* a bug in the test suite, IMO (unless of course the original
> semantics were "wrong" in some way... but even then a semantic change
> in a library is not good).
> 
> Why the semantic change?

Here is the section of headers.t (line 76) in question :

                ## should 'append' work this way?
                ## currently, if there are multiple headers
                ## with the same name, 'append' appends the value
                ## to the FIRST instance of that header.
                if (@expected_value) {


The documentation for apr_table_mergen gives no indication that
such behavior is intentional.  Moreover, besides being counterintuitive, 
IMO the current merge behavior violates both the letter and spirit of
RFC 2616 Section 4.2:

  Multiple message-header fields with the same field-name MAY be present 
  in a message ... It MUST be possible to combine the multiple header 
  fields into one "field-name: field-value" pair, without changing the 
  semantics of the message, by appending each subsequent field-value to 
  the first, each separated by a comma. The order in which header fields 
  with the same field-name are received is therefore significant to the 
  interpretation of the combined field value, and thus a proxy MUST NOT 
  change the order of these field values when a message is forwarded.

The current mergen code takes a subsequent header and effectively
*inserts* it ahead of all previous headers save for the first.
With the callback patch (which Brian hasn't applied yet), mergen
really does append the header to the end, and then combines those
headers into one.  The relative ordering of the headers is preserved.

However, I can see the value in not changing the current semantics
if people are concerned that the change may cause problems for third 
party modules that rely on the current behavior.  Either way, it's good 
to discuss this issue before adopting the callback API, so I can make 
sure the correct behavior, whatever that turns out to be, is both 
documented and tested prior to acceptance.

As far as Brian's current patch goes, it does not change the  mergen 
semantics.  However, it may have an impact on what apr_table_overlap 
is "supposed" to do in the case where a multivalued key appears in 
the first table, but not the second.  The documented behavior for this 
case is to do nothing; but the patches under discussion in this thread 
no longer have that behavior (apr_table_compress will always result in 
a table free of multivalued keys).

-- 
Joe Schaefer


Re: [PATCH] table patch using mergesort Re: Frankentables

Posted by Cliff Woolley <jw...@virginia.edu>.
On Mon, 23 Jun 2003, Joe Schaefer wrote:

> OK.  I've looked into the test failures, and it looks to
> me like the cause is a semantic change in apr_table_mergen.
> With the callback API, apr_table_merge with preexisting headers
> will merge *all* of them into a single header, whereas the
> current code only merges *the first* occurrence.
>
> To get the mod_headers tests to pass, here is a patch to
> httpd-test/perl-framework/t/modules/headers.t

Whoa, hang on... -1 to changing the test framework.  If we've had a
semantic change in APR and the test suite caught it, that's a bug in APR
and *not* a bug in the test suite, IMO (unless of course the original
semantics were "wrong" in some way... but even then a semantic change in a
library is not good).

Why the semantic change?

--Cliff

Re: [PATCH] table patch using mergesort Re: Frankentables

Posted by Joe Schaefer <jo...@sunstarsys.com>.
Brian Pane <br...@cnet.com> writes:

> This patch is failing several of the mod_headers tests
> in perl-framework.  For now, I'll commit a version
> without the merge and copy callbacks.  It should be
> easy to add those back in later, by keeping the
> merge-or-copy flag in apr_table_compress and letting
> it call the callbacks based on which flag value is set.

OK.  I've looked into the test failures, and it looks to
me like the cause is a semantic change in apr_table_mergen.
With the callback API, apr_table_merge with preexisting headers
will merge *all* of them into a single header, whereas the
current code only merges *the first* occurrence.  

To get the mod_headers tests to pass, here is a patch to 
httpd-test/perl-framework/t/modules/headers.t

Index: t/modules/headers.t
===================================================================
RCS file: /home/cvspublic/httpd-test/perl-framework/t/modules/headers.t,v
retrieving revision 1.2
diff -u -r1.2 headers.t
--- t/modules/headers.t 18 Sep 2001 15:41:03 -0000      1.2
+++ t/modules/headers.t 23 Jun 2003 04:38:10 -0000
@@ -79,8 +79,10 @@
                 ## with the same name, 'append' appends the value
                 ## to the FIRST instance of that header.
                 if (@expected_value) {
-                    $expected_value[0] .= ", $test_value";
-
+                    $expected_value = join ', ', @expected_value, 
+                                                 $expected_value, $test_value;
+                    @expected_value = ();
+                    $expected_exists = 1;
                 } elsif ($expected_value) {
                     $expected_value .= ", $test_value";
                 } else {


-- 
Joe Schaefer


Re: [PATCH] table patch using mergesort Re: Frankentables

Posted by Brian Pane <br...@cnet.com>.
This patch is failing several of the mod_headers tests
in perl-framework.  For now, I'll commit a version
without the merge and copy callbacks.  It should be
easy to add those back in later, by keeping the
merge-or-copy flag in apr_table_compress and letting
it call the callbacks based on which flag value is set.

Brian

On Sun, 2003-06-22 at 13:23, Brian Pane wrote:
> On Wed, 2003-06-18 at 08:10, Joe Schaefer wrote:
> 
> > I added back the callbacks apreq needs,
> > and moved the table_reindex call inside
> > the dups_found test.  Here's the corresponding
> > oprofile data:
> 
> Thanks.  I'll merge in the version of mergesort that
> has a return value, run some final tests, and commit
> the changes this afternoon.
> 
> Brian
> 


Re: [PATCH] table patch using mergesort Re: Frankentables

Posted by Brian Pane <br...@cnet.com>.
On Wed, 2003-06-18 at 08:10, Joe Schaefer wrote:

> I added back the callbacks apreq needs,
> and moved the table_reindex call inside
> the dups_found test.  Here's the corresponding
> oprofile data:

Thanks.  I'll merge in the version of mergesort that
has a return value, run some final tests, and commit
the changes this afternoon.

Brian



Re: [PATCH] table patch using mergesort Re: Frankentables

Posted by Joe Schaefer <jo...@sunstarsys.com>.
Joe Schaefer <jo...@sunstarsys.com> writes:

[...]

> I added back the callbacks apreq needs,
> and moved the table_reindex call inside
> the dups_found test. 

Oops- I also introduced a bug into the table_mergesort 
function by changing its return value to void.  Otherwise
it seems to test ok (all perl-framework tests pass).

-- 
Joe Schaefer


Re: [PATCH] table patch using mergesort Re: Frankentables

Posted by Joe Schaefer <jo...@sunstarsys.com>.
Joe Schaefer <jo...@sunstarsys.com> writes:

[...]

> BRIAN'S-PATCH
> 
> 0000c1e8 559      1.24604     apr_table_get           ...
> 0000c4f8 253      0.563952    apr_table_setn          ...
> 0000ce94 203      0.452499    table_mergesort         ...
> 0000c06c 187      0.416834    apr_table_make          ...
> 0000cbd8 179      0.399001    apr_table_addn          ...
> 0000d048 138      0.30761     apr_table_compress      ...
> 0000ccec 86       0.191699    apr_table_vdo           ...
> 0000c9e4 84       0.187241    apr_table_mergen        ...
> 0000c160 82       0.182783    table_reindex           ...
> 0000c048 71       0.158263    apr_table_elts          ...
> 0000c050 62       0.138202    apr_is_empty_table      ...
> 0000ccc4 57       0.127056    apr_table_do            ...
> 0000c700 54       0.120369    apr_table_unset         ...
> 
> TOTAL:	 2015 	  4.491549 %

I added back the callbacks apreq needs,
and moved the table_reindex call inside
the dups_found test.  Here's the corresponding
oprofile data:

0000c748 561      1.25878     apr_table_get           ...
0000ca54 256      0.574416    apr_table_setn          ...
0000c2e4 232      0.520565    table_mergesort         ...
0000d490 194      0.4353      apr_table_addn          ...
0000c134 178      0.399399    apr_table_make          ...
0000c4b8 117      0.262526    apr_table_compress      ...
0000d0f4 116      0.260282    apr_table_mergen        ...
0000d704 85       0.190724    apr_table_vdo           ...
0000cc5c 66       0.148092    apr_table_unset         ...
0000c0e4 63       0.14136     apr_table_elts          ...
0000c0ec 59       0.132385    apr_is_empty_table      ...
0000d6dc 53       0.118922    apr_table_do            ...

TOTAL:	 1980 	  4.442751 %

apr patch is attached below; the same patch Brian wrote for 
httpd-2.0 applies.


Re: [PATCH] table patch using mergesort Re: Frankentables

Posted by Joe Schaefer <jo...@sunstarsys.com>.
Brian Pane <br...@cnet.com> writes:

> Here's a modification of Joe Schaefer's table patch that uses
> a mergesort to do apr_table_compress and apr_table_overlap.
> 
> This will ensure a worst-case run time of n*log(n) instead
> of n^2.  However, I'm not sure whether the extra complexity
> of the mergesort will hurt the performance on small data
> sets.  Joe, if you have time to test this patch, can you
> let me know how it performs compared to your patch?

Sorry for the delay, Brian.  After 3 passes of

  % ab -H "Accept-Language: en" -H "Accept-Encoding: gzip" \
  -H "Accept-Charset: *" -H "Referer: http://localhost/"   \
  -H "Cookie: foo=bar" -n 50000 -c 50 http://127.0.0.1:8080/foo

I ran

  % op_time -rl /home/joe/apache2/bin/httpd | perl -nal             \
  -e 'if (/table|overlap/) { $ops += $F[1]; $pct += $F[2]; print }' \
  -e 'END { printf "\nTOTAL:\t %d \t  %.6f %%\n", $ops, $pct }'

Results:
--------------------------------------------------
CVS HEAD

0000c16c 521      1.16334     apr_table_get           ...
0000d0dc 285      0.636374    apr_table_overlap       ...
0000c47c 260      0.580552    apr_table_setn          ...
0000bff0 260      0.580552    apr_table_make          ...
0000cea0 229      0.511332    overlap_hash            ...
0000cb5c 190      0.424249    apr_table_addn          ...
0000c968 149      0.332701    apr_table_mergen        ...
0000c0e4 130      0.290276    table_reindex           ...
0000bfd4 84       0.187563    apr_is_empty_table      ...
0000bfcc 82       0.183097    apr_table_elts          ...
0000cc70 76       0.1697      apr_table_vdo           ...
0000cc48 57       0.127275    apr_table_do            ...
0000c684 41       0.0915485   apr_table_unset         ...

TOTAL:	 2364 	  5.278559 %

--------------------------------------------------
JOE'S-PATCH

0000c5fc 524      1.16865     apr_table_get           ...
0000c908 325      0.724832    apr_table_setn          ...
0000c3bc 290      0.646773    apr_table_compress      ...
0000d344 210      0.468353    apr_table_addn          ...
0000c134 177      0.394754    apr_table_make          ...
0000cfa8 131      0.292163    apr_table_mergen        ...
0000d5b8 101      0.225255    apr_table_vdo           ...
0000c0ec 76       0.169499    apr_is_empty_table      ...
0000c0e4 66       0.147197    apr_table_elts          ...
0000d590 56       0.124894    apr_table_do            ...
0000cb10 51       0.113743    apr_table_unset         ...

TOTAL:	 2007 	  4.476113 %

--------------------------------------------------
BRIAN'S-PATCH

0000c1e8 559      1.24604     apr_table_get           ...
0000c4f8 253      0.563952    apr_table_setn          ...
0000ce94 203      0.452499    table_mergesort         ...
0000c06c 187      0.416834    apr_table_make          ...
0000cbd8 179      0.399001    apr_table_addn          ...
0000d048 138      0.30761     apr_table_compress      ...
0000ccec 86       0.191699    apr_table_vdo           ...
0000c9e4 84       0.187241    apr_table_mergen        ...
0000c160 82       0.182783    table_reindex           ...
0000c048 71       0.158263    apr_table_elts          ...
0000c050 62       0.138202    apr_is_empty_table      ...
0000ccc4 57       0.127056    apr_table_do            ...
0000c700 54       0.120369    apr_table_unset         ...

TOTAL:	 2015 	  4.491549 %


Looks like the two patches are virtually identical overall, and your 
mergesort patch handles the worst case scenario as well as current cvs 
does.

Reagarding the worst-case scenario: both apache-1.3 and httpd-2
allow 100 8KB header lines, and neither one runs any sanity checks on the 
header's name length.  Conceivably each of the 100 header names could 
be 8KB long, perhaps differing only in the last character.  AFAICT, the 
simplest way to reduce the effectiveness of a potential DoS attack on the 
merging algorithm would be to add a length check to httpd-2/protocol.c 
around line 780:

  if (!(value = strchr(last_field, ':')) /* Find ':' within */
        || value - last_field > 100 ) {  /* first 100 chars or */
      r->status = HTTP_BAD_REQUEST       /* abort bad request */

-- 
Joe Schaefer


Re: [PATCH] table patch using mergesort Re: Frankentables

Posted by Joe Schaefer <jo...@sunstarsys.com>.
Brian Pane <br...@cnet.com> writes:

[...]

> But if the callback approach works better for apreq-2, 
> then I don't have any objections to using callbacks in
> the final patch.

Cool- that gives me time to polish it some more :-).
The simple API I offered may need improvement, so that
modperl-2's APR::Table glue will work well for apreq-2's 
perl API (Apache::Request & Apache::Cookie).

FYI: In apreq-2, we always stick metadata right before
the table value's character string.  It's important 
for us to have some sort of copy/create/merge functions 
which do sensible things with the metadata whenever
the table code needs to modify a value.

-- 
Joe Schaefer


Re: [PATCH] table patch using mergesort Re: Frankentables

Posted by Brian Pane <br...@cnet.com>.
On Sun, 2003-06-01 at 19:41, Joe Schaefer wrote:
> Brian Pane <br...@cnet.com> writes:
> 
> > Here's a modification of Joe Schaefer's table patch that uses
> > a mergesort to do apr_table_compress and apr_table_overlap.
> > 
> > This will ensure a worst-case run time of n*log(n) instead
> > of n^2.  However, I'm not sure whether the extra complexity
> > of the mergesort will hurt the performance on small data
> > sets.  Joe, if you have time to test this patch, can you
> > let me know how it performs compared to your patch?
> 
> Certainly, Brian! I'll post my oprofile data as soon
> as I get a chance, perhaps in a day or so. OTOH for apreq-2, 
> the most important part of my original patch was the 
> internal use of the copy/merge callbacks.

Thanks!

> Is there some technical reason you omitted them from this 
> patch, or are you leaving out the callback API until there's 
> consensus on the apr_table_overlap implementation?

I used the merge/set flag instead of the callbacks mainly
because it seemed like a good idea at the time to use the
same flag convention for apr_table_compress as for
apr_table_overlap.  But if the callback approach works
better for apreq-2, then I don't have any objections to
using callbacks in the final patch.

Brian



Re: [PATCH] table patch using mergesort Re: Frankentables

Posted by Joe Schaefer <jo...@sunstarsys.com>.
Brian Pane <br...@cnet.com> writes:

> Here's a modification of Joe Schaefer's table patch that uses
> a mergesort to do apr_table_compress and apr_table_overlap.
> 
> This will ensure a worst-case run time of n*log(n) instead
> of n^2.  However, I'm not sure whether the extra complexity
> of the mergesort will hurt the performance on small data
> sets.  Joe, if you have time to test this patch, can you
> let me know how it performs compared to your patch?

Certainly, Brian! I'll post my oprofile data as soon
as I get a chance, perhaps in a day or so. OTOH for apreq-2, 
the most important part of my original patch was the 
internal use of the copy/merge callbacks.

Is there some technical reason you omitted them from this 
patch, or are you leaving out the callback API until there's 
consensus on the apr_table_overlap implementation?

-- 
Joe Schaefer