Posted to dev@tomcat.apache.org by am...@apache.org on 2003/01/03 02:59:10 UTC

cvs commit: jakarta-tomcat-4.0/catalina/src/share/org/apache/catalina/core StandardServer.java

amyroh      2003/01/02 17:59:09

  Modified:    catalina/src/share/org/apache/catalina/core
                        StandardServer.java
  Log:
  Fix for bugzilla 15762.
  
  Revision  Changes    Path
  1.33      +14 -6     jakarta-tomcat-4.0/catalina/src/share/org/apache/catalina/core/StandardServer.java
  
  Index: StandardServer.java
  ===================================================================
  RCS file: /home/cvs/jakarta-tomcat-4.0/catalina/src/share/org/apache/catalina/core/StandardServer.java,v
  retrieving revision 1.32
  retrieving revision 1.33
  diff -u -r1.32 -r1.33
  --- StandardServer.java	11 Sep 2002 14:19:33 -0000	1.32
  +++ StandardServer.java	3 Jan 2003 01:59:08 -0000	1.33
  @@ -824,7 +824,15 @@
               } else if (c == '"') {
                   filtered.append("&quot;");
               } else if (c == '&') {
  -                filtered.append("&amp;");
  +                char s1 = input.charAt(i+3);
  +                char s2 = input.charAt(i+4);
  +                char s3 = input.charAt(i+5);
  +                if (((s1 == ';') || (s2 == ';')) || (s3 == ';')) {
  +                    // do not convert if it's already in converted form
  +                    filtered.append(c);
  +                } else {
  +                    filtered.append("&amp;");
  +                }
               } else {
                   filtered.append(c);
               }
  @@ -1822,7 +1830,7 @@
                       writer.print(' ');
                   }
                   writer.print("<value>");
  -                writer.print(value);
  +                writer.print(convertStr(value));
                   writer.println("</value>");
                   for (int j = 0; j < indent + 2; j++) {
                       writer.print(' ');
  
  
  

--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: [5.0] Input optimization

Posted by "Jerome Lacoste (Frisurf)" <la...@frisurf.no>.
On Sun, 2003-01-05 at 18:40, Remy Maucherat wrote:
> Costin Manolache wrote:
> > Great ! 
> 
> If you could come up with a better name for the "substract" method ;-)
> It's supposed to be the opposite of append.

I found this:
From The Collaborative International Dictionary of English v.0.44
[gcide]:

  Disappendent \Dis`ap*pend"ent\, a.
     Freed from a former connection or dependence; disconnected.
     [R.]
     [1913 Webster]

But I don't like it much.

I don't like untie() either. Perhaps changing append() would make it
easier to find an opposite?

J




Re: [5.0] Input optimization

Posted by Remy Maucherat <re...@apache.org>.
Costin Manolache wrote:
> Remy Maucherat wrote:
> 
> 
>>Costin Manolache wrote:
>>
>>>Great !
>>
>>If you could come up with a better name for the "substract" method ;-)
>>It's supposed to be the opposite of append.
>>
>>The optimization works good. It should help WebDAV as well as web
>>services, as long as they use the Tomcat reader.
> 
> 
> WebDAV: is there any reason to keep our webdav servlet instead of just
> bundling jakarta-slide ? Can jakarta-slide use the tomcat-specific 
> optimizations ( for example using some factory or hook ) ? 

Well, there's stuff to do to be able to do that, since Slide is quite 
complex (and fairly unoptimized). However, it's modular, so modules 
could be written to embed it (and then we could replace the WebDAV 
servlet). We'll see how it goes.

I think we could end up packaging it as a module, like other J2EE 
related components.

>>Hopefully, I'll be able to write the new mapper soon, 
> 
> 
> My wish list ( if possible ):
>  
> - I think the new mapper should be a "global" mapper - i.e. it should
> handle all aspects of the mapping, from vhost and aliases, all contexts
> and up to the servlet wrapper.
> This would allow more optimizations ( a tree or some other tricks ) than
> the current chain of 3-4 hashtable mappers ( one for host, one for ctx, 
> one for servlet ). 

Yes, I reckon this isn't too efficient. I'll see what I can do.
As for the implementation, a tree would scale better, but a more usual 
char based "dumb" solution is far simpler, and may perform well enough. 
I think I'll start with that, and see what OptimizeIt says.
More importantly, I'll try to use the mapper for request dispatcher 
mapping also, so it can't be too unified (request dispatcher performance 
isn't too good right now, and mapping is a big part of the problem).
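For illustration only, here is a minimal sketch of the simpler, non-tree option, assuming context paths are kept in a sorted array and the request URI is available as a char[]. ContextMapperSketch and its methods are hypothetical, not Tomcat code. It looks for the longest context path that matches the URI up to a '/' boundary, using binary search and no per-request String allocation.

```java
import java.util.Arrays;

/**
 * Hypothetical "dumb" char-based context mapper: context paths live in a
 * sorted array and the URI is matched as a char[] region, so no String is
 * created per lookup.
 */
final class ContextMapperSketch {
    private final String[] paths; // sorted copy, e.g. "", "/app", "/app/sub"

    ContextMapperSketch(String... contextPaths) {
        paths = contextPaths.clone();
        Arrays.sort(paths);
    }

    /** Compares uri[0..len) with s using String.compareTo ordering. */
    private static int compare(char[] uri, int len, String s) {
        int n = Math.min(len, s.length());
        for (int i = 0; i < n; i++) {
            int d = uri[i] - s.charAt(i);
            if (d != 0) {
                return d;
            }
        }
        return len - s.length();
    }

    /** Binary search for an exact match of uri[0..len); returns an index or -1. */
    private int find(char[] uri, int len) {
        int lo = 0;
        int hi = paths.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int c = compare(uri, len, paths[mid]);
            if (c == 0) {
                return mid;
            }
            if (c > 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return -1;
    }

    /** Longest context path matching the URI up to a '/' boundary, or null. */
    String map(char[] uri, int uriLen) {
        int end = uriLen;
        while (true) {
            int idx = find(uri, end);
            if (idx >= 0) {
                return paths[idx];
            }
            if (end == 0) {
                return null;
            }
            // Cut back to the previous '/', which is excluded from the next try.
            end--;
            while (end > 0 && uri[end] != '/') {
                end--;
            }
        }
    }
}
```

Virtual host and servlet (wrapper) mapping are not covered; a tree, as Costin suggests, would avoid repeating the search at every '/' boundary.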

BTW, I'd like to have other developers understand that all String 
operations are really *bad*, not just concatenation, even trim(), 
toLowerCase() and other "simpler" operations.
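As an illustration of avoiding those String operations, here is a minimal sketch of a hypothetical helper (not a Tomcat API) that trims and compares a region of a char buffer against a lowercase literal, ignoring ASCII case, without calling trim() or toLowerCase() and without creating any String.

```java
/**
 * Hypothetical helper: does buf[start..end), ignoring leading/trailing spaces
 * and tabs and ASCII case, equal the given lowercase literal? No objects are
 * allocated, in contrast to building a String and calling trim() and
 * toLowerCase() on it.
 */
final class CharChunkSketch {
    static boolean equalsIgnoreCaseTrimmed(char[] buf, int start, int end, String lowerLiteral) {
        // Skip surrounding blanks in place instead of calling trim().
        while (start < end && (buf[start] == ' ' || buf[start] == '\t')) {
            start++;
        }
        while (end > start && (buf[end - 1] == ' ' || buf[end - 1] == '\t')) {
            end--;
        }
        if (end - start != lowerLiteral.length()) {
            return false;
        }
        for (int i = 0; i < lowerLiteral.length(); i++) {
            char c = buf[start + i];
            // ASCII-only folding, which is enough for protocol tokens such as header names.
            if (c >= 'A' && c <= 'Z') {
                c = (char) (c + ('a' - 'A'));
            }
            if (c != lowerLiteral.charAt(i)) {
                return false;
            }
        }
        return true;
    }
}
```

The semantics are deliberately narrower than the String methods: trim() strips every character up to U+0020 and toLowerCase() is locale-sensitive, while the sketch only handles blanks and ASCII letters.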

> - Add/remove mapping should be propagated via coyote actions - someday
> jk may intercept the events and inform the native side when a new webapp
> is added.

Yes, I understand.

> - there is an interesting optimization in 3.3 - once a jsp is compiled,
> a prefix mapping is added and the behavior is identical with precompiled
> jsps ( i.e. no jsp servlet or extra overhead - it is a regular servlet ).
> ( the reloading checks are a different story - they can be handled by the
> generated servlet or some other module )

The overhead of the Jasper servlet itself is quite low right now. The 
biggest operation is a hashtable lookup, I think.
It's a nice trick, though.

Remy




Re: [5.0] Input optimization

Posted by Costin Manolache <cm...@yahoo.com>.
Remy Maucherat wrote:

> Costin Manolache wrote:
>> Great !
> 
> If you could come up with a better name for the "substract" method ;-)
> It's supposed to be the opposite of append.
> 
> The optimization works good. It should help WebDAV as well as web
> services, as long as they use the Tomcat reader.

WebDAV: is there any reason to keep our webdav servlet instead of just
bundling jakarta-slide ? Can jakarta-slide use the tomcat-specific 
optimizations ( for example using some factory or hook ) ? 
 
> Hopefully, I'll be able to write the new mapper soon, 

My wish list ( if possible ):
 
- I think the new mapper should be a "global" mapper - i.e. it should
handle all aspects of the mapping, from vhost and aliases, all contexts
and up to the servlet wrapper.
This would allow more optimizations ( a tree or some other tricks ) than
the current chain of 3-4 hashtable mappers ( one for host, one for ctx, 
one for servlet ). 

- Add/remove mapping should be propagated via coyote actions - someday
jk may intercept the events and inform the native side when a new webapp
is added.

- there is an interesting optimization in 3.3 - once a jsp is compiled,
a prefix mapping is added and the behavior is identical with precompiled
jsps ( i.e. no jsp servlet or extra overhead - it is a regular servlet ).
( the reloading checks are a different story - they can be handled by the
generated servlet or some other module )

Costin






Re: [5.0] Input optimization

Posted by Chris Brown <br...@reflexe.fr>.
> If you could come up with a better name for the "substract" method ;-)
> It's supposed to be the opposite of append.

prepend() ?




Re: [5.0] Input optimization

Posted by Remy Maucherat <re...@apache.org>.
Costin Manolache wrote:
> Great ! 

If you could come up with a better name for the "substract" method ;-)
It's supposed to be the opposite of append.

The optimization works good. It should help WebDAV as well as web 
services, as long as they use the Tomcat reader.

Hopefully, I'll be able to write the new mapper soon, and TC 5 should 
easily beat 4.1 in benchmarks.

Remy




Re: [5.0] Input optimization

Posted by Costin Manolache <cm...@yahoo.com>.
Great ! 

Costin

Remy Maucherat wrote:

> I've committed input optimization similar to the OutputBuffer used in
> the Coyote adapter. It appears to work ok (although
> BufferedReader.readLine still needs to be implemented).
> 
> I'm against porting this patch to 4.1.x, as it is a risky change (which
> will need lots of testing) with only a few cases where it would improve
> performance in the real world (InputStream performance is almost
> equivalent, and it is used a lot more often than the BufferedReader, as
> the integrated FORM POST parsing uses it).
> 
> Remy






[5.0] Input optimization

Posted by Remy Maucherat <re...@apache.org>.
I've committed input optimization similar to the OutputBuffer used in 
the Coyote adapter. It appears to work ok (although 
BufferedReader.readLine still needs to be implemented).

I'm against porting this patch to 4.1.x, as it is a risky change (which 
will need lots of testing) with only a few cases where it would improve 
performance in the real world (InputStream performance is almost 
equivalent, and it is used a lot more often than the BufferedReader, as 
the integrated FORM POST parsing uses it).
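The missing readLine() is usually built by scanning a reusable char buffer for a line terminator. The following is a minimal, hypothetical sketch over a plain java.io.Reader, not the Coyote implementation; like java.io.BufferedReader, it treats '\n', '\r' and "\r\n" as line ends.

```java
import java.io.IOException;
import java.io.Reader;

/**
 * Hypothetical sketch: readLine() layered over any Reader using one reusable
 * char buffer and one reusable StringBuilder.
 */
final class LineReaderSketch {
    private final Reader in;
    private final char[] buf = new char[8192];
    private int pos = 0;
    private int limit = 0;
    private boolean skipLF = false; // previous line ended with '\r'; drop a following '\n'
    private final StringBuilder line = new StringBuilder();

    LineReaderSketch(Reader in) {
        this.in = in;
    }

    /** Refills the buffer; returns false at end of stream. */
    private boolean fill() throws IOException {
        int n = in.read(buf, 0, buf.length);
        if (n < 0) {
            return false;
        }
        pos = 0;
        limit = n;
        return true;
    }

    /** Returns the next line without its terminator, or null at end of stream. */
    String readLine() throws IOException {
        line.setLength(0);
        while (true) {
            if (pos >= limit) {
                if (!fill()) {
                    return line.length() > 0 ? line.toString() : null;
                }
                continue;
            }
            char c = buf[pos++];
            if (skipLF) {
                skipLF = false;
                if (c == '\n') {
                    continue; // second half of a "\r\n" pair
                }
            }
            if (c == '\n') {
                return line.toString();
            }
            if (c == '\r') {
                skipLF = true;
                return line.toString();
            }
            line.append(c);
        }
    }
}
```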

Remy




Re: cvs commit: jakarta-tomcat-4.0/catalina/src/share/org/apache/catalina/core StandardServer.java

Posted by Christoph Seibert <se...@cs.uni-bonn.de>.
On Sunday, 05.01.03 at 02:15, Roberto Casanova wrote:
> You should not revert completely to revision 1.32.  There were two
> changes done to StandardServer.java in your commit of revision 1.33.
>
> We discussed only the first change (in method convertStr around line
> 824) and I think we agree it should be reverted.
>
> But the second change done in that same commit actually fixes the
> original problem (bug 15762) and should be preserved.

I agree. I simply forgot to point that out in my last post.

In discussing this bug, and looking at bug 15798 (which is
Windows-specific, I guess, but nevertheless concerns a similar
issue), I think that the way the XML files are written 'by hand'
through PrintWriters is prone to produce bugs of this kind, because
it is easy to forget that some strings must be encoded. Isn't there
some standard API for _writing_ XML, which takes care of these
encoding issues transparently?

I thought about maybe having a look at the Cocoon project's
Serializers, which I think do something like this via SAX events.
Of course, one could also construct a DOM tree and write that out,
but I don't know whether this is a good idea in terms of performance.
Also, I don't know if encoding issues are taken care of in each
approach.
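For the DOM route, the standard JAXP API already escapes on output. A minimal sketch, assuming a JAXP implementation is available (modern JDKs bundle one): build the element in memory from raw, unescaped strings and let an identity Transformer serialize it. The element and attribute names are illustrative, not the actual server.xml layout.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Sketch: let a JAXP serializer escape attribute values and text. */
final class DomWriteSketch {
    static String write(String attrValue, String textValue) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element param = doc.createElement("parameter"); // illustrative name
        param.setAttribute("url", attrValue);          // raw, unescaped input
        Element value = doc.createElement("value");
        value.appendChild(doc.createTextNode(textValue)); // raw, unescaped input
        param.appendChild(value);
        doc.appendChild(param);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

Either way, escaping moves out of each hand-written print() call and into the serializer.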

Ciao,
Christoph

-- 
--- Christoph Seibert                   seibert@cs.uni-bonn.de ---
-- Farlon Dragon -==(UDIC)==-    http://home.pages.de/~seibert/ --
- Who can possibly rule if no one                                -
-         who wants to can be allowed to?     - D. Adams, HHGTTG -




RE: cvs commit: jakarta-tomcat-4.0/catalina/src/share/org/apache/catalina/core StandardServer.java

Posted by Roberto Casanova <rc...@bluewin.ch>.
You should not revert completely to revision 1.32.  There were two
changes done to StandardServer.java in your commit of revision 1.33.

We discussed only the first change (in method convertStr around line
824) and I think we agree it should be reverted.

But the second change done in that same commit actually fixes the
original problem (bug 15762) and should be preserved.  This is in method
storeNamingResources around line 1822:

  @@ -1822,7 +1830,7 @@
                       writer.print(' ');
                   }
                   writer.print("<value>");
  -                writer.print(value);
  +                writer.print(convertStr(value));
                   writer.println("</value>");
                   for (int j = 0; j < indent + 2; j++) {
                       writer.print(' ');

Actually, I have been running Tomcat with this same fix for a month or
so, and it works well.  You can enter special characters (like &) in the
data source url using the admin webapp (no need to escape them), they
get stored in server.xml and read back properly.  (I had attached that
patch with the original bug report)
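The round-trip claim can be checked in a few lines: escape every markup character unconditionally when writing the value element, parse the result back, and compare. This is a minimal sketch; escapeAll() is a hypothetical stand-in assumed to behave like revision 1.32 of convertStr() (from the discussion, it escapes <, >, &, ' and "), and the sample strings are made up.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

/** Hypothetical round-trip check: escape on write, let the parser unescape on read. */
final class RoundTripSketch {
    /** Escapes every markup character unconditionally (assumed 1.32-style behaviour). */
    static String escapeAll(String input) {
        StringBuilder filtered = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '<': filtered.append("&lt;"); break;
                case '>': filtered.append("&gt;"); break;
                case '\'': filtered.append("&apos;"); break;
                case '"': filtered.append("&quot;"); break;
                case '&': filtered.append("&amp;"); break;
                default: filtered.append(c);
            }
        }
        return filtered.toString();
    }

    /** Writes <value>escaped</value>, parses it back and returns the text content. */
    static String roundTrip(String value) throws Exception {
        String xml = "<value>" + escapeAll(value) + "</value>";
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement().getTextContent();
    }
}
```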

Thanks
Roberto

> From: Amy Roh
> Sent: Sunday, January 05, 2003 0:46
> 
> Roberto Casanova wrote:
> > I see another problem with this code.
> > 
> > Suppose for some reason we have an attribute or resource parameter value
> > like the following (without the quotes):
> > "&gt; corresponds to >"
> > The correct XML for this string is:
> > "&amp;gt; corresponds to &gt;"
> > However this code would write to server.xml:
> > "&gt; corresponds to &gt;"
> > The next time the server.xml file is read in, we end up with:
> > "> corresponds to >"
> > which is different than the original string.
> > 
> > In my opinion this portion of the code should be left as it was in
> > revision 1.32:
> 
> I see the problem with the previous commit. Sorry, I should have
> thought about it more before making the quick change. However, the
> original problem (admin saving the URL in an invalid form the second
> time) still occurs with revision 1.32. The workaround is to make sure
> the URL is in valid form on the data source page every time you commit
> any changes via admin. Is this workaround feasible?
> 
> Amy


Re: cvs commit: jakarta-tomcat-4.0/catalina/src/share/org/apache/catalina/core StandardServer.java

Posted by Amy Roh <am...@apache.org>.
Roberto Casanova wrote:
> I see another problem with this code.
> 
> Suppose for some reason we have an attribute or resource parameter value
> like the following (without the quotes):
> "&gt; corresponds to >"
> The correct XML for this string is:
> "&amp;gt; corresponds to &gt;"
> However this code would write to server.xml:
> "&gt; corresponds to &gt;"
> The next time the server.xml file is read in, we end up with:
> "> corresponds to >"
> which is different than the original string.
> 
> In my opinion this portion of the code should be left as it was in
> revision 1.32:

I see the problem with the previous commit. Sorry, I should have
thought about it more before making the quick change. However, the
original problem (admin saving the URL in an invalid form the second
time) still occurs with revision 1.32. The workaround is to make sure
the URL is in valid form on the data source page every time you commit
any changes via admin. Is this workaround feasible?

Amy

> 
> Roberto


Re: cvs commit: jakarta-tomcat-4.0/catalina/src/share/org/apache/catalina/core StandardServer.java

Posted by Christoph Seibert <se...@cs.uni-bonn.de>.
On Friday, 03.01.03 at 23:48, Roberto Casanova wrote:
> I see another problem with this code.
>
> Suppose for some reason we have an attribute or resource parameter value
> like the following (without the quotes):
> "&gt; corresponds to >"
> The correct XML for this string is:
> "&amp;gt; corresponds to &gt;"
> However this code would write to server.xml:
> "&gt; corresponds to &gt;"
> The next time the server.xml file is read in, we end up with:
> "> corresponds to >"
> which is different than the original string.
>
> In my opinion this portion of the code should be left as it was in
> revision 1.32:

Actually, after reading the code in context (that is, I've had a
look at StandardServer.java), I agree with this. The change to
convertStr() results in inconsistent handling of input strings.

The question I've been asking myself is: Why should convertStr()
treat the input string as if it was a mixture of unescaped and
already escaped <,>,&,' and " characters? Since I still don't
have the full context, I don't know where the input string comes
from, so I can't really answer that. If the input string comes
from a form, it should be treated as in revision 1.32, because
of what Roberto points out. If it comes from an XML file, no
conversion should be necessary, because the XML parser checks
for well-formedness of the input file - unless the parser resolves
the entity and character references before passing the string, in
which case the conversion becomes necessary again. (Wow, I hope
this doesn't sound like complete drivel... ;-))
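On the last point: a conforming XML parser resolves character references and the predefined entity references before the application sees a value, so strings read from server.xml arrive unescaped and need escaping again on the way out. A minimal sketch using the JAXP SAX API; the element and attribute names are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Sketch: shows the attribute value a SAX parser hands to the application. */
final class ParsedValueSketch {
    static String attributeAsParsed(String xml, final String attrName) throws Exception {
        final String[] seen = new String[1];
        SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        // Keep the value from the first element only.
                        if (seen[0] == null) {
                            seen[0] = atts.getValue(attrName);
                        }
                    }
                });
        return seen[0];
    }
}
```

For example, url="&amp;gt; corresponds to &gt;" reaches the application as "&gt; corresponds to >", which is exactly the string that must be escaped again when it is written back.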

Ciao,
Christoph



RE: cvs commit: jakarta-tomcat-4.0/catalina/src/share/org/apache/catalina/core StandardServer.java

Posted by Roberto Casanova <rc...@bluewin.ch>.
I see another problem with this code.

Suppose for some reason we have an attribute or resource parameter value
like the following (without the quotes):
"&gt; corresponds to >"
The correct XML for this string is:
"&amp;gt; corresponds to &gt;"
However this code would write to server.xml:
"&gt; corresponds to &gt;"
The next time the server.xml file is read in, we end up with:
"> corresponds to >"
which is different than the original string.

In my opinion this portion of the code should be left as it was in
revision 1.32:
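The objection holds for any "skip it if it already looks escaped" rule, not only for the fixed-window version: two different in-memory strings can produce the same output, so no reader can tell them apart. A minimal sketch; smartEscape() is hypothetical and uses a simplified regular expression for references rather than the full XML Name production.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical "skip existing references" escaper, used to show the ambiguity. */
final class EscapeCollisionSketch {
    // Simplified reference syntax: &name; or &#digits; or &#xhex;
    private static final Pattern REF =
            Pattern.compile("&(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);");

    /** Escapes &, <, > and ", except an '&' that starts something reference-like. */
    static String smartEscape(String s) {
        StringBuilder out = new StringBuilder();
        Matcher m = REF.matcher(s);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '&' && m.region(i, s.length()).lookingAt()) {
                out.append('&'); // assumed "already escaped": copied as-is
            } else if (c == '&') {
                out.append("&amp;");
            } else if (c == '<') {
                out.append("&lt;");
            } else if (c == '>') {
                out.append("&gt;");
            } else if (c == '"') {
                out.append("&quot;");
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Both "&gt; corresponds to >" and "> corresponds to >" come out as "&gt; corresponds to &gt;"; only unconditional escaping keeps distinct inputs distinct.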

Roberto

> -----Original Message-----
> From: Amy Roh [mailto:amyroh@apache.org] 
> Sent: Friday, January 03, 2003 20:55
> To: Tomcat Developers List
> Subject: Re: cvs commit: 
> jakarta-tomcat-4.0/catalina/src/share/org/apache/catalina/core
>  StandardServer.java
> 
> 
> Christoph Seibert wrote:
> > Hi there,
> > 
> > I think there is a problem with the following fix:
> > 
> >> amyroh      2003/01/02 17:59:09
> >>
> >>   Modified:    catalina/src/share/org/apache/catalina/core
> >>                         StandardServer.java
> >>   Log:
> >>   Fix for bugzilla 15762.
> > 
> > [...]
> > 
> >>   diff -u -r1.32 -r1.33
> >>   --- StandardServer.java    11 Sep 2002 14:19:33 -0000    1.32
> >>   +++ StandardServer.java    3 Jan 2003 01:59:08 -0000    1.33
> >>   @@ -824,7 +824,15 @@
> >>                } else if (c == '"') {
> >>                    filtered.append("&quot;");
> >>                } else if (c == '&') {
> >>   -                filtered.append("&amp;");
> >>   +                char s1 = input.charAt(i+3);
> >>   +                char s2 = input.charAt(i+4);
> >>   +                char s3 = input.charAt(i+5);
> >>   +                if (((s1 == ';') || (s2 == ';')) || (s3 == ';')) {
> >>   +                    // do not convert if it's already in converted form
> >>   +                    filtered.append(c);
> >>   +                } else {
> >>   +                    filtered.append("&amp;");
> >>   +                }
> >>                } else {
> >>                    filtered.append(c);
> >>                }
> > 
> > 
> > (Note: I haven't had a look at the surrounding code yet, so I have to
> > assume that 'i' is the position of 'c', that is the '&' character.)
> > 
> > This code assumes that character or entity references will not be
> > shorter than 4 characters (including the delimiters '&' and ';') and
> > no longer than 6. However, the XML specification does not in any way
> > define restrictions like that. For example, '&d;' is a valid entity
> > reference (assuming it was defined in the DTD). Worse, character or
> > entity references can have arbitrary length. For example,
> > '&#x0000000000020' is a valid character reference to the ' ' (space)
> > character.
> > 
> > I'm sorry I don't have a better fix right now, but I assume one would
> > have to iterate through the characters following the '&' until either
> > a ';' is found or a character occurs that is not a legal part of an
> > entity reference name (or in the case of a character reference, not
> > one of [0-9] for decimal or [0-9a-fA-F] for hexadecimal).
> > 
> > (Actually, I believe this wheel must already have been invented, but
> > with only looking at this code snippet, I don't really know.)
> 
> I believe iterating through the characters following the '&' until a
> ';' is found will fix the problem.  A character reference such as
> '&#x0000000000020' without a following ';' will result in a parsing
> error, whereas '&#x0000000000020;' will be written as a space (' ').
> 
> Thanks,
> Amy
> 
> > 
> > Ciao,
> > Christoph
> > 
> 




Re: cvs commit: jakarta-tomcat-4.0/catalina/src/share/org/apache/catalina/core StandardServer.java

Posted by Christoph Seibert <se...@cs.uni-bonn.de>.
On Friday, 03.01.03 at 20:55, Amy Roh wrote:
> Christoph Seibert wrote:
>>>   Fix for bugzilla 15762.
>> I'm sorry I don't have a better fix right now, but I assume one
>> would have to iterate through the characters following the '&'
>> until either a ';' is found or a character occurs that is not a legal
>> part of an entity reference name (or in the case of a character
>> reference, not one of [0-9] for decimal or [0-9a-fA-F] for
>> hexadecimal).
> I believe iterating through the characters following the '&' until a
> ';' is found will fix the problem.  A character reference such as
> '&#x0000000000020' without a following ';' will result in a parsing
> error, whereas '&#x0000000000020;' will be written as a space (' ').

I'm sorry (really - I'm new here and already I start correcting
other people's code without having contributed any myself), but
I don't think this is sufficient. On encountering a string like

'I like to spell & as &amp;'

your solution would treat '& as &amp;' as a valid entity
reference, and would not escape the first '&' character.
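The counterexample is easy to reproduce. The sketch below assumes the proposed rule is "if a ';' appears anywhere after the '&', leave the '&' alone"; the class and method names are hypothetical.

```java
/** Hypothetical version of the proposed "scan ahead for ';'" rule. */
final class NaiveScanSketch {
    /** Returns true when the '&' at index i would be left unescaped. */
    static boolean treatedAsReference(String input, int i) {
        // Scan forward from the '&' until a ';' is found; nothing else is checked.
        for (int j = i + 1; j < input.length(); j++) {
            if (input.charAt(j) == ';') {
                return true;
            }
        }
        return false;
    }
}
```

In "I like to spell & as &amp;" the first '&' is a literal ampersand, yet the rule keeps it because the ';' of "&amp;" appears later, so the written file would contain a bare "& " that an XML parser rejects.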

However, please also see my answer to Roberto's mail before
making another change.

Ciao,
Christoph



Re: cvs commit: jakarta-tomcat-4.0/catalina/src/share/org/apache/catalina/core StandardServer.java

Posted by Amy Roh <am...@apache.org>.
Christoph Seibert wrote:
> Hi there,
> 
> I think there is a problem with the following fix:
> 
>> amyroh      2003/01/02 17:59:09
>>
>>   Modified:    catalina/src/share/org/apache/catalina/core
>>                         StandardServer.java
>>   Log:
>>   Fix for bugzilla 15762.
> 
> [...]
> 
>>   diff -u -r1.32 -r1.33
>>   --- StandardServer.java    11 Sep 2002 14:19:33 -0000    1.32
>>   +++ StandardServer.java    3 Jan 2003 01:59:08 -0000    1.33
>>   @@ -824,7 +824,15 @@
>>                } else if (c == '"') {
>>                    filtered.append("&quot;");
>>                } else if (c == '&') {
>>   -                filtered.append("&amp;");
>>   +                char s1 = input.charAt(i+3);
>>   +                char s2 = input.charAt(i+4);
>>   +                char s3 = input.charAt(i+5);
>>   +                if (((s1 == ';') || (s2 == ';')) || (s3 == ';')) {
>>   +                    // do not convert if it's already in converted form
>>   +                    filtered.append(c);
>>   +                } else {
>>   +                    filtered.append("&amp;");
>>   +                }
>>                } else {
>>                    filtered.append(c);
>>                }
> 
> 
> (Note: I haven't had a look at the surrounding code yet, so I have to
> assume that 'i' is the position of 'c', that is the '&' character.)
> 
> This code assumes that character or entity references will not be
> shorter than 4 characters (including the delimiters '&' and ';')
> and no longer than 6. However, the XML specification does not in
> any way define restrictions like that. For example, '&d;' is a
> valid entity reference (assuming it was defined in the DTD). Worse,
> character or entity references can have arbitrary length. For example,
> '&#x0000000000020' is a valid character reference to the ' ' (space)
> character.
> 
> I'm sorry I don't have a better fix right now, but I assume one
> would have to iterate through the characters following the '&'
> until either a ';' is found or a character occurs that is not a legal
> part of an entity reference name (or in the case of a character
> reference, not one of [0-9] for decimal or [0-9a-fA-F] for
> hexadecimal).
> 
> (Actually, I believe this wheel must already have been invented,
> but with only looking at this code snippet, I don't really know.)

I believe iterating through the characters following the '&' until a
';' is found will fix the problem.  A character reference such as
'&#x0000000000020' without a following ';' will result in a parsing
error, whereas '&#x0000000000020;' will be written as a space (' ').

Thanks,
Amy

> 
> Ciao,
> Christoph
> 






Re: cvs commit: jakarta-tomcat-4.0/catalina/src/share/org/apache/catalina/core StandardServer.java

Posted by Christoph Seibert <se...@cs.uni-bonn.de>.
Hi there,

I think there is a problem with the following fix:

> amyroh      2003/01/02 17:59:09
>
>   Modified:    catalina/src/share/org/apache/catalina/core
>                         StandardServer.java
>   Log:
>   Fix for bugzilla 15762.
[...]
>   diff -u -r1.32 -r1.33
>   --- StandardServer.java	11 Sep 2002 14:19:33 -0000	1.32
>   +++ StandardServer.java	3 Jan 2003 01:59:08 -0000	1.33
>   @@ -824,7 +824,15 @@
>                } else if (c == '"') {
>                    filtered.append("&quot;");
>                } else if (c == '&') {
>   -                filtered.append("&amp;");
>   +                char s1 = input.charAt(i+3);
>   +                char s2 = input.charAt(i+4);
>   +                char s3 = input.charAt(i+5);
>   +                if (((s1 == ';') || (s2 == ';')) || (s3 == ';')) {
>   +                    // do not convert if it's already in converted form
>   +                    filtered.append(c);
>   +                } else {
>   +                    filtered.append("&amp;");
>   +                }
>                } else {
>                    filtered.append(c);
>                }

(Note: I haven't had a look at the surrounding code yet, so I have to
assume that 'i' is the position of 'c', that is the '&' character.)

This code assumes that character or entity references will not be
shorter than 4 characters (including the delimiters '&' and ';')
and no longer than 6. However, the XML specification does not in
any way define restrictions like that. For example, '&d;' is a
valid entity reference (assuming it was defined in the DTD). Worse,
character or entity references can have arbitrary length. For example,
'&#x0000000000020' is a valid character reference to the ' ' (space)
character.
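The limits are easy to see in isolation. This sketch copies the three charAt() calls and the condition from the revision 1.33 diff into a hypothetical standalone method, assuming, as in the note above, that 'i' is the index of the '&'. Besides the length assumption, all three reads happen before the test, so the lookahead throws an IndexOutOfBoundsException whenever fewer than five characters follow the '&' (for example, "&lt;" at the end of a value).

```java
/** Hypothetical standalone copy of the revision 1.33 lookahead. */
final class FixedWindowSketch {
    /** True if the '&' at index i would be left unescaped by the 1.33 logic. */
    static boolean looksEscaped(String input, int i) {
        char s1 = input.charAt(i + 3); // all three reads happen unconditionally,
        char s2 = input.charAt(i + 4); // so they can run past the end of input
        char s3 = input.charAt(i + 5);
        return (s1 == ';') || (s2 == ';') || (s3 == ';');
    }
}
```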

I'm sorry I don't have a better fix right now, but I assume one
would have to iterate through the characters following the '&'
until either a ';' is found or a character occurs that is not a legal
part of an entity reference name (or in the case of a character
reference, not one of [0-9] for decimal or [0-9a-fA-F] for
hexadecimal).

(Actually, I believe this wheel must already have been invented,
but with only looking at this code snippet, I don't really know.)
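A sketch of the scan described above: starting after the '&', accept '#' followed by decimal digits, '#x' followed by hex digits, or a name, and require a terminating ';'. Assumption: names are limited to ASCII letters, digits, '_', '-', '.' and ':' (with a letter, '_' or ':' first), a simplification of the XML Name production; the class and method names are hypothetical.

```java
/** Hypothetical recognizer for a well-formed reference starting at input[i] == '&'. */
final class ReferenceScanner {
    /** Length of the reference at i (including '&' and ';'), or 0 if there is none. */
    static int referenceLength(String input, int i) {
        int n = input.length();
        if (i >= n || input.charAt(i) != '&') {
            return 0;
        }
        int j = i + 1;
        if (j < n && input.charAt(j) == '#') {
            j++;
            boolean hex = j < n && input.charAt(j) == 'x';
            if (hex) {
                j++;
            }
            int digitsStart = j;
            while (j < n && isDigit(input.charAt(j), hex)) {
                j++;
            }
            if (j == digitsStart) {
                return 0; // "&#;" or "&#x;" has no digits
            }
        } else {
            if (j >= n || !isNameStart(input.charAt(j))) {
                return 0;
            }
            j++;
            while (j < n && isNameChar(input.charAt(j))) {
                j++;
            }
        }
        return (j < n && input.charAt(j) == ';') ? j - i + 1 : 0;
    }

    private static boolean isDigit(char c, boolean hex) {
        if (c >= '0' && c <= '9') {
            return true;
        }
        return hex && ((c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F'));
    }

    private static boolean isNameStart(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_' || c == ':';
    }

    private static boolean isNameChar(char c) {
        return isNameStart(c) || (c >= '0' && c <= '9') || c == '-' || c == '.';
    }
}
```

This fixes recognition ('&d;' and long character references are accepted, '& as &amp;' is not), but Roberto's objection still applies: a literal string such as "&gt;" cannot survive a save if recognized references are skipped.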

Ciao,
Christoph
