Posted to oro-dev@jakarta.apache.org by bu...@apache.org on 2001/09/22 22:41:04 UTC

DO NOT REPLY [Bug 3777] New: - URL's with query string not parsed correctly

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=3777>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=3777

           Summary: URL's with query string not parsed correctly
           Product: ORO
           Version: 2.0.4
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: Critical
          Priority: Other
         Component: Main
        AssignedTo: oro-dev@jakarta.apache.org
        ReportedBy: cchaman@hotmail.com


I used a regex to parse URLs in HTML, as presented in the Perl Cookbook, and 
made the appropriate changes (i.e. correct escaping) for the RE to work in ORO. 
I am able to parse an HTML file for ALL URLs, but those with a question mark in 
them are skipped. 

I tried a number of tests, and it seems that when a question mark (?) is 
encountered, it somehow becomes part of the RE, which causes the RE not to match 
properly.

Example: We put the following HTML in the file and did a search and replace on 
the URLs. 

http://www.a.com/a.htm --> to be replaced with www.aaa.com/aaa.htm
http://www.b.com/? --> to be replaced with www.bbb.com/b
http://www.c.com/c?abc --> to be replaced with www.ccc.com/ccc?abc

After we ran the program, we got the following result:

www.a.com/a.htm changed to www.aaa.com/aaa.htm
www.b.com/? changed to www.bbb.com/b? - NOTE THE TRAILING ?
www.c.com/c?abc was not replaced !!!

PLEASE HELP!!!
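
The results above are what one would expect if each URL found in the file is compiled as a pattern for the replace step without escaping it, since an unescaped ? is a quantifier that makes the preceding character optional. The program itself is not shown in this report, so that is only an assumption. The sketch below reproduces the two failing cases under that assumption. It uses java.util.regex as a stand-in for ORO so that it runs without extra jars; the class and method names are hypothetical and are not taken from the reporter's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: java.util.regex stands in for ORO here.
public class QueryStringReplaceSketch {

    // Assumed behaviour of the failing program: the URL is compiled as-is,
    // so '?' is a quantifier (and '.' matches any character).
    static String replaceAsRegex(String text, String url, String replacement) {
        return Pattern.compile(url)
                      .matcher(text)
                      .replaceAll(Matcher.quoteReplacement(replacement));
    }

    // Possible fix: quote the URL first so every character matches literally.
    static String replaceAsLiteral(String text, String url, String replacement) {
        return Pattern.compile(Pattern.quote(url))
                      .matcher(text)
                      .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String b = "http://www.b.com/?";
        String c = "http://www.c.com/c?abc";

        // "/?" matches an optional slash, so the literal '?' is left behind.
        System.out.println(replaceAsRegex(b, "www.b.com/?", "www.bbb.com/b"));
        // prints http://www.bbb.com/b?

        // "c?abc" matches "abc" or "cabc", never the text "c?abc".
        System.out.println(replaceAsRegex(c, "www.c.com/c?abc", "www.ccc.com/ccc?abc"));
        // prints http://www.c.com/c?abc (unchanged)

        System.out.println(replaceAsLiteral(b, "www.b.com/?", "www.bbb.com/b"));
        // prints http://www.bbb.com/b

        System.out.println(replaceAsLiteral(c, "www.c.com/c?abc", "www.ccc.com/ccc?abc"));
        // prints http://www.ccc.com/ccc?abc
    }
}
```

If this is indeed the cause, the equivalent step in ORO would be to escape the URL before compiling it. Perl5Compiler appears to provide a static quotemeta method for that purpose, though this should be checked against the 2.0.4 API documentation.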