Posted to oro-dev@jakarta.apache.org by bu...@apache.org on 2001/09/22 22:41:04 UTC
DO NOT REPLY [Bug 3777] New: -
URL's with query string not parsed correctly
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=3777>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=3777
Summary: URL's with query string not parsed correctly
Product: ORO
Version: 2.0.4
Platform: All
OS/Version: All
Status: NEW
Severity: Critical
Priority: Other
Component: Main
AssignedTo: oro-dev@jakarta.apache.org
ReportedBy: cchaman@hotmail.com
I used a regex to parse URLs in HTML, as presented in the Perl Cookbook, and
made the appropriate changes (i.e., correct escaping) for the RE to work in ORO.
I am able to parse an HTML file for ALL URLs, but those with a question mark in
them are skipped.
I tried a number of tests, and it seems that when a question mark (?) is
encountered, it somehow becomes part of the RE, which causes the RE not to match
properly.
Example: We put the following HTML in the file and did a search and replace on
each URL.
http://www.a.com/a.htm --> to be replaced with www.aaa.com/aaa.htm
http://www.b.com/? --> to be replaced with www.bbb.com/b
http://www.c.com/c?abc --> to be replaced with www.ccc.com/ccc?abc
After we ran the program, we got the following result:
www.a.com/a.htm changed to www.aaa.com/aaa.htm
www.b.com/? changed to www.bbb.com/b? - NOTE THE TRAILING ?
www.c.com/c?abc was not replaced !!!
PLEASE HELP!!!
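The three results above are what one would expect if each URL were compiled as a regular expression without escaping the question mark, so that "?" acts as a quantifier making the preceding character optional. The report does not include the actual pattern or code, so this is only one possible explanation. The sketch below reproduces the behaviour with the JDK's java.util.regex rather than ORO (a swapped-in library, chosen so the example runs without extra dependencies). The class and method names are hypothetical and are not taken from the report.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; java.util.regex stands in for ORO here.
class QueryStringEscapeDemo {

    // Compiles the URL as-is, so '?' is treated as a quantifier
    // (and '.' as "any character").
    static String replaceUnescaped(String input, String url, String replacement) {
        return Pattern.compile(url)
                .matcher(input)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    // Quotes the URL first, so every character is matched literally.
    static String replaceLiteral(String input, String url, String replacement) {
        return Pattern.compile(Pattern.quote(url))
                .matcher(input)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String[][] cases = {
            {"http://www.a.com/a.htm", "www.aaa.com/aaa.htm"},
            {"http://www.b.com/?", "www.bbb.com/b"},
            {"http://www.c.com/c?abc", "www.ccc.com/ccc?abc"},
        };
        for (String[] c : cases) {
            // Each URL is used both as the text to search and as the pattern.
            System.out.println("unescaped: " + replaceUnescaped(c[0], c[0], c[1]));
            System.out.println("literal:   " + replaceLiteral(c[0], c[0], c[1]));
        }
    }
}
```

With the unescaped pattern, "/?" matches an optional slash and leaves the literal "?" behind, and "c?abc" can only match "cabc" or "abc", never "c?abc". Quoting the URL before compiling avoids both problems in this sketch. Whether the same applies to the reporter's ORO pattern depends on how that pattern was built.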