Posted to bugs@httpd.apache.org by bu...@apache.org on 2005/05/27 10:59:21 UTC

DO NOT REPLY [Bug 35100] New: - URL-parsing does not work for www.altavista.com

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=35100>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=35100

           Summary: URL-parsing does not work for www.altavista.com
           Product: Apache httpd-2.0
           Version: 2.0.54
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: mod_proxy
        AssignedTo: bugs@httpd.apache.org
        ReportedBy: bjoern@cs.tu-berlin.de


It's not possible to use the relatively popular search engine

   http://www.altavista.com/

with apache2's mod_proxy* modules.

You can easily reproduce the problem if you

  a) type a search word into the search field on
     http://www.altavista.com/

  b) click on one of the links in the results page

The main problem is that mod_proxy re-encodes the URL. After this
re-encoding, the path component it sends differs from the original encoded form.

An example: here is a link taken from http://de.altavista.com/ (I changed it
slightly, because I do not know whether the URL contains private
information):
  
http://av.rds.yahoo.com/_ylt=A9ibyDZZCEq4AklmSLaMX;_ylu=X3oDBvNjNnZmYzBHBndANhdl93ZWJfaG9tZQRzZWMDdGFicw--/SIG=11nr22kc/EXP=111216420/**http%3a//de.altavista.com/dir/default

mod_proxy forwards it as follows (captured with Ethereal):

   GET /_ylt=A9ibyDZZCEq4AklmSLaMX;_ylu=X3oDBvNjNnZmYzBHBndANhdl93ZWJfaG9tZQRzZWMDdGFicw--/SIG=11nr22kc/EXP=111216420/**http://de.altavista.com/dir/default HTTP/1.1

Do you see the difference? "http%3a//" is transformed to "http://": the
percent-encoded colon "%3a" has been decoded to a literal ":".
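The transformation can be reproduced with a minimal Python sketch, assuming a simplified model in which the proxy first decodes every %XX escape in the path and then re-encodes it while treating ":" as a safe character. The helper name `naive_reencode` and the `PATH_SAFE` character set are hypothetical and chosen for illustration; this is not mod_proxy's actual C code.

```python
from urllib.parse import quote, unquote

# Hypothetical set of characters left unescaped when re-encoding a path.
# This is an assumption for illustration, not mod_proxy's real table.
PATH_SAFE = "/:;=*!$&'()+,@"

def naive_reencode(path: str) -> str:
    """Decode all %XX escapes, then re-encode characters outside PATH_SAFE.

    Because ':' is in PATH_SAFE, an original "%3a" comes back as a bare ':'.
    """
    return quote(unquote(path), safe=PATH_SAFE)

# Path component of the example link from the report.
original = (
    "/_ylt=A9ibyDZZCEq4AklmSLaMX;_ylu=X3oDBvNjNnZmYzBHBndANhdl93ZWJfaG9tZQRzZWMDdGFicw--"
    "/SIG=11nr22kc/EXP=111216420/**http%3a//de.altavista.com/dir/default"
)
forwarded = naive_reencode(original)
print(forwarded)              # ends in "/**http://de.altavista.com/dir/default"
print(forwarded == original)  # False: the round trip is not lossless
```

The decode-then-encode round trip loses information because "%3a" and ":" decode to the same character, so the encoder cannot know which form the client originally sent.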

The offline browser wwwoffle has the same problem. I wrote a patch for wwwoffle
that preserves "%3a" in URL paths instead of rewriting it to a colon (":").

I'm not familiar with apache2's mod_proxy* code, but the same idea of
preserving "%3a" would probably also fix the problem in apache2.
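The idea of preserving "%3a" can be sketched in Python as a re-encoder that never decodes: it copies every existing, well-formed %XX escape through verbatim and only percent-encodes characters that are not already safe. The helper `preserving_reencode` and the character sets are hypothetical; this is neither the actual wwwoffle patch nor an Apache API, just one way to express the idea under those assumptions.

```python
import string

HEX = set(string.hexdigits)
# Hypothetical character sets: unreserved characters plus a few path-safe ones.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
PATH_SAFE = set("/:;=*!$&'()+,@")

def preserving_reencode(path: str) -> str:
    """Re-encode a path without decoding existing %XX escapes.

    A valid "%XX" triplet is copied as-is (so "%3a" stays "%3a");
    a stray '%' or any other unsafe character is percent-encoded as UTF-8.
    """
    out = []
    i = 0
    while i < len(path):
        ch = path[i]
        if ch == "%" and i + 2 < len(path) and path[i + 1] in HEX and path[i + 2] in HEX:
            out.append(path[i:i + 3])  # keep the client's escape untouched
            i += 3
            continue
        if ch in UNRESERVED or ch in PATH_SAFE:
            out.append(ch)
        else:
            # Encode each UTF-8 byte of the unsafe character as %XX.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
        i += 1
    return "".join(out)

original = "/SIG=11nr22kc/EXP=111216420/**http%3a//de.altavista.com/dir/default"
print(preserving_reencode(original) == original)  # True
print(preserving_reencode("/a b%zz"))             # /a%20b%25zz
```

Because existing escapes pass through unchanged, a path that was already correctly encoded by the client reaches the origin server byte-for-byte identical, which is what the link in the report relies on.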
