Posted to agent@nutch.apache.org by Rick Flosi <rf...@imagescape.com> on 2006/11/17 17:09:31 UTC

Nutch Mishandling space character in URL

Nutch spidered one of our sites last night. When it encountered a URL
that contained a space character (encoded as %20), it ignored everything
after the space, and our application failed on the truncated URL that
Nutch attempted to access.

Example URL that should have been requested:
   http://www.apache.org/cgi-bin/view?status=A%20&id=1

What Nutch then tried to access:
   http://www.apache.org/cgi-bin/view?status=A
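
To make the symptom easy to reproduce, here is a minimal Python sketch of one
way this kind of truncation could arise. It is only a guess, not a confirmed
diagnosis of Nutch: it assumes the URL is percent-decoded first and then cut at
the first space. The function name truncate_at_space is hypothetical and is not
part of Nutch.

```python
from urllib.parse import unquote

# The URL from the example above, with the space encoded as %20.
original = "http://www.apache.org/cgi-bin/view?status=A%20&id=1"


def truncate_at_space(url: str) -> str:
    """Hypothetical faulty handling (an assumption, not Nutch's actual code).

    Decodes percent-escapes, then keeps only the text before the first space.
    """
    decoded = unquote(url)            # "%20" becomes a literal space
    return decoded.split(" ", 1)[0]   # everything after the space is dropped


print(truncate_at_space(original))
```

Under that assumption the sketch yields the same shortened URL shown above,
with both the encoded space and the id parameter lost.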

Please investigate.

Thanks,
Rick Flosi