Posted to agent@nutch.apache.org by Rick Flosi <rf...@imagescape.com> on 2006/11/17 17:09:31 UTC
Nutch Mishandling space character in URL
Nutch spidered one of our sites last night. When it encountered a URL
that contained a space character, it ignored everything after the
space, and the truncated URL it then requested caused our application
to fail.
Example URL that should have been requested:
http://www.apache.org/cgi-bin/view?status=A%20&id=1
What Nutch then tried to access:
http://www.apache.org/cgi-bin/view?status=A
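To illustrate the difference, here is a minimal Java sketch. It assumes the link on the page contains a literal space in the query string. The class and helper names are hypothetical and this is not taken from Nutch's source; it only reproduces the symptom and shows one way the standard java.net.URI class percent-encodes the space instead.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UrlSpaceDemo {

    // Hypothetical helper: reproduces the reported symptom by cutting the
    // URL at the first literal space. This is an assumption about the
    // behaviour, not Nutch's actual code.
    public static String truncateAtSpace(String rawUrl) {
        int space = rawUrl.indexOf(' ');
        return space < 0 ? rawUrl : rawUrl.substring(0, space);
    }

    // Hypothetical helper: builds the URL from its components so that
    // java.net.URI percent-encodes characters that are illegal in a URI,
    // such as the space in the query string.
    public static String encodeQuery(String scheme, String host,
                                     String path, String query) {
        try {
            return new URI(scheme, host, path, query, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "http://www.apache.org/cgi-bin/view?status=A &id=1";

        // Prints http://www.apache.org/cgi-bin/view?status=A
        System.out.println(truncateAtSpace(raw));

        // Prints http://www.apache.org/cgi-bin/view?status=A%20&id=1
        System.out.println(
            encodeQuery("http", "www.apache.org", "/cgi-bin/view", "status=A &id=1"));
    }
}
```

The first call yields the URL we saw in our logs; the second yields the URL we expected to be requested.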
Please investigate.
Thanks,
Rick Flosi