Posted to user@nutch.apache.org by al...@aim.com on 2013/05/07 21:41:12 UTC
normalize gives malformed url exception
Hello,
I am using Nutch 1.6 with the following code:
try {
    url = new URL(base, url);
    imgUrl = url.toString();
    // Normalize and replace spaces with %20
    url = url.replaceAll("\\s", "%20");
    url = normalizers.normalize(url, URLNormalizers.SCOPE_FETCHER);
}
catch (MalformedURLException mue) {
    LOG.info("MalformedURL: " + url);
}
It catches a MalformedURLException for the following URLs:
http://mysite.com/img/banners/ads-2_50.jpg
http://mysite.com/img/writers/DWC_7996.JPG
Any ideas what might be wrong?
Thanks.
Alex.
Re: normalize gives malformed url exception
Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Alex,
normalizers.normalize takes the parameters (String urlString, String scope).
In the code you provided, is the url object of type String? Should imgUrl
not be passed as the first argument instead?
Also, can you give more context on why you are using the
URLNormalizers.SCOPE_FETCHER scope?
Thanks
Lewis
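A minimal sketch of the point above, keeping the resolved java.net.URL and the String passed to the normalizer in separate variables. The Normalizer interface, the ImageUrlResolver class, and the SCOPE constant are hypothetical stand-ins so that the example compiles without Nutch on the classpath; in real code the call would go to normalizers.normalize(imgUrl, URLNormalizers.SCOPE_FETCHER), or whichever scope fits the use case. It is not a confirmed diagnosis of the exception in the original post.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical stand-in with the same shape as
// normalizers.normalize(String urlString, String scope).
interface Normalizer {
    String normalize(String urlString, String scope) throws MalformedURLException;
}

class ImageUrlResolver {
    // Placeholder scope name for this sketch only.
    static final String SCOPE = "fetcher";

    // Resolves link against baseUrl, escapes whitespace, then normalizes.
    // Returns null if the URL is reported as malformed at any step.
    static String resolve(String baseUrl, String link, Normalizer normalizers) {
        try {
            URL base = new URL(baseUrl);
            URL resolved = new URL(base, link);   // URL object, kept separate
            String imgUrl = resolved.toString();  // String form for the normalizer
            imgUrl = imgUrl.replaceAll("\\s", "%20");
            return normalizers.normalize(imgUrl, SCOPE);
        } catch (MalformedURLException mue) {
            // Log the message as well, since it usually names the actual cause.
            System.err.println("MalformedURL: " + link + " (" + mue.getMessage() + ")");
            return null;
        }
    }

    public static void main(String[] args) {
        Normalizer identity = (u, scope) -> u;  // does nothing; for illustration
        System.out.println(resolve("http://mysite.com/index.html",
                "/img/banners/ads-2_50.jpg", identity));
    }
}
```

Logging mue.getMessage() along with the URL is a small design choice that may help narrow down whether the exception comes from new URL(...) or from the normalizer chain.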