Posted to user@nutch.apache.org by Lewis John Mcgibbney <le...@gmail.com> on 2013/05/08 20:13:10 UTC

Re: normalize gives malformed url exception

Hi Alex,
normalizers.normalize takes the parameters (String urlString, String scope).
In the code you provided, is url a String or a java.net.URL? It is assigned
from new URL(base, url) but then has replaceAll called on it, so should
imgUrl not be the value passed as the first argument?
Also, can you give more context on why you are using the
URLNormalizers.SCOPE_FETCHER scope?
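
To illustrate what I mean, here is a rough sketch that keeps the resolved
URL object and the String form in separate variables and passes only the
String to the normalizer. It is not your plugin code: the class
ResolveImageUrl, the methods resolve and resolveOrNull, and the local
normalize method are hypothetical stand-ins so the example runs without
Nutch on the classpath. In Nutch you would call
normalizers.normalize(imgUrl, scope) in place of the stand-in.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class ResolveImageUrl {

    // Hypothetical stand-in for normalizers.normalize(String urlString, String scope).
    // It only re-parses the string; the real normalizers apply the configured plugins.
    static String normalize(String urlString, String scope) throws MalformedURLException {
        return new URL(urlString).toString();
    }

    // Resolve a link against its base, escape whitespace, then normalize the String form.
    static String resolve(URL base, String link, String scope) throws MalformedURLException {
        URL resolved = new URL(base, link);          // URL object, kept separate
        String imgUrl = resolved.toString();         // String form for the normalizer
        imgUrl = imgUrl.replaceAll("\\s", "%20");    // replace whitespace with %20
        return normalize(imgUrl, scope);
    }

    // Same as resolve, but logs and returns null on a malformed URL.
    static String resolveOrNull(String base, String link, String scope) {
        try {
            return resolve(new URL(base), link, scope);
        } catch (MalformedURLException mue) {
            System.err.println("MalformedURL: " + link + " (" + mue.getMessage() + ")");
            return null;
        }
    }

    public static void main(String[] args) {
        // "fetcher" is a placeholder scope string; in Nutch pass URLNormalizers.SCOPE_FETCHER.
        System.out.println(resolveOrNull("http://mysite.com/", "img/banners/ads-2_50.jpg", "fetcher"));
    }
}
```

Logging the exception message alongside the link, as resolveOrNull does,
should also show which step rejects the two URLs you listed.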
Thanks
Lewis


On Tue, May 7, 2013 at 12:41 PM, <al...@aim.com> wrote:

> Hello,
>
> I use nutch-1.6 and the following code
>
> try{
>                 url = new URL(base, url);
>                  imgUrl =url.toString();
>                  // Normalize and Replace spaces with %20
>                  url = url.replaceAll("\\s", "%20");
>                  url =
> normalizers.normalize(url,URLNormalizers.SCOPE_FETCHER);
>               }
>               catch (MalformedURLException mue){
>                LOG.info("MalformedURL: " + url);
>
>               }
>
> catches malformed exception for
> urls
>
> http://mysite.com/img/banners/ads-2_50.jpg
> http://mysite.com/img/writers/DWC_7996.JPG
>
> Any ideas what might be wrong?
>
> Thanks.
> Alex.
>



-- 
*Lewis*