Posted to user@nutch.apache.org by al...@aim.com on 2013/05/07 21:41:12 UTC

normalize gives malformed url exception

Hello,

I use Nutch 1.6 and the following code:

try {
    url = new URL(base, url);
    imgUrl = url.toString();
    // Normalize and replace spaces with %20
    url = url.replaceAll("\\s", "%20");
    url = normalizers.normalize(url, URLNormalizers.SCOPE_FETCHER);
} catch (MalformedURLException mue) {
    LOG.info("MalformedURL: " + url);
}

This catches a MalformedURLException for the following URLs:

http://mysite.com/img/banners/ads-2_50.jpg
http://mysite.com/img/writers/DWC_7996.JPG

Any ideas what might be wrong?

Thanks.
Alex.

Re: normalize gives malformed url exception

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Alex,
normalizers.normalize takes the parameters (String urlString, String scope).
In the code you provided, is the url object of type String? Should imgUrl
not be passed as the first argument instead?
Also, can you give more context on why you are using the
URLNormalizers.SCOPE_FETCHER scope?
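
To make the question about types concrete, here is a minimal sketch that
keeps the resolved URL object and the String form in separate variables and
passes only the String on to the normalizer. It is written to run without
Nutch, so the Normalizer interface and PASS_THROUGH are hypothetical
stand-ins for normalizers.normalize(String urlString, String scope), and the
class and method names are made up for illustration. It may not match what
your plugin actually needs.

```java
import java.net.MalformedURLException;
import java.net.URL;

class ImageUrlSketch {

    // Hypothetical stand-in for URLNormalizers.normalize(String, String).
    interface Normalizer {
        String normalize(String urlString, String scope) throws MalformedURLException;
    }

    // Stand-in normalizer: it only checks that the string parses as a URL.
    static final Normalizer PASS_THROUGH = (urlString, scope) -> new URL(urlString).toString();

    // Resolves link against base, escapes whitespace, then normalizes.
    // Returns null if any step reports a malformed URL.
    static String resolveImageUrl(String base, String link, Normalizer normalizer) {
        try {
            URL baseUrl = new URL(base);
            URL resolved = new URL(baseUrl, link);   // URL object
            String imgUrl = resolved.toString();     // String form, kept separate
            imgUrl = imgUrl.replaceAll("\\s", "%20");
            // "fetcher" is assumed here to correspond to URLNormalizers.SCOPE_FETCHER.
            return normalizer.normalize(imgUrl, "fetcher");
        } catch (MalformedURLException mue) {
            System.err.println("MalformedURL: base=" + base + " link=" + link);
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveImageUrl(
                "http://mysite.com/", "/img/banners/ads-2_50.jpg", PASS_THROUGH));
        System.out.println(resolveImageUrl(
                "http://mysite.com/", "/img/writers/DWC_7996.JPG", PASS_THROUGH));
    }
}
```

If the two URLs from your mail still fail with the real normalizers when
called this way, the cause is more likely in the normalizer configuration for
that scope than in the URL strings themselves.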
Thanks
Lewis

