Posted to dev@nutch.apache.org by feng lu <am...@gmail.com> on 2013/01/22 06:59:15 UTC

CrawlDbFilter urlNormalizers NULL pointer

Hi all

In the map method of the CrawlDbFilter class, if url == null and
urlNormalizers is true, the normalize call may throw a NullPointerException.

if (urlNormalizers) {
  try {
    url = normalizers.normalize(url, scope); // normalize the url
  } catch (Exception e) {
    LOG.warn("Skipping " + url + ":" + e);
    url = null;
  }
}
if (url != null && urlFiltering) {
  try {
    url = filters.filter(url); // filter the url
  } catch (Exception e) {
    LOG.warn("Skipping " + url + ":" + e);
    url = null;
  }
}

Maybe we can check url for null before calling the normalizers, the same way
the filtering branch already does:

if (url != null && urlNormalizers) {
  ....
}
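
To make the idea concrete, here is a minimal self-contained sketch of the guarded flow. It is not the actual Nutch code: the class NullGuardSketch, the method process, and the two UnaryOperator stand-ins for the normalizers and filters are hypothetical, and System.err replaces LOG so the snippet runs on its own.

```java
import java.util.function.UnaryOperator;

public class NullGuardSketch {

    // Hypothetical stand-in for the normalize-then-filter steps of the map method.
    // Both branches are skipped when url is null, so neither stand-in sees a null.
    static String process(String url,
                          boolean urlNormalizers, UnaryOperator<String> normalizer,
                          boolean urlFiltering, UnaryOperator<String> filter) {
        if (url != null && urlNormalizers) {
            try {
                url = normalizer.apply(url); // normalize the url
            } catch (Exception e) {
                System.err.println("Skipping " + url + ":" + e);
                url = null;
            }
        }
        if (url != null && urlFiltering) {
            try {
                url = filter.apply(url); // filter the url
            } catch (Exception e) {
                System.err.println("Skipping " + url + ":" + e);
                url = null;
            }
        }
        return url;
    }

    public static void main(String[] args) {
        // Assumed toy behaviour: lower-case the URL, then keep only http:// URLs.
        UnaryOperator<String> lower = s -> s.toLowerCase();
        UnaryOperator<String> httpOnly = s -> s.startsWith("http://") ? s : null;

        System.out.println(process(null, true, lower, true, httpOnly));
        System.out.println(process("HTTP://Example.COM/", true, lower, true, httpOnly));
    }
}
```

With the guard in place, a null url falls straight through both branches and comes back as null without the normalizer stand-in ever being invoked.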

-- 
Don't Grow Old, Grow Up... :-)

Re: CrawlDbFilter urlNormalizers NULL pointer

Posted by feng lu <am...@gmail.com>.
OK, I will open an issue and add a trivial test case later.

Thanks, Lewis


On Tue, Jan 22, 2013 at 2:09 PM, Lewis John Mcgibbney <
lewis.mcgibbney@gmail.com> wrote:

> This looks like a good catch.
> Please open a ticket for it if you can. A trivial test case would also be
> great if you are able.
> Lewis



Re: CrawlDbFilter urlNormalizers NULL pointer

Posted by Lewis John Mcgibbney <le...@gmail.com>.
This looks like a good catch.
Please open a ticket for it if you can. A trivial test case would also be
great if you are able.
Lewis


-- 
*Lewis*