Posted to user@nutch.apache.org by Wolfgang Woerndl <wo...@informatik.tu-muenchen.de> on 2007/10/04 15:42:12 UTC

NullPointerException when trying to init NutchBean

Hello,

I installed Nutch 0.8.1, crawled some web pages, and get (meaningful) results
when calling
    bin/nutch org.apache.nutch.searcher.NutchBean test
from the Nutch directory.

Now, I'm trying to integrate searching in a larger Servlet application:

public String fulltextSearch (String querystring, int maxhits)
{
    Configuration conf = NutchConfiguration.create();
    NutchBean bean = new NutchBean(conf, new Path("/path/to/nutch/"));
    Query query = Query.parse(querystring, conf);
    ...

It compiles but I get a NullPointerException at runtime:

----------
      [exec] java.lang.NullPointerException
      [exec]     at java.io.Reader.<init>(Reader.java:61)
      [exec]     at java.io.BufferedReader.<init>(BufferedReader.java:76)
      [exec]     at java.io.BufferedReader.<init>(BufferedReader.java:91)
      [exec]     at org.apache.nutch.analysis.CommonGrams.init(CommonGrams.java:151)
      [exec]     at org.apache.nutch.analysis.CommonGrams.<init>(CommonGrams.java:51)
      [exec]     at org.apache.nutch.searcher.FieldQueryFilter.setConf(FieldQueryFilter.java:107)
      [exec]     at org.apache.nutch.searcher.url.URLQueryFilter.setConf(URLQueryFilter.java:33)
      [exec]     at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:153)
      [exec]     at org.apache.nutch.searcher.QueryFilters.<init>(QueryFilters.java:75)
      [exec]     at org.apache.nutch.searcher.IndexSearcher.init(IndexSearcher.java:78)
      [exec]     at org.apache.nutch.searcher.IndexSearcher.<init>(IndexSearcher.java:62)
      [exec]     at org.apache.nutch.searcher.NutchBean.init(NutchBean.java:139)
      [exec]     at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:105)
----------

conf.toString() gives:
"Configuration: defaults: hadoop-default.xml , nutch-default.xmlfinal: 
hadoop-site.xml , nutch-site.xml"

I assume that Nutch can't find its configuration files. That's why I'm trying to
tell it the correct path when calling new NutchBean(...).

Does anybody know what my problem might be, or even a possible solution? :)

Thanks in advance.

Wolfgang

Re: NullPointerException when trying to init NutchBean

Posted by Dennis Kubes <ku...@apache.org>.
My guess, judging from the stack trace you posted, is that you didn't copy
common-terms.utf8 or other needed files from the Nutch conf directory
into the classpath of your web application.
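A quick way to test that guess is to ask the class loader directly whether the files are visible. The following is only a minimal sketch using the plain JDK, not Nutch code: the class name ClasspathCheck and its helper methods are made up for illustration, and the file names are the ones mentioned in this thread. It also reproduces the top of the stack trace, on the assumption that the resource lookup came back empty and the resulting null was handed to a BufferedReader.

```java
import java.io.BufferedReader;
import java.io.Reader;

public class ClasspathCheck {

    /** Returns true if the named resource is visible to the current class loader. */
    static boolean isOnClasspath(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Fall back to the system class loader if no context loader is set.
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(resourceName) != null;
    }

    /** Shows that wrapping a null Reader in a BufferedReader throws a NullPointerException. */
    static boolean wrappingNullReaderThrowsNpe() {
        Reader missing = null; // stands in for a resource lookup that found nothing
        try {
            new BufferedReader(missing);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        // File names taken from this thread; extend the list as needed.
        String[] required = {"nutch-default.xml", "nutch-site.xml", "common-terms.utf8"};
        for (String name : required) {
            System.out.println(name + " on classpath: " + isOnClasspath(name));
        }
        System.out.println("null Reader causes NPE: " + wrappingNullReaderThrowsNpe());
    }
}
```

If this is run inside the web application (for example from a servlet's init method) and any of the files is reported as missing, adding the Nutch conf directory to the web application's classpath is the likely fix.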

Dennis Kubes


Re: NullPointerException when trying to init NutchBean

Posted by Wolfgang Woerndl <wo...@informatik.tu-muenchen.de>.
Thanks, now it works. Some feedback for everybody:
- Including the Nutch conf directory in the classpath solved the NPE.
- I really do need to pass the path to the index dir to the NutchBean constructor;
otherwise I get 0 hits (despite having a searcher.dir property with that path in
nutch-site.xml).
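Because a wrong path shows up only as 0 hits rather than as an error, it may help to check the directory before handing it to the NutchBean constructor. The sketch below uses only the plain JDK; the class name IndexDirCheck and the method describeProblem are hypothetical helpers, not part of Nutch, and the check makes no assumption about the layout Nutch expects inside the directory.

```java
import java.io.File;

public class IndexDirCheck {

    /**
     * Returns a description of what is wrong with the directory,
     * or null if it exists, is a readable directory, and is not empty.
     */
    static String describeProblem(File dir) {
        if (!dir.exists()) {
            return "does not exist: " + dir;
        }
        if (!dir.isDirectory()) {
            return "is not a directory: " + dir;
        }
        if (!dir.canRead()) {
            return "is not readable: " + dir;
        }
        String[] entries = dir.list();
        if (entries == null || entries.length == 0) {
            return "is empty: " + dir;
        }
        return null;
    }

    public static void main(String[] args) {
        // Placeholder path from the original post; pass the real one as an argument.
        File dir = new File(args.length > 0 ? args[0] : "/path/to/nutch/");
        String problem = describeProblem(dir);
        System.out.println(problem == null ? "looks usable: " + dir : "problem: " + problem);
    }
}
```

Calling describeProblem just before new NutchBean(conf, new Path(...)) and logging a non-null result turns a silent "0 hits" into a visible message.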

Wolfgang


-- 
Dr. Wolfgang Woerndl, Institut fuer Informatik, TU Muenchen
Boltzmannstr. 3, 85748 Garching. Buero: Raum 01.05.043
Tel: +49 89 289-18686, Fax: -18657
http://www.in.tum.de/~woerndl/, Email: woerndl@in.tum.de

Re: NullPointerException when trying to init NutchBean

Posted by Sagar Naik <sa...@visvo.com>.
Hey,
I would like to mention 2 points:
- The Nutch config files should be in the classpath.
- The 2nd argument of the NutchBean constructor is the path to the index dir.

I guess this should solve the NPE.






