Posted to user@nutch.apache.org by Patrick Simon <Pa...@virginblue.com.au> on 2007/02/07 08:38:32 UTC

n00b question follow up

Hi All,

This is a follow-up to an older post of mine, with more detail from the
logs, which will hopefully make it painfully obvious to someone out there
why it's not working.

It appears that I have successfully created a Nutch index with the
command "./nutch crawl ../urls -dir ../crawl.test -depth 5", run from the
nutch/bin directory.

I say it is successful because when I use Luke (a Lucene GUI tool that
interrogates Lucene indexes) to view the index, it opens as a valid index
and searches return results.

The directory I point Luke to is
/home/simonp/nutch-0.8/crawl.test/indexes/part-00000 (the value I give
for searcher.dir in nutch-default.xml is
"/home/simonp/nutch-0.8/crawl.test").

The problem is that I get no results, either from the command "bin/nutch
org.apache.nutch.searcher.NutchBean apache" or when I search for the
string "apache" in the Nutch servlet.
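Since both the command line and the servlet find nothing, one cheap first check is whether the searcher.dir directory holds what NutchBean opens; the catalina.out lines further down show it opening indexes, segments and linkdb there. A minimal sketch in plain Java (standard library only). SearcherDirCheck and missingSubdirs are hypothetical names, not Nutch classes, and the sketch only checks that the directories exist, not that the index inside them is usable.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Nutch: reports which of the subdirectories
// NutchBean opens (per the catalina.out lines) are missing from a search dir.
public class SearcherDirCheck {

    // Names taken from the "opening ... in" log lines.
    static final List<String> EXPECTED = Arrays.asList("indexes", "segments", "linkdb");

    // Returns the expected names that are not directories under searcherDir.
    static List<String> missingSubdirs(Path searcherDir) {
        List<String> missing = new ArrayList<>();
        for (String name : EXPECTED) {
            if (!Files.isDirectory(searcherDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Defaults to the searcher.dir value quoted in the post; pass another path to override.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/home/simonp/nutch-0.8/crawl.test");
        List<String> missing = missingSubdirs(dir);
        if (missing.isEmpty()) {
            System.out.println(dir + ": indexes, segments and linkdb all present");
        } else {
            System.out.println(dir + ": missing " + missing);
        }
    }
}
```

Running it once with the searcher.dir value and once with the directory the webapp actually reads from makes any difference between the two easy to spot.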

I don't run any separate fetching or indexing commands, as the tutorial
says they are not needed for simple intranet searching.

I am using Tomcat 5.5 and Nutch 0.8.

Can anybody help with this one, please?

The output from catalina.out is:

2007-02-06 09:01:27,990 INFO  NutchBean - opening indexes in /home/simonp/nutch-8.0/crawl.test/indexes
2007-02-06 09:01:28,032 INFO  Configuration - found resource common-terms.utf8 at file:/usr/local/tomcat/webapps/nutch-0.8/WEB-INF/classes/common-terms.utf8
2007-02-06 09:01:28,037 INFO  NutchBean - opening segments in /home/simonp/nutch-8.0/crawl.test/segments
2007-02-06 09:01:28,056 INFO  SummarizerFactory - Using the first summarizer extension found: Basic Summarizer
2007-02-06 09:01:28,056 INFO  NutchBean - opening linkdb in /home/simonp/nutch-8.0/crawl.test/linkdb
2007-02-06 09:01:28,062 INFO  NutchBean - query request from 192.168.5.173
2007-02-06 09:01:28,072 INFO  NutchBean - query: ubuntu
2007-02-06 09:01:28,072 INFO  NutchBean - lang: en
2007-02-06 09:01:28,101 INFO  NutchBean - searching for 20 raw hits
2007-02-06 09:01:28,142 INFO  NutchBean - total hits: 0
2007-02-06 09:01:30,506 INFO  NutchBean - query request from 192.168.5.173
2007-02-06 09:01:30,506 INFO  NutchBean - query: apache
2007-02-06 09:01:30,506 INFO  NutchBean - lang: en
2007-02-06 09:01:30,507 INFO  NutchBean - searching for 20 raw hits
2007-02-06 09:01:30,507 INFO  NutchBean - total hits: 0
2007-02-06 09:01:51,191 INFO  NutchBean - query request from 192.168.5.173
2007-02-06 09:01:51,191 INFO  NutchBean - query: test
2007-02-06 09:01:51,191 INFO  NutchBean - lang: en
2007-02-06 09:01:51,193 INFO  NutchBean - searching for 20 raw hits
2007-02-06 09:01:51,193 INFO  NutchBean - total hits: 0
2007-02-06 10:22:51,068 INFO  NutchBean - query request from 192.168.5.173
2007-02-06 10:22:51,070 INFO  NutchBean - query: test
2007-02-06 10:22:51,070 INFO  NutchBean - lang: en
2007-02-06 10:22:51,073 INFO  NutchBean - searching for 20 raw hits
2007-02-06 10:22:51,076 INFO  NutchBean - total hits: 0
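The "opening ... in" lines are the quickest way to see which directories NutchBean actually used, so they can be compared with the configured searcher.dir. A minimal sketch, assuming each log entry occupies a single line and follows the "NutchBean - opening <what> in <path>" wording seen in this log. OpenedDirs is a hypothetical class, not part of Nutch, and main() uses lines copied from this log plus the searcher.dir value quoted earlier.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: collects the directories that
// NutchBean reports opening, keyed by what it opened (indexes, segments, linkdb).
public class OpenedDirs {

    // Assumes the message wording seen in this log, one entry per line.
    private static final Pattern OPENING =
        Pattern.compile("NutchBean - opening (\\w+) in (\\S+)");

    static Map<String, String> openedDirs(List<String> logLines) {
        Map<String, String> dirs = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = OPENING.matcher(line);
            if (m.find()) {
                dirs.put(m.group(1), m.group(2));
            }
        }
        return dirs;
    }

    public static void main(String[] args) {
        // Sample lines copied from the catalina.out output in this post.
        List<String> lines = Arrays.asList(
            "2007-02-06 09:01:27,990 INFO  NutchBean - opening indexes in /home/simonp/nutch-8.0/crawl.test/indexes",
            "2007-02-06 09:01:28,037 INFO  NutchBean - opening segments in /home/simonp/nutch-8.0/crawl.test/segments",
            "2007-02-06 09:01:28,056 INFO  NutchBean - opening linkdb in /home/simonp/nutch-8.0/crawl.test/linkdb",
            "2007-02-06 09:01:28,072 INFO  NutchBean - query: ubuntu");
        // The searcher.dir value quoted in the post.
        String searcherDir = "/home/simonp/nutch-0.8/crawl.test";
        for (Map.Entry<String, String> e : openedDirs(lines).entrySet()) {
            boolean under = e.getValue().startsWith(searcherDir + "/");
            System.out.println(e.getKey() + " -> " + e.getValue()
                + (under ? "" : "  (not under " + searcherDir + ")"));
        }
    }
}
```

In practice the list of lines would come from catalina.out itself, for example via java.nio.file.Files.readAllLines.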
OAG Best Low Cost Airline Of The Year 

The content of this e-mail, including any attachments, is a confidential communication between Virgin Blue, Pacific Blue or a related entity (or the sender if this email is a private communication) and the intended addressee and is for the sole use of that intended addressee. If you are not the intended addressee, any use, interference with, disclosure or copying of this material is unauthorized and prohibited. If you have received this e-mail in error please contact the sender immediately and then delete the message and any attachment(s). There is no warranty that this email is error, virus or defect free. This email is also subject to copyright. No part of it should be reproduced, adapted or communicated without the written consent of the copyright owner. If this is a private communication it does not represent the views of Virgin Blue, Pacific Blue or their related entities. Please be aware that the contents of any emails sent to or from Virgin Blue, Pacific Blue or their related entities may be periodically monitored and reviewed. Virgin Blue, Pacific Blue and their related entities respect your privacy. Our privacy policy can be accessed from our website: www.virginblue.com.au


Re: n00b question follow up

Posted by Alvaro Cabrerizo <to...@gmail.com>.
Hi:

First, check that the query plugins (query-basic, query-more, etc.) are
listed in the plugin.includes property of your nutch-site.xml. If
everything is OK, you can add a LOG line to the "search" method of the
class org.apache.nutch.searcher.IndexSearcher to see how the Lucene query
is built. If I'm not mistaken, the line to add (at around line 99) is
LOG.info("query -> " + luceneQuery.toString()); so the method should
look like this:

public Hits search(Query query ...)
  ...
  try {
    org.apache.lucene.search.BooleanQuery luceneQuery =
        this.queryFilters.filter(query);
    LOG.info("query -> " + luceneQuery.toString());
    return ...

Recompile and run a new query.
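For the first check, here is a rough sketch of pulling plugin.includes out of nutch-site.xml and testing plugin ids against it. It assumes the usual <configuration>/<property>/<name>/<value> layout and that the value is a regular expression which must match a whole plugin id (nutch-default.xml describes plugin.includes as a regular expression). PluginIncludesCheck is a hypothetical name, and the property value in main() is an example only, not a recommended setting.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Nutch: reads a property from a
// configuration file and tests plugin ids against plugin.includes.
public class PluginIncludesCheck {

    // Returns the <value> of the named <property>, or null if it is absent.
    static String property(InputStream xml, String name) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            NodeList names = p.getElementsByTagName("name");
            NodeList values = p.getElementsByTagName("value");
            if (names.getLength() > 0 && values.getLength() > 0
                    && name.equals(names.item(0).getTextContent().trim())) {
                return values.item(0).getTextContent().trim();
            }
        }
        return null;
    }

    // Assumption: the whole plugin id must match the plugin.includes regex.
    static boolean included(String pluginIncludes, String pluginId) {
        return pluginIncludes != null && Pattern.matches(pluginIncludes, pluginId);
    }

    public static void main(String[] args) throws Exception {
        // Example nutch-site.xml content; the regex value is illustrative only.
        String xml = "<configuration><property><name>plugin.includes</name>"
            + "<value>protocol-http|parse-(text|html)|index-basic|query-(basic|site|url)</value>"
            + "</property></configuration>";
        String includes = property(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), "plugin.includes");
        for (String id : new String[] {"query-basic", "query-more", "query-url"}) {
            System.out.println(id + ": " + (included(includes, id) ? "included" : "NOT included"));
        }
    }
}
```

Pointing property() at a FileInputStream on the real nutch-site.xml would show whether the query plugins are matched by the configured expression.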

Hope it helps.