Posted to user@nutch.apache.org by Fred Tyre <fr...@hlipublishing.com> on 2006/08/08 14:57:28 UTC

How do I write a nutch query.

How do I do a search in Nutch?
If I go to google.com, I just type in the keyword(s) I am looking for.
Is this not the case with Nutch, or do I have to change the default
configuration to enable that?

Example test case: I enter "forum" on the Nutch website and click "Search",
or I run the following command line:
bin/nutch org.apache.nutch.searcher.NutchBean forum

In both cases it returns 0 results.

However, if I go into Luke and run a content search on "forum", I get 2
results.

I looked in your FAQ for this topic and could not find the question or
answer.

I would think that the above question would be asked more frequently than
"Common words are saturating my search results." or "How can I influence
Nutch scoring?".

Please, help.

I have asked this kind of question before and not gotten a response.

Sincerely,
Fred

><><><><><><><><><><><><><><><><><><
   Fred Tyre
   Information Services
   Heartland Communications, Inc.
   515-574-2147
   Fred.Tyre@hlipublishing.com
><><><><><><><><><><><><><><><><><><
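Later in the thread, Tomi suggests the analyzer may be responsible. The toy sketch below is plain Java, not Nutch's or Lucene's analyzer code; the two tokenizers and the sample page text are hypothetical. It shows how an index built with one tokenization and a query processed with another can return zero hits for a word that is plainly on the page.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalyzerMismatch {

    // Toy "index-time" tokenizer (hypothetical): splits on whitespace only,
    // keeping case and punctuation, so "Forum," is stored as-is.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Toy "query-time" tokenizer (hypothetical): lowercases and splits on
    // anything that is not a letter or digit.
    static List<String> normalizedTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // A query matches only if every query term literally exists in the indexed term set.
    static boolean matches(Set<String> indexedTerms, List<String> queryTerms) {
        return !queryTerms.isEmpty() && indexedTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        String page = "Welcome to our Forum, where readers talk.";
        List<String> query = normalizedTokens("forum");

        // Index and query processed differently: "Forum," never equals "forum".
        Set<String> mismatched = new HashSet<>(whitespaceTokens(page));
        // Index and query processed the same way: "forum" is found.
        Set<String> consistent = new HashSet<>(normalizedTokens(page));

        System.out.println("mismatched analyzers: " + matches(mismatched, query));
        System.out.println("same analyzer:        " + matches(consistent, query));
    }
}
```

Running it prints `false` for the mismatched pair and `true` when both sides use the same tokenizer.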




RE: How do I write a nutch query.

Posted by Fred Tyre <fr...@hlipublishing.com>.
Unless there is a minimum grade specified in Nutch.

I don't know whether this is the case, so I was going to test it with
more content (i.e., higher grades, hopefully).

-----Original Message-----
From: Tomi NA [mailto:hefest@gmail.com]
Sent: Tuesday, August 08, 2006 9:58 AM
To: nutch-user@lucene.apache.org
Subject: Re: How do I write a nutch query.


Re: How do I write a nutch query.

Posted by Tomi NA <he...@gmail.com>.
On 8/8/06, Björn Wilmsmann <bj...@wilmsmann.de> wrote:
> Hey,
>
> I have run into the same problem, too. Sometimes nutch won't return
> results for queries although there clearly are pages containing the
> search term. I agree that this must have something to do with Nutch
> scoring however I have not yet found out how to change this behaviour

I ran into the same problem, but I believe it has something to do with
the analyzer (probably StandardAnalyzer; I don't know yet what
Nutch uses by default), the plugins for those files, or something
along those lines.
As far as grading is concerned, wouldn't a grading problem change the
result order rather than skip certain results altogether?

t.n.a.
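Tomi's question about grading can be illustrated with a toy model. In the sketch below (hypothetical names, not Nutch's searcher API), sorting hits by score only reorders them; a hit disappears only if something applies a minimum-score cutoff, which is the kind of "minimum grade" Fred asks about. The thread does not establish whether Nutch 0.8 applies any such cutoff.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RankVsFilter {

    // A hypothetical search hit: a document URL and its relevance score.
    static final class Hit {
        final String url;
        final float score;

        Hit(String url, float score) {
            this.url = url;
            this.score = score;
        }

        @Override
        public String toString() {
            return url + "(" + score + ")";
        }
    }

    // Ranking: reorder hits by descending score; nothing is dropped.
    static List<Hit> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return sorted;
    }

    // Filtering: drop hits scoring below a hypothetical minimum "grade".
    static List<Hit> filter(List<Hit> hits, float minScore) {
        List<Hit> kept = new ArrayList<>();
        for (Hit h : hits) {
            if (h.score >= minScore) kept.add(h);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("http://example.com/a", 0.12f));
        hits.add(new Hit("http://example.com/forum", 0.87f));

        System.out.println("ranked:   " + rank(hits));               // both hits, forum first
        System.out.println("filtered: " + filter(rank(hits), 1.0f)); // empty: both are below the cutoff
    }
}
```

Only the `filter` step can turn "2 matching pages" into "0 results"; `rank` alone never changes the count.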

RE: How do I write a nutch query.

Posted by Fred Tyre <fr...@hlipublishing.com>.
I have Nutch build the searcher.dir in another location and then copy it to
the Tomcat installation
(C:\cygwin\home\fred\nutch-0.8\WebPages --> C:\Program Files\Apache Software
Foundation\Tomcat 5.5\webapps\crawl).

However, I know that this isn't the problem.
If I run "bin/nutch org.apache.nutch.searcher.NutchBean forum" inside the
original directory, I still get 0 results.

I believe that Björn Wilmsmann was correct in saying that it is a grading
issue.

I am going to add a bunch of other websites to the crawl database and
try again (I originally had only 2 websites crawled).

Thanks for the responses.
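Since the index is built in one directory and copied into Tomcat, a quick sanity check is to confirm that both copies contain the same pieces. A minimal JDK-only sketch; the expected layout (a `segments` directory plus an `index` or `indexes` directory, as a Nutch 0.8 crawl typically produces) is an assumption, and the class name is hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class CrawlDirCheck {

    // Assumed to be required by the searcher (not confirmed in this thread).
    static final String[] REQUIRED = {"segments"};
    // At least one of these is assumed to hold the Lucene index.
    static final String[] INDEX_DIRS = {"index", "indexes"};

    // Returns human-readable problems; an empty list means the layout looks plausible.
    static List<String> check(File crawlDir) {
        List<String> problems = new ArrayList<>();
        if (!crawlDir.isDirectory()) {
            problems.add("not a directory: " + crawlDir);
            return problems;
        }
        for (String name : REQUIRED) {
            if (!new File(crawlDir, name).isDirectory()) {
                problems.add("missing " + name);
            }
        }
        boolean hasIndex = false;
        for (String name : INDEX_DIRS) {
            if (new File(crawlDir, name).isDirectory()) {
                hasIndex = true;
            }
        }
        if (!hasIndex) {
            problems.add("missing index or indexes");
        }
        return problems;
    }

    public static void main(String[] args) {
        // Pass the build directory and the copy under Tomcat to compare them.
        for (String path : args) {
            List<String> problems = check(new File(path));
            System.out.println(path + ": " + (problems.isEmpty() ? "looks OK" : problems));
        }
    }
}
```

For example: `java CrawlDirCheck "C:\cygwin\home\fred\nutch-0.8\WebPages" "C:\Program Files\Apache Software Foundation\Tomcat 5.5\webapps\crawl"`.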

-----Original Message-----
From: Zaheed Haque [mailto:zaheed.haque@gmail.com]
Sent: Tuesday, August 08, 2006 9:11 AM
To: nutch-user@lucene.apache.org
Subject: Re: How do I write a nutch query.


RE: How do I write a nutch query.

Posted by Fred Tyre <fr...@hlipublishing.com>.
So does that mean that I have to have the whole nutch-0.8 folder under my
Tomcat/webapps folder?

-----Original Message-----
From: Zaheed Haque [mailto:zaheed.haque@gmail.com]
Sent: Tuesday, August 08, 2006 9:11 AM
To: nutch-user@lucene.apache.org
Subject: Re: How do I write a nutch query.


Re: How do I write a nutch query.

Posted by Zaheed Haque <za...@gmail.com>.
Just curious: what does your searcher.dir property look like in
conf/nutch-site.xml (or nutch-default.xml)? And in
tomcat/ROOT/WEB-INF/classes/nutch-site.xml (or nutch-default.xml), for
that matter? They should all point to the same directory.

-Zaheed
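Zaheed's check can be scripted. A minimal sketch using only the JDK's XML parser; it assumes the usual Nutch 0.8 configuration format (`<property>` elements with `<name>` and `<value>` children) and that a value in nutch-site.xml overrides the one in nutch-default.xml. The class name and that precedence rule are assumptions for illustration.

```java
import java.io.File;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SearcherDirCheck {

    // Returns the <value> of the <property> whose <name> equals key, or null if absent.
    static String readProperty(File xml, String key) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            NodeList names = p.getElementsByTagName("name");
            NodeList values = p.getElementsByTagName("value");
            if (names.getLength() > 0 && values.getLength() > 0
                    && key.equals(names.item(0).getTextContent().trim())) {
                return values.item(0).getTextContent().trim();
            }
        }
        return null;
    }

    // Assumed precedence: nutch-site.xml wins over nutch-default.xml in the same directory.
    static String effectiveSearcherDir(File confDir) throws Exception {
        File site = new File(confDir, "nutch-site.xml");
        File dflt = new File(confDir, "nutch-default.xml");
        String value = site.isFile() ? readProperty(site, "searcher.dir") : null;
        if (value == null && dflt.isFile()) {
            value = readProperty(dflt, "searcher.dir");
        }
        return value;
    }

    public static void main(String[] args) throws Exception {
        // Each argument is a directory holding Nutch config files.
        String reference = null;
        for (int i = 0; i < args.length; i++) {
            String value = effectiveSearcherDir(new File(args[i]));
            System.out.println(args[i] + " -> searcher.dir = " + value);
            if (i == 0) {
                reference = value;
            } else if (!Objects.equals(reference, value)) {
                System.out.println("  MISMATCH with " + args[0]);
            }
        }
    }
}
```

For example: `java SearcherDirCheck nutch-0.8/conf tomcat/ROOT/WEB-INF/classes`. If the value is a relative path, it is likely resolved against each process's working directory, so the same string may still point to different places for bin/nutch and for Tomcat.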


Re: How do I write a nutch query.

Posted by Björn Wilmsmann <bj...@wilmsmann.de>.
Hey,

I have run into the same problem, too. Sometimes Nutch won't return
results for queries even though there clearly are pages containing the
search term. I agree that this must have something to do with Nutch
scoring; however, I have not yet found out how to change this behaviour.

--
Best regards,
Björn Wilmsmann