Posted to user@nutch.apache.org by Ismael <kr...@gmail.com> on 2007/09/05 04:05:51 UTC

Searching in field "content" doesn't return any hit

Hello,

I want to search the "content" field of a Nutch index, but I get no results.

I modified the index-basic plugin to boost the content field and to
store it, but searching still doesn't return any hits.

Here is the code I use to search:

    NutchBean bean = new NutchBean(conf, index);
    Query query = Query.parse("content:orde", conf);
    Hits hits = bean.search(query, 10000);

It doesn't return any hits; running the same search in Luke returns 4
documents with boosts 1, 0.54, 0.54, 0.43.

I am pretty sure that in my code "conf" and "index" are the right
values since searching in another field returns the same results as
Luke.

I searched through nutch-default for a property that could help, but
I found nothing. I suppose I am missing something, but I cannot figure
out what.

Thank you for reading!

Re: Re: Searching in field "content" doesn't return any hit

Posted by Ismael <kr...@gmail.com>.
First of all, thank you for answering so soon.
And second, it worked well; thank you for the help. I was thinking in
terms of a plain Lucene index, where I was used to specifying the field
in all of my queries, and I didn't expect Nutch to treat its content
field differently.

Again, thank you for answering!

2007/9/5, Doğacan Güney <do...@gmail.com>:
> A query like "content:orde" is parsed by Nutch's query plugins, which
> convert it into a Lucene BooleanQuery. By default, however, there is no
> handler for the field "content", so it is parsed to something like
> "content orde". Here is how it works:
>
> * Nutch's Query checks whether there is a handler for the field
> "content" (see query-url or query-site for examples of handlers).
> * Since there is none, the query gets parsed to "content orde".
> * After this, the query plugins work on the resulting query. Each
> plugin receives Nutch's query plus the Lucene BooleanQuery built up so
> far, and updates that BooleanQuery. Assuming only query-basic is
> active, the end result is something like "(content:orde title:orde
> url:orde)"; this is the resulting Lucene query.
>
> Hope this helps.

Re: Searching in field "content" doesn't return any hit

Posted by Doğacan Güney <do...@gmail.com>.
A query like "content:orde" is parsed by Nutch's query plugins, which
convert it into a Lucene BooleanQuery. By default, however, there is no
handler for the field "content", so it is parsed to something like
"content orde". Here is how it works:

* Nutch's Query checks whether there is a handler for the field
"content" (see query-url or query-site for examples of handlers).
* Since there is none, the query gets parsed to "content orde".
* After this, the query plugins work on the resulting query. Each
plugin receives Nutch's query plus the Lucene BooleanQuery built up so
far, and updates that BooleanQuery. Assuming only query-basic is
active, the end result is something like "(content:orde title:orde
url:orde)"; this is the resulting Lucene query.
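
The steps above can be illustrated with a small stand-alone Java model.
This is only a sketch and not Nutch's code: the class name
QueryFallbackSketch, the methods parse and expand, and the two field
lists are hypothetical, chosen to mirror the description. It does not
model phrases, required or prohibited clauses, or boosts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model of the fallback described above. NOT Nutch's implementation.
final class QueryFallbackSketch {
    // Assumption: fields that have a dedicated handler (query-url and
    // query-site are the examples named in the text).
    static final Set<String> HANDLED_FIELDS = Set.of("url", "site");

    // Assumption: fields that plain terms are expanded over, taken from
    // the example result "(content:orde title:orde url:orde)".
    static final List<String> DEFAULT_FIELDS = List.of("content", "title", "url");

    // Steps 1 and 2: keep "field:term" only if the field has a handler;
    // otherwise the field name degrades into an ordinary term.
    static List<String> parse(String input) {
        List<String> clauses = new ArrayList<>();
        for (String token : input.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            int colon = token.indexOf(':');
            if (colon > 0 && HANDLED_FIELDS.contains(token.substring(0, colon))) {
                clauses.add(token); // a handler exists, keep the field clause
            } else if (colon > 0) {
                clauses.add(token.substring(0, colon)); // field name becomes a term
                String term = token.substring(colon + 1);
                if (!term.isEmpty()) {
                    clauses.add(term);
                }
            } else {
                clauses.add(token); // already a plain term
            }
        }
        return clauses;
    }

    // Step 3: expand every plain term over the default fields.
    static String expand(List<String> clauses) {
        List<String> groups = new ArrayList<>();
        for (String clause : clauses) {
            if (clause.indexOf(':') > 0) {
                groups.add(clause); // handled field clause, left as written
            } else {
                List<String> perField = new ArrayList<>();
                for (String field : DEFAULT_FIELDS) {
                    perField.add(field + ":" + clause);
                }
                groups.add("(" + String.join(" ", perField) + ")");
            }
        }
        return String.join(" ", groups);
    }

    public static void main(String[] args) {
        System.out.println(parse("content:orde")); // prints [content, orde]
        System.out.println(expand(List.of("orde"))); // prints (content:orde title:orde url:orde)
    }
}
```

In this model parse("content:orde") yields the two plain terms "content"
and "orde", so the field restriction is lost before any expansion
happens, whereas parse("url:orde") keeps its field because a handler is
registered for it.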

Hope this helps.


-- 
Doğacan Güney