Posted to user@nutch.apache.org by J Ilari Moilanen <im...@cc.helsinki.fi> on 2007/08/03 18:54:31 UTC

Field based search on metadata

My goal is to restrict search results to pages that have specific metadata
fields set to known values. To do that, I have added a few checkboxes to
the search form, next to the text input, so that the user can set
restrictions like this:

Location
[ ] UK   [X] USA

I used this example
http://wiki.apache.org/nutch/WritingPluginExample-0%2e9
to index the meta fields in question. When I inspect the index with Luke,
the fields are in place. My problem is that whenever I search for something
and at the same time restrict the query to a specific location (in this
case), I get zero results. The same happens when I use only the text input
with a query like this
wordtosearchfor +location:USA
or when I set the location field programmatically in the Query object with
addRequiredPhrase("USA","location") or with addRequiredTerm...

I have checked that my query filter gets executed on every query, and I
have tried changing the class that my filter extends to RawFieldQueryFilter.
I have even tried adding the location field to the default fields in
BasicQueryFilter (I was desperate). I have also tried changing the way
Lucene indexes the fields I save (stored or not stored, tokenized or
untokenized).
At one point I thought there was a bug in the field search feature, but
then I noticed that I do get some results with queries like this
wordtosearchfor +url:urlcontainingtheword
After that I tried to mimic everything that is done to the url field (from
parsing to indexing to querying), but to no avail: always zero results.
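
The stored / tokenized settings mentioned above affect search in different
ways, which may be worth ruling out. What follows is a toy Java model, not
Lucene or Nutch code: ToyIndex and its add/search/stored methods are
hypothetical names, and the tokenizer (split on non-alphanumerics, then
lower-case) is an assumption standing in for whatever analyzer is actually
configured. It only illustrates that a value which is stored but not indexed
can be displayed yet never matched, and that an untokenized field matches
only the exact term that was written.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy model of stored vs. indexed vs. tokenized fields. Not Lucene code. */
class FieldFlagsDemo {

    /** Hypothetical stand-in for an index that holds a single field. */
    static final class ToyIndex {
        private final Map<String, Set<Integer>> postings = new HashMap<>(); // term -> doc ids
        private final Map<Integer, String> storedValues = new HashMap<>();  // doc id -> original value

        void add(int docId, String value, boolean stored, boolean indexed, boolean tokenized) {
            if (stored) {
                storedValues.put(docId, value); // retrievable for display only
            }
            if (!indexed) {
                return; // no postings, so no term query can match this field
            }
            if (tokenized) {
                // Assumption: the analyzer splits on non-alphanumerics and lower-cases.
                for (String token : value.split("[^\\p{Alnum}]+")) {
                    if (!token.isEmpty()) {
                        post(token.toLowerCase(Locale.ROOT), docId);
                    }
                }
            } else {
                post(value, docId); // the whole value is one term, case preserved
            }
        }

        private void post(String term, int docId) {
            postings.computeIfAbsent(term, k -> new HashSet<>()).add(docId);
        }

        /** Exact term lookup; the query term is not analyzed. */
        Set<Integer> search(String term) {
            return postings.getOrDefault(term, Collections.<Integer>emptySet());
        }

        String stored(int docId) {
            return storedValues.get(docId);
        }
    }

    public static void main(String[] args) {
        ToyIndex storedOnly = new ToyIndex();
        storedOnly.add(1, "USA", true, false, false);
        System.out.println(storedOnly.stored(1));      // USA
        System.out.println(storedOnly.search("USA"));  // []

        ToyIndex untokenized = new ToyIndex();
        untokenized.add(1, "USA", false, true, false);
        System.out.println(untokenized.search("USA")); // [1]
        System.out.println(untokenized.search("usa")); // []

        ToyIndex tokenized = new ToyIndex();
        tokenized.add(1, "USA", false, true, true);
        System.out.println(tokenized.search("usa"));   // [1]
        System.out.println(tokenized.search("USA"));   // []
    }
}
```

In this model the term that the query side looks up has to be byte-for-byte
the term that the indexing side wrote, so a mismatch in either the indexed
flag or the casing is enough to produce zero hits.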

I've already spent too much time working on this and I'm obviously on the
wrong track here :)

Any thoughts, anyone? There probably is a ready-made plugin that does what
I am trying to accomplish here (feel free to point me to it), but I would
really like to know what I'm doing wrong.

cheers,
Ilari

RE: Field based search on metadata

Posted by Vishal Shah <vi...@rediff.co.in>.
Hi Jasper,

If you are sure that the location field is indexed correctly, I would
suggest trying the following:

a. Make sure there is no problem due to case conversion. I once had a
problem because I had indexed the field in lower case but was searching in
upper case. A simple way to check this is to give the location in lower
case in your query.

b. Write some test code in your query filter. You could disable all other
query plugins and enable only yours. Your filter can then search for all
documents from a location (usa or uk) irrespective of the query. If this
gives you hits, you know that the search based on location is working all
right. In that case, maybe the problem is in the way the basic/more query
plugins interact with your plugin?
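
The two checks above can be sketched in a few lines of plain Java. This is
only an illustration under stated assumptions, not Nutch code: the Doc class,
byLocation and byWordAndLocation are hypothetical names, the sample documents
are made up, and the location term is assumed to have been indexed in lower
case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Toy version of checks (a) and (b). Not Nutch code. */
class LocationCheckDemo {

    /** Hypothetical document: id, location term as written to the index, body text. */
    static final class Doc {
        final int id;
        final String location;
        final String text;

        Doc(int id, String location, String text) {
            this.id = id;
            this.location = location;
            this.text = text;
        }
    }

    /** Check (b): ids of all documents from a location, ignoring the query text. */
    static List<Integer> byLocation(List<Doc> docs, String term) {
        List<Integer> hits = new ArrayList<>();
        for (Doc d : docs) {
            if (d.location.equals(term)) { // exact, case-sensitive term match
                hits.add(d.id);
            }
        }
        return hits;
    }

    /** Ids of documents that contain the word and also carry the location term. */
    static List<Integer> byWordAndLocation(List<Doc> docs, String word, String term) {
        String wanted = word.toLowerCase(Locale.ROOT);
        List<Integer> hits = new ArrayList<>();
        for (Doc d : docs) {
            List<String> words = Arrays.asList(d.text.toLowerCase(Locale.ROOT).split("\\s+"));
            if (words.contains(wanted) && d.location.equals(term)) {
                hits.add(d.id);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
                new Doc(1, "usa", "nutch crawl tips"),
                new Doc(2, "uk", "nutch plugin notes"),
                new Doc(3, "usa", "lucene index internals"));

        // Check (a): the upper-case term misses, the lower-case term hits.
        System.out.println(byLocation(docs, "USA")); // []
        System.out.println(byLocation(docs, "usa")); // [1, 3]

        // Check (b): location alone gives hits, so compare with the combined query.
        System.out.println(byWordAndLocation(docs, "nutch", "usa")); // [1]
    }
}
```

If the location-only lookup returns hits but the combined one does not, the
field itself is searchable and the remaining suspect is how the clauses are
combined.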

I am taking a shot in the dark here - but maybe you'll find something this
way. All the best!

-vishal.

-----Original Message-----
From: Jasper Kamperman [mailto:jasper.kamperman@openwaternet.com] 
Sent: Wednesday, August 08, 2007 7:30 AM
To: nutch-user@lucene.apache.org
Subject: Re: Field based search on metadata

Does anyone know of a solution to this problem? I've tried several of
the approaches described in the original post, but so far I have also
been unable to search on a custom field I created. It shows up in Luke
just fine, but using the custom field in query strings consistently
gives 0 results.

Re: Field based search on metadata

Posted by Jasper Kamperman <ja...@openwaternet.com>.
Does anyone know of a solution to this problem? I've tried several of
the approaches described in the original post, but so far I have also
been unable to search on a custom field I created. It shows up in Luke
just fine, but using the custom field in query strings consistently
gives 0 results.
