You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by fell <na...@tcs.com> on 2009/01/19 07:20:53 UTC

Lucene search question

Hi all, 

I am new to Lucene and I need to know the following: 

In case I have indexed some data using Lucene and it contains the fields: 
Location, City, Country. 

Suppose the data is as follows in the index in each of the above fields: 
1) R G Heights 
2) London 
3) United Kindom 

If i try to search the index by putting the following in my query : 
1) RG Heights (Please not R and G do not have space in the middle) or 
2) RGHeights. (no space at all) or   
3) R  G      Heights. (extra space between tokens), 
4) Kingdom United. 

Please tell me if lucene would come up with a positive result or would it
tell me 'no hits'. 

Please let me know this for each of the queries above! 

Thanks!
-- 
View this message in context: http://www.nabble.com/Lucene-search-question-tp21537515p21537515.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Lucene search question

Posted by Anshum <an...@gmail.com>.
Hi Namrata,

As far as results are concerned, it depends on the analyzer you use and the
query formation [was a trivial answer].
About the specific cases:
1. RG Heights : would not under any normal circumstances fetch you any
results unless you index all variations. You could create a custom analyzer
to handle such things though (All single alphabets indexed as single tokens
and combinations, mapping keyword like in synonym analyzers).
2. Same is the case with RGHeights.
3. R  G      Heights.  : Using a whitespace analyzer would handle all such
cases. You could also use multiple analzers withing your custom analyzer to
counter these queries. The other thing is, your application/query formation
logic could strip off or cleanup and map the incoming queries before lucene
recieves them.
4. Kingdom United: I am assuming that you are using a phrase query here, you
could set an appropriate slop value (using ~) in that case while forming the
query, also you could form an 'all words' query i.e. Boolean 'AND' or (MUST
clause in case of lucene). so that your search results include all documents
containing ALL the searched tokens.

Hope this clears it up!

--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com

The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw............


On Mon, Jan 19, 2009 at 11:50 AM, fell <na...@tcs.com> wrote:

>
> Hi all,
>
> I am new to Lucene and I need to know the following:
>
> In case I have indexed some data using Lucene and it contains the fields:
> Location, City, Country.
>
> Suppose the data is as follows in the index in each of the above fields:
> 1) R G Heights
> 2) London
> 3) United Kindom
>
> If i try to search the index by putting the following in my query :
> 1) RG Heights (Please not R and G do not have space in the middle) or
> 2) RGHeights. (no space at all) or
> 3) R  G      Heights. (extra space between tokens),
> 4) Kingdom United.
>
> Please tell me if lucene would come up with a positive result or would it
> tell me 'no hits'.
>
> Please let me know this for each of the queries above!
>
> Thanks!
> --
> View this message in context:
> http://www.nabble.com/Lucene-search-question-tp21537515p21537515.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>