Posted to user@lucenenet.apache.org by "Harris, Tobin" <to...@tobinharris.com> on 2006/09/08 15:28:26 UTC

Phrase Matching

Hi Folks,

We want to search our Lucene index for an exact phrase:

 “t in the park”

Note: the resulting Lucene query string is something like:

body:("t in the park")

However, Lucene applies the default stop word list and therefore reduces this phrase to simply “park”. This gets a LOT of matches of course :-)

Any idea how I would set up Lucene so that we can search for phrases in this way? I’m concerned about turning off stop word removal, since keeping the stop words in the index may cause it to grow huge (we currently add 60,000 items to our index per day).

Any help mucho appreciated.

Thanks

Tobin

-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~~-~~-~~-~~-~-
Socena Ltd
Software Development Services
www.socena.com

t: +44 113 2179134           f: +44 870 762 6678
w: www.tobinharris.com    e: tobin@tobinharris.com

s: tobinharris

35 Kirkstall Avenue, Leeds, LS5 3DW, UK
-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~~-~~-~~-~~-~-

 



Re: Phrase Matching

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

I recommend you use an analyzer that does not remove stop words, both
during indexing and with QueryParser, and then all will be well with
this particular query.  As for your index size, don't be too concerned
with that at the moment.  I suspect it will be under control as long
as you are careful with which fields you store.

	Erik
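
To illustrate why the same analyzer has to be used on both sides, here is a
minimal sketch in Python of a toy positional index. It is not Lucene's API:
the names analyze, build_index and phrase_search, the sample documents, and
the small STOP_WORDS set are all assumptions made for the illustration. With
stop words removed at index and query time the phrase collapses to "park";
with an empty stop list the full phrase is matched.

```python
# Toy model only: a hypothetical subset of an English stop word list.
STOP_WORDS = frozenset({"a", "an", "and", "in", "of", "t", "the", "to"})

# Hypothetical sample documents, keyed by document id.
DOCS = {
    1: "tickets for t in the park go on sale",
    2: "a walk in the park",
    3: "car park closed",
}


def analyze(text, stop_words=frozenset()):
    """Lowercase, split on whitespace, and drop any stop words."""
    return [tok for tok in text.lower().split() if tok not in stop_words]


def build_index(docs, stop_words=frozenset()):
    """Build a positional index: term -> {doc_id: [positions]}."""
    index = {}
    for doc_id, text in docs.items():
        for pos, term in enumerate(analyze(text, stop_words)):
            index.setdefault(term, {}).setdefault(doc_id, []).append(pos)
    return index


def phrase_search(index, phrase, stop_words=frozenset()):
    """Return sorted ids of documents containing the analyzed phrase terms
    at consecutive positions."""
    terms = analyze(phrase, stop_words)
    if not terms:
        return []
    hits = []
    for doc_id, positions in index.get(terms[0], {}).items():
        for start in positions:
            # Every later term must sit exactly i positions after the first.
            if all(start + i in index.get(term, {}).get(doc_id, [])
                   for i, term in enumerate(terms)):
                hits.append(doc_id)
                break
    return sorted(hits)


# Stop words removed on both sides: the query is effectively just "park".
with_stops = phrase_search(build_index(DOCS, STOP_WORDS),
                           "t in the park", STOP_WORDS)

# Empty stop list on both sides: the whole phrase has to match.
without_stops = phrase_search(build_index(DOCS), "t in the park")

print(with_stops)
print(without_stops)
```

In this toy data the first search matches every document that mentions
"park", while the second matches only the document containing the full
phrase. The cost is the extra postings for the common words, which is the
index growth discussed above.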

