Posted to user@lucenenet.apache.org by "Harris, Tobin" <to...@tobinharris.com> on 2006/09/08 15:28:26 UTC
Phrase Matching
Hi Folks,
We want to search our Lucene index for an exact phrase:
"t in the park"
Note: the resulting Lucene query string is something like:
body:("t in the park")
However, Lucene applies the default stop word list and therefore reduces this phrase to just "park". This gets a LOT of matches of course :-)
Any idea how I would set up Lucene so that we can search for phrases in this way? I'm concerned about keeping stop words in the index, since it may cause the index to grow huge (we currently add 60,000 items to our index per day).
Any help mucho appreciated.
Thanks
Tobin
-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~~-~~-~~-~~-~-
Socena Ltd
Software Development Services
www.socena.com
t: +44 113 2179134 f: +44 870 762 6678
w: www.tobinharris.com e: tobin@tobinharris.com
s: tobinharris
35 Kirkstall Avenue, Leeds, LS5 3DW, UK
-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~~-~~-~~-~~-~-
Re: Phrase Matching
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
I recommend using an analyzer that does not remove stop words, both
during indexing and with QueryParser; then all will be well with this
particular query. As for your index size, don't be too concerned
with that at the moment. I suspect it will be under control as long
as you are careful about which fields you store.
Erik
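
To illustrate the advice above, here is a minimal sketch in plain Python. It is a toy model, not Lucene's or Lucene.NET's API: the names analyze, contains_phrase and search, the sample documents, and the short stop list are all assumptions made for illustration. It shows why the phrase collapses to "park" when stop words are stripped, and how the phrase query becomes selective again when the same non-stopping analysis is applied to both the documents and the query.

```python
import re

# Hypothetical stop list, chosen only so that "t in the park" reduces to
# "park" as described in the thread. It is not Lucene's actual default list.
STOP_WORDS = frozenset({"t", "in", "the", "a", "an", "and", "of", "to"})


def analyze(text, stop_words=frozenset()):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [tok for tok in tokens if tok not in stop_words]


def contains_phrase(doc_tokens, phrase_tokens):
    """Return True if phrase_tokens occur consecutively inside doc_tokens."""
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(doc_tokens[i:i + n] == phrase_tokens
               for i in range(len(doc_tokens) - n + 1))


def search(docs, phrase, stop_words=frozenset()):
    """Analyze documents and query with the SAME stop list, then phrase-match."""
    query = analyze(phrase, stop_words)
    return [doc for doc in docs
            if contains_phrase(analyze(doc, stop_words), query)]


# Made-up sample documents.
DOCS = [
    "Tickets for T in the Park go on sale today",
    "We walked the dog in the park",
    "Car park charges are rising",
]

# Stop words removed: the query degenerates to ["park"] and matches everything.
with_stops_removed = search(DOCS, "t in the park", STOP_WORDS)

# Stop words kept on both sides: only the true phrase match survives.
with_stops_kept = search(DOCS, "t in the park")

print(len(with_stops_removed), "matches with stop words removed")
print(len(with_stops_kept), "match with stop words kept")
```

The point of the sketch is that the stop list has to be the same on both sides. If documents are indexed with stop words kept but the query is still analyzed with them removed (or the other way round), the token sequences no longer line up and the phrase cannot match as intended.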