Posted to java-user@lucene.apache.org by Jim Hargrave <Ha...@ldschurch.org> on 2004/01/10 01:03:29 UTC

Re: Are deleted words allowed in a sloppy phrase query?

Thanks Erik. Sorry about the personal post. GroupWise must not be posting as it should: I see the message locally, but it must not have gone out to the mailing list. 
 
From your description, I may have no choice but to hack a custom version of Lucene. I do think that a "string edit distance" version of PhraseQuery would be beneficial. If you break your words into character n-grams, you can search languages that have no easy stemming algorithms or word boundaries (like Thai, Cambodian, Lao, etc.). There are some n-gram-based IR systems out there showing that this works pretty well, for English at least. Since we are only interested in keyword matching, it does a fair job for the languages we have tried.
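To make the character n-gram idea concrete, here is a minimal sketch in plain Java. It is not a Lucene Analyzer or TokenFilter, and the class name `CharNGrams` is hypothetical. It splits one word into overlapping grams of length n:

```java
import java.util.ArrayList;
import java.util.List;

public class CharNGrams {
    // Split a word into overlapping character n-grams of length n.
    // Words no longer than n are returned whole so nothing is dropped.
    // Works on Java chars (UTF-16 code units), which is fine for BMP
    // scripts such as Thai, Lao and Khmer.
    public static List<String> ngrams(String word, int n) {
        List<String> grams = new ArrayList<>();
        if (word.length() <= n) {
            grams.add(word);
            return grams;
        }
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("lucene", 3)); // prints [luc, uce, cen, ene]
    }
}
```

Each gram would then be indexed as an ordinary term, so a sequence of grams stands in for the sequence of words a phrase query normally matches.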
 
If anybody else has an idea that would let me modify PhraseQuery to do a full "string edit distance" search, I would appreciate it. 
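As a rough illustration of what such a "string edit distance" phrase match could compute, here is a standalone Java sketch. It is an assumption of how the scoring might work, not Lucene code: it does not touch PhraseQuery or SloppyPhraseScorer, and the class and method names are hypothetical. It finds the smallest word-level edit distance between the query phrase and any contiguous run of document tokens, so a missing (deleted) query word, an extra document word, or a substituted word each costs 1:

```java
import java.util.List;

public class PhraseEditDistance {
    // Smallest word-level edit distance between the query phrase and any
    // contiguous run of document tokens (approximate substring matching).
    public static int bestMatch(List<String> query, List<String> doc) {
        int m = query.size();
        int n = doc.size();
        // prev/cur each hold one row of the DP table; row i covers the
        // first i query terms. Row 0 is all zeros: a match may start anywhere.
        int[] prev = new int[n + 1];
        int[] cur = new int[n + 1];
        for (int i = 1; i <= m; i++) {
            cur[0] = i; // i query words against no document words: i deletions
            for (int j = 1; j <= n; j++) {
                boolean same = query.get(i - 1).equals(doc.get(j - 1));
                int sub = prev[j - 1] + (same ? 0 : 1); // match or substitution
                int del = prev[j] + 1;                  // query word missing from doc
                int ins = cur[j - 1] + 1;               // extra word in the doc
                cur[j] = Math.min(sub, Math.min(del, ins));
            }
            int[] tmp = prev; // swap rows so prev holds row i
            prev = cur;
            cur = tmp;
        }
        // prev now holds row m; the match may end at any document position.
        int best = m;
        for (int j = 0; j <= n; j++) {
            best = Math.min(best, prev[j]);
        }
        return best;
    }
}
```

For example, the query `quick brown fox` against the text `the quick fox jumps` scores 1, because only `brown` is missing. The same routine would work unchanged if the tokens were character n-grams instead of words.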
 
Jim Hargrave

>>> "Erik Hatcher" <er...@ehatchersolutions.com> 01/08/04 01:43PM >>>
On Jan 7, 2004, at 3:54 PM, Jim Hargrave wrote:
> Looks like I will have to implement my own PhraseQuery that uses a 
> standard string edit distance measure. What is the easiest way to do 
> this? Should I override PhraseQuery - then override the 
> SloppyPhraseScorer? I have my own query parser so I can make any 
> adjustments needed when building a query.

Probably best to keep this on the lucene-user e-mail list, but it is 
non-trivial to implement a custom Query. While PhraseQuery itself can
be extended, there are several pieces it uses which are currently 
scoped at package visibility level only.

Even if you are using the built-in QueryParser, you can override the 
method that constructs the PhraseQuery.

>  BTW: We have implemented a multilingual key word in context 
> application that provides exact, stemmed and fuzzy search for ANY 
> language. Well, we will have fuzzy search when I finish these 
> modifications. Lucene rules!
>

Nice!

    Erik