
Posted to general@lucene.apache.org by Rajesh Munavalli <ra...@dessci.com> on 2005/07/14 17:23:39 UTC

n-gram and multiword query

Consider a document with the following contents:
"Levenshtein distance is named after the Russian scientist Vladimir
Levenshtein and is also called edit distance"
 
Possible bi-grams (after discarding those that begin or end with a
stop word) are:
"Levenshtein distance", "named after", "Russian scientist", "scientist
Vladimir", "Vladimir Levenshtein", "called edit", "edit distance"
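For illustration, the bi-gram list above can be reproduced with a small sketch like the one below. It is not Lucene code. The stop word list {is, the, and, also} is an assumption, picked only because it yields exactly the bi-grams listed; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class Bigrams {
    // Hypothetical stop word list, chosen so the output matches the bi-grams in the post.
    static final Set<String> STOP_WORDS = Set.of("is", "the", "and", "also");

    // Returns adjacent word pairs, skipping any pair whose first or second word is a stop word.
    public static List<String> bigrams(String text) {
        String[] words = text.trim().split("\\s+");
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 1 < words.length; i++) {
            String first = words[i];
            String second = words[i + 1];
            if (isStop(first) || isStop(second)) {
                continue;
            }
            result.add(first + " " + second);
        }
        return result;
    }

    // Case-insensitive stop word check.
    static boolean isStop(String word) {
        return STOP_WORDS.contains(word.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String doc = "Levenshtein distance is named after the Russian scientist Vladimir "
                + "Levenshtein and is also called edit distance";
        System.out.println(bigrams(doc));
    }
}
```

Running main prints the seven bi-grams from the list, in document order.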
 
If my query is "Vladimir levenshtein distance", how does Lucene
compute its similarity to the indexed terms? Are query terms appearing
together given more importance? How does it account for gaps (caused by
stop word removal) while matching a multiword query?
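One way to think about the first two questions is a toy TF-IDF scorer loosely in the spirit of Lucene's DefaultSimilarity, assuming tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)) and coord = matched query terms / total query terms. This is a simplified model, not Lucene's implementation: norms, boosts and queryNorm are left out, and the names are hypothetical. Because it only looks at term counts, word order and adjacency cannot affect the score of an unquoted multi-term query.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class ToyScorer {
    // Term frequencies for one document. Positions are thrown away here, which is
    // why this kind of scoring cannot reward query terms that appear next to each other.
    static Map<String, Integer> termFreqs(List<String> tokens) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokens) {
            tf.merge(t.toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return tf;
    }

    // Simplified score: coord * sum over matched terms of sqrt(freq) * idf^2,
    // with idf = 1 + ln(numDocs / (docFreq + 1)). Norms and boosts are omitted.
    public static double score(List<String> query, List<String> doc, List<List<String>> corpus) {
        Map<String, Integer> tf = termFreqs(doc);
        double sum = 0;
        int matched = 0;
        for (String q : query) {
            String term = q.toLowerCase(Locale.ROOT);
            int freq = tf.getOrDefault(term, 0);
            if (freq == 0) {
                continue;
            }
            matched++;
            long docFreq = corpus.stream().filter(d -> termFreqs(d).containsKey(term)).count();
            double idf = 1 + Math.log((double) corpus.size() / (docFreq + 1));
            sum += Math.sqrt(freq) * idf * idf;
        }
        double coord = (double) matched / query.size();
        return coord * sum;
    }

    public static void main(String[] args) {
        List<String> adjacent = List.of("vladimir", "levenshtein", "distance", "edit");
        List<String> scattered = List.of("distance", "edit", "levenshtein", "vladimir");
        List<String> other = List.of("russian", "scientist");
        List<List<String>> corpus = List.of(adjacent, scattered, other);
        List<String> query = List.of("Vladimir", "levenshtein", "distance");
        // Same terms in a different order give the same score.
        System.out.println(score(query, adjacent, corpus) == score(query, scattered, corpus));
    }
}
```

Here main prints true: the document with the query words side by side scores exactly the same as the one with them shuffled.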
 
thanks,
 
Rajesh Munavalli

Re: n-gram and multiword query

Posted by Chen Wei Zhu <mo...@gmail.com>.

I remember that Lucene doesn't do anything for proximity.
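For plain term queries that matches the toy scoring idea; proximity only enters when the query is a phrase, e.g. a PhraseQuery with a slop set via PhraseQuery.setSlop. The sketch below is a simplified model of how a slop-style distance can be computed from term positions, loosely modeled on comparing each term's position offset by its place in the phrase. It is not Lucene's actual algorithm. The stop word list and the keepGaps flag are assumptions: whether stop-word gaps survive in the positions depends on the analyzer, so both cases are shown.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class PhraseSlop {
    // Builds term -> positions, dropping stop words. With keepGaps, a removed word still
    // consumes a position; without it, the remaining words are renumbered consecutively.
    public static Map<String, List<Integer>> index(String text, Set<String> stopWords, boolean keepGaps) {
        Map<String, List<Integer>> positions = new HashMap<>();
        int pos = 0;
        for (String w : text.trim().split("\\s+")) {
            String term = w.toLowerCase(Locale.ROOT);
            if (stopWords.contains(term)) {
                if (keepGaps) {
                    pos++;
                }
                continue;
            }
            positions.computeIfAbsent(term, k -> new ArrayList<>()).add(pos++);
        }
        return positions;
    }

    // Smallest "slop" needed to match the (lower-cased) phrase, or -1 if a term is missing.
    // An exact phrase occurrence gives 0.
    public static int minSlop(List<String> phrase, Map<String, List<Integer>> positions) {
        if (phrase.isEmpty()) {
            return 0;
        }
        return search(phrase, positions, 0, Integer.MAX_VALUE, Integer.MIN_VALUE);
    }

    // Tries every combination of positions. Each position is shifted by the term's index
    // in the phrase; the spread (max - min) of the shifted values is the distance.
    private static int search(List<String> phrase, Map<String, List<Integer>> positions,
                              int i, int min, int max) {
        if (i == phrase.size()) {
            return max - min;
        }
        List<Integer> ps = positions.get(phrase.get(i));
        if (ps == null) {
            return -1;
        }
        int best = -1;
        for (int p : ps) {
            int shifted = p - i;
            int s = search(phrase, positions, i + 1, Math.min(min, shifted), Math.max(max, shifted));
            if (s >= 0 && (best < 0 || s < best)) {
                best = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String doc = "Levenshtein distance is named after the Russian scientist Vladimir "
                + "Levenshtein and is also called edit distance";
        Set<String> stop = Set.of("is", "the", "and", "also");
        List<String> q = List.of("vladimir", "levenshtein", "called");
        System.out.println(minSlop(q, index(doc, stop, true)));
        System.out.println(minSlop(q, index(doc, stop, false)));
    }
}
```

With gaps kept, "vladimir levenshtein called" needs a distance of 3 (the removed "and is also"); with gaps discarded it matches as an exact phrase (0). For the original query "vladimir levenshtein distance" the same model gives 5 with gaps and 2 without.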



-- 
Thanks!
yours, WeiZhu Chen