You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Paul Elschot <pa...@xs4all.nl> on 2006/07/09 11:16:59 UTC

Re: SpanNearQuery with support for missing words

Jonny,

On Saturday 08 July 2006 11:41, John Bonn wrote:
> Hello folks !
>  
> I'm looking for a way to search for a phrase, where in
> I shud get a hit even if some (N) words in the query
> phrase is not matching or even not inorder, i.e. it
> shud match maximum to the query string. I also need to
> get the Spans for the hits for highlighting. 
>  
> SpanNearQuery is quite helpful with inorder set to
> false and for getting the spans.  Is it possible to
> modify SpanNearQuery, so that it wud return the spans
> of the max match string ?? 
>  
> ---> The quick brown fox jumps over the lazy dog 
>  
> Query : quick brainy fox 
>  
> it should give me <quick brown fox>...
>  
> We can prolly (must) have a parameter for N, the
> number of term mismatches allowed. It will surely be
> helpful in search also... 
>  

When modifying NearSpans, I'd start from here:
http://issues.apache.org/jira/browse/LUCENE-569

To have the unordered case match with fewer than all given
subqueries/clauses, have a look at the NearSpansUnordered in the
above issue. It uses a queue with the minimum at the top of the
queue, and it maintains a maximum within that queue.
To match fewer than all given clauses, the queue should have
the size of the number of clauses that you'd like to match.
For the remaining non matching clauses, the same queue cannot
be used, but perhaps another one.

It's better to continue this on java-dev. It's not clear to me how you'd
like to treat the case of no missing words, and relaxing the matching
conditions in NearSpans can (/will) be tricky...

Regards,
Paul Elschot

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org