Posted to java-user@lucene.apache.org by Teruhiko Kurosaka <Ku...@basistech.com> on 2008/11/14 17:40:17 UTC

Phrase query-like query that doesn't require all the terms?

PhraseQuery requires that all the terms in the phrase
exist in the field being searched.  I am looking
for a more permissive version of PhraseQuery that
is sensitive to the order of the terms but
allows missing terms, which would lower the score
but still produce a match.

For example, query "DEF GHI" would match with
"DEF GHI"
"ABC DEF GHI JKL"
"XYZ DEF GHI"
"DEF GHI XYZ"
with relatively high score, but it would also
match with:
"DEF"
"GHI"
"DEF XYZ"
etc. with lower scores.
It would NOT match (or would match with a severely penalized score):
"GHI DEF" (out of order)
"ABC XYZ" (no terms)

Is there any such Query class, or another way to achieve
a similar effect?
----
T. "Kuro" Kurosaka, Basis Technology
San Francisco, California, U.S.A.
 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Phrase query-like query that doesn't require all the terms?

Posted by Yonik Seeley <yo...@apache.org>.
On Fri, Nov 14, 2008 at 12:05 PM, Teruhiko Kurosaka <Ku...@basistech.com> wrote:
> My problem with PhraseQuery is that it requires
> all the terms to exist in a document.  I want something more
> permissive: a document with missing terms should still match,
> just with a lower score.
> Does dismax also require all the terms?

The mandatory part +(DEF GHI) selects documents with either term and
scores higher if both terms are present.  The sloppy phrase query part
"DEF GHI"~10^5 is optional and only contributes to the score when both
terms appear near each other (and scores higher the closer together
they are).

-Yonik
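
To make the two clauses described above concrete, here is a minimal
sketch in plain Java.  It is a toy model only: the class name, the
score() method, and the scoring formula (one point per matching term,
plus boost / (distance + 1) for the phrase clause) are assumptions made
for illustration, not Lucene's or Solr's actual scoring.

```java
import java.util.Arrays;
import java.util.List;

class PermissivePhraseSketch {
    // Toy score for a query shaped like +(t1 t2 ...) "t1 t2 ..."~slop^boost.
    // The formula is an illustrative assumption, not Lucene's real scoring.
    static double score(String text, String[] terms, int slop, double boost) {
        List<String> doc = Arrays.asList(text.split(" "));
        int present = 0;
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int offset = 0; offset < terms.length; offset++) {
            int pos = doc.indexOf(terms[offset]); // first occurrence only
            if (pos < 0) {
                continue;                         // missing term: allowed
            }
            present++;
            // Shift each position back by the term's offset in the phrase.
            min = Math.min(min, pos - offset);
            max = Math.max(max, pos - offset);
        }
        if (present == 0) {
            return 0.0;                 // mandatory clause: at least one term
        }
        double score = present;         // more matching terms, higher score
        if (present == terms.length && max - min <= slop) {
            score += boost / (max - min + 1); // optional sloppy phrase clause
        }
        return score;
    }

    public static void main(String[] args) {
        String[] terms = {"DEF", "GHI"};
        String[] docs = {"DEF GHI", "ABC DEF GHI JKL", "GHI DEF", "DEF XYZ", "ABC XYZ"};
        for (String text : docs) {
            System.out.println(text + " -> " + score(text, terms, 10, 5.0));
        }
    }
}
```

In this toy model a document with only one of the terms still gets a
non-zero score, the in-order phrase scores highest, and the reversed
phrase "GHI DEF" still matches, only with a smaller phrase bonus.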


>> Solr's dismax parser can generate queries that do most of
>> this... it's a combination of term queries and sloppy phrase queries.
>>
>> Simplest example:
>> +(DEF GHI) "DEF GHI"~10^5
>>
>> The only thing it doesn't handle is terms that are out of
>> order (they will still be matched).  You could use span
>> queries if you really need that ordering, but sloppy phrase
>> queries already penalize out-of-order terms since that's a bigger
>> edit distance (but it won't be a "severe" penalty).


RE: Phrase query-like query that doesn't require all the terms?

Posted by Teruhiko Kurosaka <Ku...@basistech.com>.
Yonik, 
Thank you for your reply.
My problem with PhraseQuery is that it requires
all the terms to exist in a document.  I want something more
permissive: a document with missing terms should still match,
just with a lower score.
Does dismax also require all the terms?

> Solr's dismax parser can generate queries that do most of 
> this... it's a combination of term queries and sloppy phrase queries.
> 
> Simplest example:
> +(DEF GHI) "DEF GHI"~10^5
> 
> The only thing it doesn't handle is terms that are out of
> order (they will still be matched).  You could use span
> queries if you really need that ordering, but sloppy phrase
> queries already penalize out-of-order terms since that's a bigger
> edit distance (but it won't be a "severe" penalty).

----
T. "Kuro" Kurosaka, Basis Technology
San Francisco, California, U.S.A.
 



Re: Phrase query-like query that doesn't require all the terms?

Posted by Yonik Seeley <yo...@apache.org>.
Solr's dismax parser can generate queries that do most of this... it's
a combination of term queries and sloppy phrase queries.

Simplest example:
+(DEF GHI) "DEF GHI"~10^5

The only thing it doesn't handle is terms that are out of order
(they will still be matched).  You could use span queries if you
really need that ordering, but sloppy phrase queries already penalize
out-of-order terms since that's a bigger edit distance (but it won't
be a "severe" penalty).

-Yonik
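
As an illustration of the edit-distance point above, here is a minimal
sketch in plain Java of one way to model the slop a sloppy phrase match
needs.  It is a simplified model and an assumption on my part, not
Lucene's code: the names slopNeeded() and sloppyWeight() are
hypothetical, only the first occurrence of each term is considered, and
the 1 / (slop + 1) weight is used purely for illustration.

```java
import java.util.Arrays;
import java.util.List;

class SloppyDistanceSketch {
    // Toy model (an assumption, not Lucene's code): shift each term's
    // document position back by its offset in the phrase; the spread of
    // the shifted positions is the slop the match needs.
    static int slopNeeded(List<String> doc, String[] phrase) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int offset = 0; offset < phrase.length; offset++) {
            int pos = doc.indexOf(phrase[offset]); // first occurrence only
            if (pos < 0) {
                return -1;                         // a term is missing: no phrase match
            }
            int shifted = pos - offset;
            min = Math.min(min, shifted);
            max = Math.max(max, shifted);
        }
        return max - min;
    }

    // Illustrative weight: the phrase contribution shrinks as slop grows.
    static double sloppyWeight(int slop) {
        return 1.0 / (slop + 1);
    }

    public static void main(String[] args) {
        String[] phrase = {"DEF", "GHI"};
        String[] docs = {"ABC DEF GHI JKL", "DEF XYZ GHI", "GHI DEF"};
        for (String text : docs) {
            int slop = slopNeeded(Arrays.asList(text.split(" ")), phrase);
            System.out.println(text + " -> slop " + slop
                    + ", weight " + sloppyWeight(slop));
        }
    }
}
```

Under this model the three sample documents need a slop of 0, 1, and 2
respectively, so the reversed pair "GHI DEF" is penalized relative to
the in-order phrase but is not excluded.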

On Fri, Nov 14, 2008 at 11:40 AM, Teruhiko Kurosaka <Ku...@basistech.com> wrote:
> PhraseQuery requires that all the terms in the phrase
> exist in the field being searched.  I am looking
> for a more permissive version of PhraseQuery that
> is sensitive to the order of the terms but
> allows missing terms, which would lower the score
> but still produce a match.
>
> For example, query "DEF GHI" would match with
> "DEF GHI"
> "ABC DEF GHI JKL"
> "XYZ DEF GHI"
> "DEF GHI XYZ"
> with relatively high score, but it would also
> match with:
> "DEF"
> "GHI"
> "DEF XYZ"
> etc. with lower scores.
> It would NOT match (or would match with a severely penalized score):
> "GHI DEF" (out of order)
> "ABC XYZ" (no terms)
>
> Is there any such Query class, or another way to achieve
> a similar effect?