Posted to java-user@lucene.apache.org by Yonik Seeley <yo...@apache.org> on 2006/08/08 20:03:49 UTC

Re: "Field Grouping" query restrained to same field on a 'multi'-field'

On 8/8/06, Laurent Hoss <l....@netbreeze.ch> wrote:
> Suppose having an Index containing Lucene documents, having multiple
> fields (equally) named 'paragraph'.
> Now I want to make a "Field Grouping" query (described in:
> http://lucene.apache.org/java/docs/queryparsersyntax.html )
>  "paragraph:( word1 AND word2 )"
> retrieving only documents where the terms word1,word2 appear in the SAME
> 'paragraph' Field (and not word1 in one 'paragraph' field, and word2
> possibly in another 'paragraph' field in this doc).

You could do this with the current query parser by putting large
position increment gaps between paragraphs that are guaranteed to be
larger than the largest paragraph.  Then you could use a sloppy phrase
query, "word1 word2"~10000 for instance.

That would also score documents higher the closer together the words
appeared (which may or may not be desirable).
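
As a rough illustration of why the gap keeps matches inside one paragraph, here is a minimal sketch in plain Java. It does not use the Lucene API: the class PositionGapDemo, the GAP constant, and both methods are hypothetical names made up for this example, and the slop check is simplified to an absolute position difference, whereas a real sloppy phrase query measures slop as an edit distance.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PositionGapDemo {
    // Assumed gap; it must exceed the token count of the longest paragraph.
    static final int GAP = 10000;

    // Assigns a position to every token, skipping GAP positions after each paragraph.
    static Map<String, List<Integer>> index(List<String> paragraphs) {
        Map<String, List<Integer>> positions = new HashMap<>();
        int pos = 0;
        for (String paragraph : paragraphs) {
            for (String token : paragraph.toLowerCase().split("\\s+")) {
                if (token.isEmpty()) {
                    continue;
                }
                positions.computeIfAbsent(token, k -> new ArrayList<>()).add(pos);
                pos++;
            }
            pos += GAP; // the position increment gap between two 'paragraph' values
        }
        return positions;
    }

    // True if some occurrence of a and some occurrence of b are at most slop positions apart.
    static boolean within(Map<String, List<Integer>> positions, String a, String b, int slop) {
        List<Integer> as = positions.getOrDefault(a, Collections.<Integer>emptyList());
        List<Integer> bs = positions.getOrDefault(b, Collections.<Integer>emptyList());
        for (int pa : as) {
            for (int pb : bs) {
                if (Math.abs(pa - pb) <= slop) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

With the paragraphs "nice cat sat" and "the mouse ran", "cat" and "sat" fall within the slop, while "cat" and "mouse" are separated by more than GAP positions and so can never match.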

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: "Field Grouping" query restrained to same field on a 'multi'-field'

Posted by Chris Hostetter <ho...@fucit.org>.

: > You could do this with the current query parser by putting large
: > position increment gaps between paragraphs that are guaranteed to be
: > larger than the largest paragraph.  Then you could use a sloppy phrase
: > query
: > "word1 word2"~10000 for instance.

: Unfortunately this only makes sense for simple AND/phrase queries, but
: we also want more complex boolean queries like:
: "nice cat" AND ("mouse" OR "rat") AND NOT ("dog")

you should still be able to use the slop based approach to solve this by
using Span queries (SpanNear for Phrases and "AND", SpanOr for "OR" and
SpanNot for "NOT") -- but to get the full complement of all the options
you might want, you'll probably need to insert arbitrary marker tokens in
between each paragraph, and then lump your entire query into a SpanNot
where the exclude clause is a SpanTerm query on your arbitrary paragraph
marker.
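
to make the marker idea concrete, here is a small sketch in plain Java that models the logic only. it is not Lucene code: ParagraphMarkerDemo, MARKER and both methods are hypothetical names for this example. the check imitates a SpanNot whose include clause is an unordered near-match on two terms and whose exclude clause is the marker term, i.e. a match is thrown away if a marker sits anywhere inside it.

```java
import java.util.ArrayList;
import java.util.List;

public class ParagraphMarkerDemo {
    // Assumed marker token; it must never occur in real paragraph text.
    static final String MARKER = "__para__";

    // Flattens the paragraphs into one token stream with MARKER between them.
    static List<String> tokenize(List<String> paragraphs) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < paragraphs.size(); i++) {
            if (i > 0) {
                tokens.add(MARKER);
            }
            for (String token : paragraphs.get(i).toLowerCase().split("\\s+")) {
                if (!token.isEmpty()) {
                    tokens.add(token);
                }
            }
        }
        return tokens;
    }

    // True if a and b occur with no MARKER between them, i.e. in the same paragraph.
    static boolean sameParagraph(List<String> tokens, String a, String b) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(a)) {
                continue;
            }
            for (int j = 0; j < tokens.size(); j++) {
                if (!tokens.get(j).equals(b)) {
                    continue;
                }
                int lo = Math.min(i, j);
                int hi = Math.max(i, j);
                // The span [lo, hi] plays the role of the include clause;
                // a marker inside it is what the exclude clause would reject.
                if (!tokens.subList(lo, hi + 1).contains(MARKER)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

the OR and NOT parts of a bigger query would be combined on top of this in the same spirit, with the marker exclusion wrapped around the whole thing.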

this has been discussed in the past, search the Lucene mailing list
archives for "sentence".




-Hoss



Re: "Field Grouping" query restrained to same field on a 'multi'-field'

Posted by Laurent Hoss <l....@netbreeze.ch>.

Yonik Seeley wrote:
> On 8/8/06, Laurent Hoss wrote:
>> Suppose having an Index containing Lucene documents, having multiple
>> fields (equally) named 'paragraph'.
>> Now I want to make a "Field Grouping" query (described in:
>> http://lucene.apache.org/java/docs/queryparsersyntax.html )
>>  "paragraph:( word1 AND word2 )"
>> retrieving only documents where the terms word1,word2 appear in the SAME
>> 'paragraph' Field (and not word1 in one 'paragraph' field, and word2
>> possibly in another 'paragraph' field in this doc).
>
> You could do this with the current query parser by putting large
> position increment gaps between paragraphs that are guaranteed to be
> larger than the largest paragraph.  Then you could use a sloppy phrase
> query
> "word1 word2"~10000 for instance.
>
> That would also score documents higher the closer together the words
> appeared (which may or may not be desirable).
> -Yonik
> http://incubator.apache.org/solr Solr, the open-source Lucene search 
> server
>
We had the same idea ;)
Unfortunately this only makes sense for simple AND/phrase queries, but
we also want more complex boolean queries like:
"nice cat" AND ("mouse" OR "rat") AND NOT ("dog")

Scoring doesn't interest us right now (!), just whether it is a hit or
not (though I must admit it could improve the quality of our results,
e.g. for browsing the top xx results).
Therefore I would really like to know if the 'feature' I had in mind
is feasible, maybe (only) when scoring is ignored?
(Btw, is there an option to switch off scoring? Could that improve
performance in general?)

-Laurent




Re: "Field Grouping" query restrained to same field on a 'multi'-field'

Posted by Chris Hostetter <ho...@fucit.org>.

: That would also score documents higher the closer together the words
: appeared (which may or may not be desirable).

if it's not desirable, it could be "fixed" by overriding the sloppyFreq
method of your Similarity.
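
a sketch of what that override changes, in plain Java rather than real Lucene classes. SloppyFreqDemo and both method names are made up for this example, and the 1 / (distance + 1) formula is what I assume the default Similarity uses; check it against your Lucene version.

```java
public class SloppyFreqDemo {
    // Assumed default behaviour: a sloppy match contributes less the further apart the terms are.
    static float defaultSloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // Flat variant: every sloppy match counts the same, so closeness no longer affects the score.
    static float flatSloppyFreq(int distance) {
        return 1.0f;
    }
}
```

in a real index the flat version would go into a Similarity subclass that overrides sloppyFreq and is then set on the searcher.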


-Hoss

