Posted to solr-user@lucene.apache.org by asuka <an...@gmail.com> on 2013/09/23 13:57:56 UTC

Get only those documents that are fully satisfied.

Hi,
   I've got a scenario similar to the following one.

<doc>
                <field name="ID">LP1</field>
                <field name="ARTIST">PAUL MCCARTNEY</field>
                <field name="NAME">FLOWERS IN THE DIRT</field>
                <field name="YEAR">1989</field>
</doc>
<doc>
                <field name="ID">LP2</field>
                <field name="ARTIST">ALICE IN CHAINS</field>
                <field name="NAME">DIRT</field>
                <field name="YEAR">1992</field>
</doc>
<doc>
                <field name="ID">LP3</field>
                <field name="ARTIST">GUNS'N'ROSES</field>
                <field name="NAME">THE SPAGHETTI INCIDENT?</field>
                <field name="YEAR">1993</field>
</doc>

I can't picture how to perform searches that return only those LPs whose
NAME terms are all satisfied by the query. For example, if I search for
"DIRT", I would like to get only the doc with ID LP2, since LP1 also
contains the words "FLOWERS IN THE", which are not included in the
query.

If the query is "the dirt spaghetti incident?", I should get the docs LP2
and LP3.

Kind regards





--
View this message in context: http://lucene.472066.n3.nabble.com/Get-only-those-documents-that-are-fully-satisfied-tp4091531.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Get only those documents that are fully satisfied.

Posted by asuka <an...@gmail.com>.
Thanks Chris,
    that's exactly what I was looking for.

One last question. As far as I can see, the solution you are offering,
termfreq, is for Solr 4+, isn't it?

Right now I'm working with Solr 3.6.2. Is there a solution for that
version, or do I need to upgrade?

Kind Regards




Re: Get only those documents that are fully satisfied.

Posted by Chris Hostetter <ho...@fucit.org>.
: Your requirement is still somewhat ambiguous - you use "fully" and "some" in
: the same sentence. Which is it?

the request seems pretty clear to me...

:   I don't want to get documents that fit my whole query, I want those
: documents that are fully satisfied  with some terms of the query.

...my reading is:

 * given a set of documents each containing an arbitrary number of 
"doc_terms" in "field_f"
 * given a query "q" containing an arbitrary number of "q_terms"
 * find all documents where every "doc_term" in that document's "field_f" 
exists in the query as a "q_term"

ie: all terms of the document must exist in the query for the doc to 
match, but not all terms from the query must exist in a document.
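
To make that reading concrete, here is a minimal Python sketch of the matching rule applied to the three albums from the original post. It is plain Python, not Solr: the tokenizer (lowercase, whitespace split, surrounding punctuation stripped) is an assumption standing in for whatever analysis the field really uses, and the function names are hypothetical.

```python
import string

def tokenize(text):
    """Naive, assumed analyzer: lowercase, split on whitespace, strip punctuation."""
    tokens = (word.strip(string.punctuation) for word in text.lower().split())
    return [t for t in tokens if t]

def fully_satisfied(doc_field, query):
    """True when every term of the document field also occurs in the query."""
    return set(tokenize(doc_field)) <= set(tokenize(query))

# NAME values from the original post, keyed by document ID.
albums = {
    "LP1": "FLOWERS IN THE DIRT",
    "LP2": "DIRT",
    "LP3": "THE SPAGHETTI INCIDENT?",
}

def search(query):
    """Return the IDs of all albums whose NAME is fully covered by the query."""
    return sorted(doc_id for doc_id, name in albums.items()
                  if fully_satisfied(name, query))

print(search("dirt"))                          # ['LP2']
print(search("the dirt spaghetti incident?"))  # ['LP2', 'LP3']
```

The subset test is the whole rule: extra query terms are ignored, but a single document term missing from the query rules the document out.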

There is no trivial out of the box solution at the moment, but there is a 
solution possible using function queries as described in 
this email...

https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201308.mbox/%3Calpine.DEB.2.02.1308091122150.2685@frisbee%3E

Repeating the key bits below...

-Hoss


	...

1) if you don't care about using non-trivial analysis (ie: you don't need 
stemming, or synonyms, etc..), you can do this with some really simple 
function queries -- assuming you index a field containing the number of 
"words" in each document, in addition to the words themselves.  Assuming 
your words are in a field named "words" and the number of words is in a 
field named "words_count" a request for something like "Galaxy Samsung S4" 
can be represented as...

  q={!frange l=0 u=0}sub(words_count,
                         sum(termfreq('words','Galaxy'),
                             termfreq('words','Samsung'),
                             termfreq('words','S4')))

...ie: you want to compute the sum of the term frequencies for each of 
the words requested, and then you want to subtract that sum from the 
number of terms in the document -- and then you only want to match 
documents where the result of that subtraction is 0.
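
As a sketch of how a client might assemble that request for an arbitrary list of words, the helper below builds the same expression as a string. The function name is hypothetical, the field names are the ones assumed in this email, and no escaping is attempted, so it assumes the terms are already in their indexed form and contain no quotes or backslashes.

```python
def build_frange_query(terms, words_field="words", count_field="words_count"):
    """Build the {!frange} expression: words_count minus the summed termfreq() values.

    Assumptions: `terms` holds raw indexed terms with no single quotes or
    backslashes; nothing is escaped here.
    """
    if not terms:
        raise ValueError("at least one term is required")
    freqs = ["termfreq('{}','{}')".format(words_field, t) for t in terms]
    # With a single term there is nothing to add up, so sum() is skipped.
    total = freqs[0] if len(freqs) == 1 else "sum({})".format(",".join(freqs))
    return "{{!frange l=0 u=0}}sub({},{})".format(count_field, total)

print(build_frange_query(["Galaxy", "Samsung", "S4"]))
```

The resulting string would be sent as the q parameter; l=0 and u=0 restrict matches to documents where the subtraction comes out at exactly 0.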

one complexity that comes up, is that you haven't specified:
  
  * can the list of words in your documents contain duplicates?
  * can the list of words in your query contain duplicates?
  * should a document with duplicate words match only if the query also 
contains the same word duplicated?

...the answers to those questions make the math more complicated (and are 
left as an exercise for the reader)
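
To see where duplicates bite, the subtraction can be simulated locally. This is plain Python standing in for Solr, and it assumes "words_count" is indexed as the total number of tokens in the field, repeats included; the function names are hypothetical.

```python
from collections import Counter

def frange_diff(doc_words, query_words):
    """words_count minus the sum of term frequencies, one termfreq() per query word."""
    freq = Counter(doc_words)
    words_count = len(doc_words)  # assumption: the count includes repeated words
    return words_count - sum(freq[w] for w in query_words)

def matches(doc_words, query_words):
    """Mimic {!frange l=0 u=0}: match only when the difference is exactly 0."""
    return frange_diff(doc_words, query_words) == 0

print(matches(["dirt"], ["dirt"]))          # True: no duplicates anywhere
print(matches(["dirt", "dirt"], ["dirt"]))  # True: termfreq counts both occurrences
print(matches(["dirt"], ["dirt", "dirt"]))  # False: 1 - 2 = -1, a wrongly rejected doc
print(matches(["a", "b"], ["a", "a"]))      # True: 2 - 2 = 0, although "b" was never queried
```

Under this counting assumption, deduplicating the query words on the client before building the sum removes the last two cases, because the sum then equals the number of document tokens that appear in the query.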


2) if you *do* care about using non-trivial analysis, then you can't use 
the simple "termfreq()" function, which deals with raw terms -- instead 
you have to use the "query()" function to ensure that the input is parsed 
appropriately -- but then you have to wrap that function in something that 
will normalize the scores - so in place of termfreq('words','Galaxy') 
you'd want something like...

            if(query({!field f=words v='Galaxy'}),1,0)

...but again the math gets much harder if you make things more complex 
with duplicate words in the document or duplicate words in the query -- 
you'd probably have to use a custom similarity to get the scores returned 
by the query() function to be usable as is in the math equation (and drop 
the "if()" function)
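
A sketch of the same substitution done programmatically: each termfreq() clause is replaced literally by the if(query(...),1,0) form shown above. The function name is hypothetical, nothing is escaped, and the output has not been run against a Solr instance. Since every matching word contributes exactly 1, it also assumes "words_count" holds the number of distinct analyzed terms in the document.

```python
def build_analyzed_frange_query(terms, words_field="words", count_field="words_count"):
    """Build the {!frange} expression using if(query(...),1,0) per query word.

    Assumptions: terms contain no single quotes or braces (nothing is escaped),
    and the count field stores the number of distinct analyzed terms.
    """
    if not terms:
        raise ValueError("at least one term is required")
    clauses = ["if(query({{!field f={} v='{}'}}),1,0)".format(words_field, t)
               for t in terms]
    # With a single word there is nothing to add up, so sum() is skipped.
    total = clauses[0] if len(clauses) == 1 else "sum({})".format(",".join(clauses))
    return "{{!frange l=0 u=0}}sub({},{})".format(count_field, total)

print(build_analyzed_frange_query(["Galaxy", "Samsung"]))
```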

Re: Get only those documents that are fully satisfied.

Posted by Jack Krupansky <ja...@basetechnology.com>.
Your requirement is still somewhat ambiguous - you use "fully" and "some" in 
the same sentence. Which is it?

If you simply want documents that contain every one of the query terms, 
use the explicit AND operator ("+" or "AND") or set the implicit operator 
to "AND".

But... we are still in the dark as to your precise requirement.

-- Jack Krupansky

-----Original Message----- 
From: asuka
Sent: Tuesday, September 24, 2013 11:08 AM
To: solr-user@lucene.apache.org
Subject: Re: Get only those documents that are fully satisfied.



Re: Get only those documents that are fully satisfied.

Posted by asuka <an...@gmail.com>.
Hi Andre,
   I don't want documents that match my whole query; I want those
documents that are fully satisfied by some terms of the query.

In other words, I'm interested in an exact match from the point of view of
the document, not from the point of view of the query.

Asuka



Andre Bois-Crettez wrote
> (Your schema and query only appear on the nabble.com forum, it is mostly
> empty for me on the mailing list)
> 
> What you want is probably to change OR to AND:
> 
> params.set("q.op", "AND");






Re: Get only those documents that are fully satisfied.

Posted by Andre Bois-Crettez <an...@kelkoo.com>.
(Your schema and query only appear on the nabble.com forum, it is mostly
empty for me on the mailing list)

What you want is probably to change OR to AND:

params.set("q.op", "AND");
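
For clients that talk to Solr over plain HTTP rather than through SolrJ, the same parameter can be put on the request URL. The sketch below only builds the URL; the host, core name and helper name are hypothetical.

```python
from urllib.parse import urlencode

def build_select_url(base_url, query, default_operator="AND"):
    """Build a /select URL whose q.op value makes every query term mandatory."""
    params = {"q": query, "q.op": default_operator}
    return base_url + "/select?" + urlencode(params)

# Hypothetical host and core name, for illustration only.
print(build_select_url("http://localhost:8983/solr/albums", "NAME:(dirt incident)"))
```

With q.op=AND a document has to contain every term of the query, which is matching from the query's point of view.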


André
On 09/23/2013 04:44 PM, asuka wrote:
> Hi Jack,
>     I've been working with the following schema field analyzer:
>
>
>
> Regarding the query, the one I'm using right now is:
>
>
>
> But with this query, I just get results by the presence of any of the words
> at the sentence.
>
>
>
> --
> André Bois-Crettez
>
> Software Architect
> Search Developer
> http://www.kelkoo.com/


Re: Get only those documents that are fully satisfied.

Posted by asuka <an...@gmail.com>.
Hi Jack,
   I've been working with the following schema field analyzer:



Regarding the query, the one I'm using right now is:



But with this query, I get results whenever any of the words in the
sentence is present.




Re: Get only those documents that are fully satisfied.

Posted by Jack Krupansky <ja...@basetechnology.com>.
It all depends on your query parameters and schema field type analyzers, of 
which you have told us nothing.

-- Jack Krupansky

-----Original Message----- 
From: asuka
Sent: Monday, September 23, 2013 7:57 AM
To: solr-user@lucene.apache.org
Subject: Get only those documents that are fully satisfied.
