Posted to dev@lucene.apache.org by "Mike Klaas (JIRA)" <ji...@apache.org> on 2007/03/26 21:13:32 UTC

[jira] Created: (LUCENE-850) Easily create queries that transform subquery scores arbitrarily

Easily create queries that transform subquery scores arbitrarily
----------------------------------------------------------------

                 Key: LUCENE-850
                 URL: https://issues.apache.org/jira/browse/LUCENE-850
             Project: Lucene - Java
          Issue Type: New Feature
          Components: Search
            Reporter: Mike Klaas


Refactor DisMaxQuery into SubQuery(Query|Scorer) that admits easy subclassing.  An example is given for multiplicatively combining scores.

Note: patch is not clean; for demonstration purposes only.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-850) Easily create queries that transform subquery scores arbitrarily

Posted by "Mike Klaas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12510001 ] 

Mike Klaas commented on LUCENE-850:
-----------------------------------

Tim:  That is typically done by adding an optional implicit phrase query:

john bush -> +(john bush) "john bush"~1000

This works very well for two-term queries, but less well when there are more terms.  See also DisjunctionMaxQuery if there are multiple fields.
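
As a rough illustration of the rewrite above, here is a sketch that builds the expanded query string from the user's terms. It is plain string manipulation rather than Lucene API; the class and method names (ImplicitPhrase, expand) are hypothetical, the slop of 1000 is taken from the example, and the terms are assumed to be already escaped for the query parser.

```java
import java.util.Arrays;
import java.util.List;

public class ImplicitPhrase {
    // Hypothetical helper: turns the terms [john, bush] into
    //   +(john bush) "john bush"~1000
    // i.e. a required clause over the terms plus an optional sloppy phrase.
    // Assumes the terms are already escaped for the query parser.
    static String expand(List<String> terms, int slop) {
        String joined = String.join(" ", terms);
        if (terms.size() < 2) {
            // A phrase clause adds nothing for a single term.
            return joined;
        }
        return "+(" + joined + ") \"" + joined + "\"~" + slop;
    }

    public static void main(String[] args) {
        System.out.println(expand(Arrays.asList("john", "bush"), 1000));
    }
}
```

The optional phrase clause only adds to the score of documents where the terms occur near each other, so it changes ranking without changing which documents match.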

> Easily create queries that transform subquery scores arbitrarily
> ----------------------------------------------------------------
>
>                 Key: LUCENE-850
>                 URL: https://issues.apache.org/jira/browse/LUCENE-850
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>            Reporter: Mike Klaas
>         Attachments: prodscorer.patch.diff
>
>
> Refactor DisMaxQuery into SubQuery(Query|Scorer) that admits easy subclassing.  An example is given for multiplicatively combining scores.
> Note: patch is not clean; for demonstration purposes only.



[jira] Commented: (LUCENE-850) Easily create queries that transform subquery scores arbitrarily

Posted by "Mike Klaas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523979 ] 

Mike Klaas commented on LUCENE-850:
-----------------------------------

To address the issue above, the following needs to be added:
===================================================================
--- build-src/java/solr/org/apache/lucene/search/CustomBoostQuery.java  (revision 9312)
+++ build-src/java/solr/org/apache/lucene/search/CustomBoostQuery.java  (working copy)
@@ -280,7 +280,7 @@
 
     /*(non-Javadoc) @see org.apache.lucene.search.Scorer#score() */
     public float score() throws IOException {
-      float boostScore = (boostScorer==null ? 1 : boostScorer.score());
+      float boostScore = (boostScorer==null || subQueryScorer.doc() != boostScorer.doc() ? 1 : boostScorer.score());
       return qWeight * customScore(subQueryScorer.doc(), subQueryScorer.score(), boostScore);
     }
 
@@ -300,7 +300,8 @@
         return subQueryExpl;
       }
       // match
-      Explanation boostExpl = boostScorer==null ? null : boostScorer.explain(doc);
+      Explanation boostExpl = boostScorer==null ? null : 
+         weight.qStrict ? boostScorer.explain(doc) : weight.boostWeight.explain(reader,doc);
       Explanation customExp = customExplain(doc,subQueryExpl,boostExpl);
       float sc = qWeight * customExp.getValue();
       Explanation res = new ComplexExplanation(
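
The guard added to score() in the diff above can be sketched in isolation as follows. This is a simplified model, not the actual CustomBoostQuery code: DocScorer, at and boostFor are hypothetical stand-ins for Lucene's Scorer and the surrounding class, and the sketch assumes each scorer reports the doc id it is currently positioned on.

```java
public class BoostGuardSketch {
    // Hypothetical minimal stand-in for a scorer: the doc it is
    // currently positioned on, and the score for that doc.
    interface DocScorer {
        int doc();
        float score();
    }

    // Convenience factory for a scorer positioned on a fixed doc.
    static DocScorer at(final int doc, final float score) {
        return new DocScorer() {
            public int doc() { return doc; }
            public float score() { return score; }
        };
    }

    // Returns the neutral boost 1 when there is no boost scorer, or when
    // the boost scorer is not positioned on the document being scored
    // (i.e. the boost query did not match that document).
    static float boostFor(DocScorer subQueryScorer, DocScorer boostScorer) {
        if (boostScorer == null || subQueryScorer.doc() != boostScorer.doc()) {
            return 1f;
        }
        return boostScorer.score();
    }

    public static void main(String[] args) {
        System.out.println(boostFor(at(5, 2f), at(5, 3f))); // boost applies
        System.out.println(boostFor(at(5, 2f), at(7, 3f))); // falls back to 1
    }
}
```

With 1 as the fallback, a document that the boost query does not match keeps its unboosted sub-query score under a multiplicative customScore.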




[jira] Updated: (LUCENE-850) Easily create queries that transform subquery scores arbitrarily

Posted by "Mike Klaas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Klaas updated LUCENE-850:
------------------------------

    Attachment: prodscorer.patch.diff

Generify the subquery handling logic of DisMax to make it easy to build subquery scorers.

This patch is demonstrative only.  There are no tests, and I'm pretty sure the query norm calculation isn't correct in general.



[jira] Updated: (LUCENE-850) Easily create queries that transform subquery scores arbitrarily

Posted by "Mike Klaas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Klaas updated LUCENE-850:
------------------------------

    Attachment: CustomBoostQuery.java

Here's an approach I think will work.

Rename CustomScoreQuery to CustomBoostQuery, and remove the ValueSource-specific logic.  Really there is no reason to limit the logic to ValueSource queries: the only important criterion is that we don't expect docs that match only the boosting query to be returned (the doc set is unchanged relative to the original query).

I'm not sure what will happen if the boost query doesn't match the document being boosted, however.  Perhaps there should be a default value?

Does this still belong in the function package?



[jira] Commented: (LUCENE-850) Easily create queries that transform subquery scores arbitrarily

Posted by "Tim Sturge (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509993 ] 

Tim Sturge commented on LUCENE-850:
-----------------------------------

I just asked for a product-scored BooleanQuery on java-users and Mike pointed me in the direction of this bug. My use case is to get the non-phrase query "John Bush" to rank "John Bush" higher than "George Bush" or "John Kerry". I believe this is a common use case (I have 3 or 4 bugs filed against search quality internally that boil down to this issue).





[jira] Updated: (LUCENE-850) Easily create queries that transform subquery scores arbitrarily

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-850:
-------------------------------

         Assignee:     (was: Doron Cohen)
    Lucene Fields: [Patch Available]  (was: [New, Patch Available])



[jira] Commented: (LUCENE-850) Easily create queries that transform subquery scores arbitrarily

Posted by "Mike Klaas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509998 ] 

Mike Klaas commented on LUCENE-850:
-----------------------------------

Hi Doron,

The main use case is the same as for document (and, to a lesser extent, field) boosts: the ability to scale a document's score by a certain factor (rather than applying an additive boost, which is what adding another subclause to the query would entail).

The function query capability works for many situations, as you can store the various types of boosts in a FieldCache and use your approach.  But this doesn't scale when there are tons of possible boost fields (which would usually be sparsely-populated).  SparseFieldCache, anyone?

I decided to move away from ProductQueries for the time being, so that is no longer the main use case of this patch.  Primarily the patch stems from developer frustration with implementing something like ProductQuery.  ISTM that the subquery-handling logic (present in BooleanQuery and slightly different in DisMaxQuery) needn't be so tightly coupled with a choice of scoring function.

For the record, DisMax is actually a ( x*Max + (1-x)*Sum ) Query, so it is both Sum and Max.  Perhaps if we add Prod to the options, there are no more useful subquery combinators?
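
To make the combinators concrete, here is a minimal sketch of Sum, Max, Prod and the blended DisMax form over an array of sub-scores. The class and method names are hypothetical and this is not the patch's code. As far as I can tell, Lucene's DisjunctionMaxQuery scores as max + tieBreaker * (sum - max), which matches the formula above with x = 1 - tieBreaker.

```java
public class SubScoreCombinators {
    // All methods assume at least one sub-score, i.e. only the
    // sub-scorers that matched the document contribute.

    static float sum(float[] scores) {
        float total = 0f;
        for (float s : scores) total += s;
        return total;
    }

    static float max(float[] scores) {
        float best = scores[0];
        for (float s : scores) best = Math.max(best, s);
        return best;
    }

    static float product(float[] scores) {
        float result = 1f;
        for (float s : scores) result *= s;
        return result;
    }

    // x*Max + (1-x)*Sum: x = 1 gives pure Max, x = 0 gives pure Sum.
    static float disMax(float[] scores, float x) {
        return x * max(scores) + (1f - x) * sum(scores);
    }

    public static void main(String[] args) {
        float[] scores = {2f, 3f};
        System.out.println(sum(scores));          // 5.0
        System.out.println(max(scores));          // 3.0
        System.out.println(product(scores));      // 6.0
        System.out.println(disMax(scores, 0.5f)); // 4.0
    }
}
```

Only the combining function differs between the four; the iteration over matching sub-scorers is the shared part that the refactoring aims to factor out.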



[jira] Commented: (LUCENE-850) Easily create queries that transform subquery scores arbitrarily

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513103 ] 

Doron Cohen commented on LUCENE-850:
------------------------------------

> The function query capability works for many situations, as you 
> can store the various types of boosts in a FieldCache and use 
> your approach. But this doesn't scale when there are tons of 
> possible boost fields (which would usually be sparsely-populated). 
> SparseFieldCache, anyone? 

For large collections loading would indeed take long. 
Quoting Michael, payloads will be more efficient for this case. Two options actually:
- faster reading values into a cache
- value-source that feeds on the fly from payloads.




[jira] Assigned: (LUCENE-850) Easily create queries that transform subquery scores arbitrarily

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen reassigned LUCENE-850:
----------------------------------

    Assignee: Doron Cohen



[jira] Commented: (LUCENE-850) Easily create queries that transform subquery scores arbitrarily

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501620 ] 

Doron Cohen commented on LUCENE-850:
------------------------------------

Mike,

If I understood it correctly, your patch can be described as:
- turns DisMaxQuery into a special case of a new generalized "CustomizableOrQuery"
- demonstrates this customizability with a new ProductQuery.
- DisMax(OR)Query logic is as before: max of sub-scores plus tie breaker.
- Product(OR)Query logic is: score = multiplication of scores of sub-scorers.

The regular Boolean Or could probably be phrased this way as Sum(OR)Query.

Now in LUCENE-446 I added CustomScoreQuery, which is simpler: 
- score = f (score(q), score(vq))
where 
- f() is overridable, 
- q is any query
- vq is optional, and it is a value-source-query, likely based on (cached) field values.
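
The contract described above, score = f(score(q), score(vq)) with an overridable f and an optional vq, might be sketched like this. It is a standalone model and not the CustomScoreQuery source: the class name is hypothetical, the product default for f is chosen for illustration, and a missing vq is treated as the neutral value 1.

```java
public class CustomScoreSketch {
    // Overridable combining function f(score(q), score(vq)).
    // The product used here is an illustrative default.
    protected float customScore(float subQueryScore, float valSrcScore) {
        return subQueryScore * valSrcScore;
    }

    // vq is optional: a null value-source score is replaced by the
    // neutral value 1 before f is applied.
    public final float score(float subQueryScore, Float valSrcScore) {
        float v = (valSrcScore == null) ? 1f : valSrcScore;
        return customScore(subQueryScore, v);
    }

    public static void main(String[] args) {
        // Override f to add the two scores instead of multiplying them.
        CustomScoreSketch additive = new CustomScoreSketch() {
            @Override
            protected float customScore(float subQueryScore, float valSrcScore) {
                return subQueryScore + valSrcScore;
            }
        };
        System.out.println(new CustomScoreSketch().score(2f, 3f)); // product
        System.out.println(additive.score(2f, 3f));                // sum
    }
}
```

Extending this to N subqueries would mean passing f an array of sub-scores instead of two fixed arguments.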

So it currently doesn't support your comment
   "I've often wanted to multiply the scores of two queries".

When first writing CustomScoreQuery I looked at combining any two or N subqueries, but wasn't sure how to do this. How to normalize. How to calculate the weights. But now I think that we could  perhaps follow your approach closer: call it CustomOrQuery, go for any N subqueries, and define f() accordingly. 

But is this really required / useful?  
What are the use cases for this general/arbitrary combining of scores (beyond current capabilities of o.a.l.search.function)?

Thanks,
Doron
