Posted to solr-dev@lucene.apache.org by "Ryan McKinley (JIRA)" <ji...@apache.org> on 2007/11/22 03:05:43 UTC

[jira] Created: (SOLR-418) Editorial Query Boosting Component

Editorial Query Boosting Component
----------------------------------

                 Key: SOLR-418
                 URL: https://issues.apache.org/jira/browse/SOLR-418
             Project: Solr
          Issue Type: New Feature
          Components: search
            Reporter: Ryan McKinley
             Fix For: 1.3


For a given query string, a human editor can say which documents should be important.  This is related to a Lucene discussion:
http://www.nabble.com/Forced-Top-Document-tf4682070.html#a13408965

Ideally, the position could be determined explicitly by the editor - otherwise increasing the boost is probably sufficient.

This patch uses the Search Component framework to inject custom document boosting into the standard SearchHandler.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-418) Editorial Query Boosting Component

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-418:
-------------------------------

    Attachment: SOLR-418-QueryBoosting.patch

Here is an updated patch that implements sorting.  Rather than trying to mix boosted and normal results, it uses a custom sort to put the boosted results at the top.  The boost.xml format is now:

{code:xml}
 <query text="ZZZZ">
  <doc id="1" />
  <doc id="2" />
  <doc id="3" />
 </query>
{code}

For the query "ZZZZ", documents 1, 2, and 3 will be returned first, followed by anything that normally matches "ZZZZ".

If the query specifies a sort, it will be respected.  Only score sorts are modified to boost the configured documents.
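A minimal sketch of the kind of custom sort described above, using a hypothetical `Hit` record (a document id plus its normal score) rather than the patch's actual classes: configured ids come first, in the order they appear in boost.xml, and everything else falls back to descending score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical result entry: a document id and its normal relevance score.
record Hit(String id, float score) {}

class ElevationSort {
  // Elevated ids first (in configured order), then the rest by score descending.
  static Comparator<Hit> elevatedFirst(List<String> elevatedIds) {
    Map<String, Integer> rank = new HashMap<>();
    for (int i = 0; i < elevatedIds.size(); i++) {
      rank.putIfAbsent(elevatedIds.get(i), i); // first listing wins
    }
    return (a, b) -> {
      Integer ra = rank.get(a.id());
      Integer rb = rank.get(b.id());
      if (ra != null && rb != null) return Integer.compare(ra, rb);
      if (ra != null) return -1; // only a is elevated
      if (rb != null) return 1;  // only b is elevated
      return Float.compare(b.score(), a.score()); // neither: higher score first
    };
  }

  static List<Hit> sort(List<Hit> hits, List<String> elevatedIds) {
    List<Hit> out = new ArrayList<>(hits);
    out.sort(elevatedFirst(elevatedIds));
    return out;
  }
}
```

The comparator only reorders documents that already matched, so the configured documents still have to be brought into the result set by some other means.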




[jira] Commented: (SOLR-418) Editorial Query Boosting Component

Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554212 ] 

Koji Sekiguchi commented on SOLR-418:
-------------------------------------

I'm interested in this feature and have a few comments:

1. I was a bit confused by "analyzer" in solrconfig.xml; "fieldType" would be more straightforward to me.
2. Pardon me if I'm wrong, but does elevationCache need to be synchronized in getElevationMap(), since it is called from prepare()?




[jira] Commented: (SOLR-418) Editorial Query Boosting Component

Posted by "Mike Klaas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548065 ] 

Mike Klaas commented on SOLR-418:
---------------------------------

I think this makes a lot of sense, though I wonder whether queries should be keyed on more than the query string.  The results for a given query can depend greatly on the match-affecting parameters, for instance the fq= parameter of dismax.  These seem part of the "intrinsic query" to me.  Sort seems to be as well, but I don't use it much, so I'm not sure my intuition can be trusted there.
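One way this suggestion could look, sketched under the assumption that only `q` and the `fq` values matter (other match-affecting parameters would be folded in the same way): build the lookup key from the normalized query string plus the sorted filter queries, so that parameter order does not change the key. `ElevationKey` is a hypothetical name, not part of the patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical lookup key: normalized query text plus its filter queries.
record ElevationKey(String q, List<String> fqs) {
  static ElevationKey of(String q, List<String> fqs) {
    List<String> sorted = new ArrayList<>(fqs);
    Collections.sort(sorted); // fq order should not affect the key
    // trim + lowercase stands in for whatever normalization is configured
    return new ElevationKey(q.trim().toLowerCase(Locale.ROOT), List.copyOf(sorted));
  }
}
```

Because records derive `equals` and `hashCode` from their components, such a key can be used directly in a `HashMap`.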



[jira] Commented: (SOLR-418) Editorial Query Boosting Component

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548054 ] 

Yonik Seeley commented on SOLR-418:
-----------------------------------

It seems like the user should be in control of whether these docs are added and sorted first, regardless of what the regular sort is.



[jira] Commented: (SOLR-418) Editorial Query Boosting Component

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548049 ] 

Ryan McKinley commented on SOLR-418:
------------------------------------

I agree with changing the name from "boosts" to something else...  what is "one box"? (Google points me to their new search appliance ;)

Re always putting the 'boosted' docs first: I'm not *against* making this configurable, but it seems wrong.

If you want to force the boosted docs first, isn't that just:
{code:xml}
    <lst name="invariants">
      <str name="sort">score desc</str>
    </lst>
{code}

Is there a real use case to have 'sort=date desc' put the boosted docs first?



[jira] Updated: (SOLR-418) Editorial Query Boosting Component

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-418:
-------------------------------

    Attachment: SOLR-418-QueryBoosting.patch

Updated to work with trunk.  Added a forceBoosting="true" argument to force boosting regardless of the requested sort.

Unless we figure out a way to do absolute positioning, I think this component should be renamed 'DocumentElevationComponent'.
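The rule discussed in this thread (boost only when results are sorted by score, unless forced) can be sketched as a small predicate. The sort-spec handling here is a simplified assumption (comma-separated `field direction` pairs, only the primary key checked), not Solr's actual sort parser; `forceBoosting` mirrors the argument named above.

```java
import java.util.Locale;

class ElevationPolicy {
  // sortSpec is the raw sort parameter, e.g. "score desc" or "date desc, score desc".
  static boolean shouldElevate(String sortSpec, boolean forceBoosting) {
    if (forceBoosting) return true;                          // forced regardless of sort
    if (sortSpec == null || sortSpec.isBlank()) return true; // default relevance order
    // Only the primary sort key decides; limit 2 keeps "," from yielding an empty array.
    String primary = sortSpec.split(",", 2)[0].trim().toLowerCase(Locale.ROOT);
    String[] parts = primary.split("\\s+");
    return parts.length == 2 && parts[0].equals("score") && parts[1].equals("desc");
  }
}
```

With this rule, `sort=date desc` leaves the order untouched unless `forceBoosting` is set, which matches the default behavior described earlier in the thread.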



[jira] Commented: (SOLR-418) Editorial Query Boosting Component

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548339 ] 

Yonik Seeley commented on SOLR-418:
-----------------------------------

Is there a way to specify that the file is in the index directory (so it can be replicated out like the rest of the index)?



[jira] Updated: (SOLR-418) Editorial Query Boosting Component

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-418:
-------------------------------

    Attachment: SOLR-418-QueryBoosting.patch

Updated patch for trunk.  This also:
1. renames the component 'QueryElevationComponent' and uses the term 'elevate' rather than 'boost'

2. implements an 'exclude' function

{code:xml}
 <query text="ipod">
  <doc id="1" />
  <doc id="MA147LL/A" exclude="true" />
 </query>
{code}
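A sketch of how `exclude` might combine with elevation, using a hypothetical `ElevateEntry` record in place of the patch's own classes: ids marked `exclude="true"` are dropped from the results entirely, and the remaining configured ids are elevated in file order ahead of the normal results.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical parsed <doc> element: an id and whether exclude="true" was set.
record ElevateEntry(String id, boolean exclude) {}

class ExcludeAndElevate {
  // resultIds: ids in their normal ranked order.
  static List<String> apply(List<ElevateEntry> entries, List<String> resultIds) {
    Set<String> excluded = new HashSet<>();
    List<String> elevated = new ArrayList<>();
    for (ElevateEntry e : entries) {
      if (e.exclude()) excluded.add(e.id()); else elevated.add(e.id());
    }
    // LinkedHashSet keeps first-insertion order and silently skips duplicates.
    LinkedHashSet<String> out = new LinkedHashSet<>();
    for (String id : elevated) if (!excluded.contains(id)) out.add(id); // forced in, even if unmatched
    for (String id : resultIds) if (!excluded.contains(id)) out.add(id);
    return new ArrayList<>(out);
  }
}
```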



[jira] Updated: (SOLR-418) Editorial Query Boosting Component

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-418:
-------------------------------

    Attachment: SOLR-418-QueryBoosting.patch

Here is a first draft that includes recent changes to SOLR-281.  This is incomplete and is posted to get early feedback and advice.

This component loads a file and builds a map of queries to special documents.  The format is:
{code:xml}
<boost>
 <query text="XXXX">
  <doc id="1" priority="1" />
 </query>
 <query text="YYYY">
  <doc id="1" priority="1" />
  <doc id="2" priority="3" />
 </query>
 <query text="ZZZZ">
  <doc id="1" priority="1" />
  <doc id="2" priority="3" />
  <doc id="3" priority="5" />
 </query>
</boost>
{code}

For the query "YYYY", document 1 should be in position 1 and document 2 in position 3.
I considered a .csv style format: 
 id,priority,phrase
or
 phrase,[id,priority]+
but I think the XML equivalent will be easier to edit/maintain.
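The priority semantics above (a 1-based target position per document) could be applied to a normal ranked list roughly as follows. This is a hypothetical sketch of the intended behavior, not code from the patch; it assumes distinct positions and clamps positions that fall past the end of the list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PositionPlacer {
  // positions: doc id -> desired 1-based position (the "priority" attribute).
  // normal: ids in their normal ranked order.
  static List<String> placeAt(Map<String, Integer> positions, List<String> normal) {
    List<String> out = new ArrayList<>(normal);
    out.removeAll(positions.keySet()); // pinned docs are placed explicitly below
    // Insert in ascending position order so earlier inserts land before later targets.
    TreeMap<Integer, String> byPos = new TreeMap<>();
    positions.forEach((id, pos) -> byPos.put(pos, id)); // assumes distinct positions
    for (Map.Entry<Integer, String> e : byPos.entrySet()) {
      int index = Math.min(e.getKey() - 1, out.size()); // clamp past the end
      out.add(index, e.getValue());
    }
    return out;
  }
}
```

For "YYYY" with normal results `a, 1, b, c, 2, d`, this yields `1, a, 2, b, c, d`.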

The search handler is configured with:

{code:xml}
<searchComponent name="boost" class="org.apache.solr.handler.component.QueryBoostingComponent">
  <str name="analyzer">string</str>
  <str name="boosts">boost.xml</str>
</searchComponent>

<requestHandler name="/boost" class="solr.SearchHandler">
  <arr name="last-components">
    <str>boost</str>
  </arr>
</requestHandler>
{code}

The <str name="analyzer">string</str> bit chooses a fieldType (from schema.xml) and uses that to normalize input strings.  This lets us reuse existing lowercase/trim/pattern/etc filters.
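A sketch of loading the boost file into a lookup map with the JDK's DOM parser. The component described here normalizes query text with the configured fieldType's analyzer; in this sketch a plain trim plus lowercase stands in for that analysis chain (an assumption), and `BoostFileLoader` is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BoostFileLoader {
  // Stand-in for fieldType analysis: trim and lowercase.
  static String normalize(String text) {
    return text.trim().toLowerCase(Locale.ROOT);
  }

  // Maps normalized query text -> doc ids in file order.
  static Map<String, List<String>> load(String xml) {
    Document dom;
    try {
      dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) { // parser config, SAX, or I/O failure
      throw new IllegalArgumentException("Could not parse boost file", e);
    }
    Map<String, List<String>> map = new HashMap<>();
    NodeList queries = dom.getElementsByTagName("query");
    for (int i = 0; i < queries.getLength(); i++) {
      Element q = (Element) queries.item(i);
      List<String> ids = new ArrayList<>();
      NodeList docs = q.getElementsByTagName("doc");
      for (int j = 0; j < docs.getLength(); j++) {
        ids.add(((Element) docs.item(j)).getAttribute("id"));
      }
      map.put(normalize(q.getAttribute("text")), ids);
    }
    return map;
  }
}
```

At query time, the incoming `q` would go through the same `normalize` step before the map lookup, so "ZZZZ", " zzzz " and "Zzzz" all hit the same entry.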

For sorting, I think the best approach is to use a custom sort when sorting by score.  (This isn't implemented yet)

Currently for a matching query, this converts the query using:
{code:java}
      // Build a query to match the forced documents:
      // (id:1 id:2 id:3 id:4 id:5)^0
      BooleanQuery boosted = new BooleanQuery( true );
      for( Booster b : booster ) {
        TermQuery tq = new TermQuery( new Term( idField, b.id ) );
        boosted.add( tq, BooleanClause.Occur.SHOULD );
      }
      boosted.setBoost( 0 ); // don't affect the score
      
      // Change the query to insert forced documents
      BooleanQuery newq = new BooleanQuery( true );
      newq.add( query, BooleanClause.Occur.SHOULD );
      newq.add( boosted, BooleanClause.Occur.SHOULD );
      builder.setQuery( newq );
{code}
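To see why this rewrite alone does not put the forced documents on top, here is a rough plain-Java model of its effect. It is not Lucene code: the maps stand in for the original query's matches and scores, and Lucene's query normalization is ignored. The `^0` clause widens the match set to include the forced ids but adds nothing to their score, so a forced doc that does not otherwise match scores 0 and sorts last under a normal score sort.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class ForcedMatchSimulation {
  // Models (originalQuery) OR (id:1 id:2 ...)^0 with coord disabled:
  // every forced id becomes a match, and the zero-boost clause contributes 0.
  static Map<String, Float> rewrite(Map<String, Float> originalScores, Set<String> forcedIds) {
    Map<String, Float> result = new HashMap<>(originalScores);
    for (String id : forcedIds) {
      result.merge(id, 0f, Float::sum); // existing score unchanged; new match scores 0
    }
    return result;
  }
}
```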

For debugging, check:
http://localhost:8983/solr/boost?q=ZZZZ&debugQuery=true

Any feedback would be great!






[jira] Issue Comment Edited: (SOLR-418) Editorial Query Boosting Component

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548049 ] 

ryantxu edited comment on SOLR-418 at 12/3/07 3:40 PM:
-------------------------------------------------------------

I agree with changing the name from "boosts" to something else...  what is "one box"? (Google points me to their new search appliance ;)

Re always putting the 'boosted' docs first: I'm not *against* making this configurable, but it seems wrong.

If you want to force the boosted docs first, isn't that just:
{code:xml}
    <lst name="invariants">
      <str name="sort">score desc</str>
    </lst>
{code}

Is there a real use case to have 'sort=date desc' put the boosted docs first?



[jira] Commented: (SOLR-418) Editorial Query Boosting Component

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548347 ] 

Ryan McKinley commented on SOLR-418:
------------------------------------

> 
> Is there a way to specify that the file is in the index directory (so it can be replicated out like the rest of the index?)
> 

Do we do that anywhere else?  Is there (or should there be) a standard way to do this?  I remember you discussing this elsewhere, but I don't remember where.  External value sources?

If you put config files in the index directory, how do you handle the case of a new, empty index?

You get a FileNotFoundException if you have {{/data/index/boosts.xml}} but no index in that directory.
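One possible way to handle the empty-index case, sketched with `java.nio.file` and a hypothetical directory layout: look for the file in the index directory first (so it would travel with replication) and fall back to the config directory when it is absent, instead of failing with FileNotFoundException. `ElevationFileLocator` and its parameters are illustrative names only.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

class ElevationFileLocator {
  // Prefer <dataDir>/index/<name>, else <confDir>/<name>, else nothing.
  static Optional<Path> locate(Path dataDir, Path confDir, String name) {
    Path inIndex = dataDir.resolve("index").resolve(name);
    if (Files.isRegularFile(inIndex)) return Optional.of(inIndex);
    Path inConf = confDir.resolve(name);
    if (Files.isRegularFile(inConf)) return Optional.of(inConf);
    return Optional.empty(); // caller decides: run without elevation, or report an error
  }
}
```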




[jira] Commented: (SOLR-418) Editorial Query Boosting Component

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560228#action_12560228 ] 

Ryan McKinley commented on SOLR-418:
------------------------------------

Thanks for looking at this - and fixing it up

bq. dropped the seemingly unrelated changes in SolrServlet (part of another patch?)

Not sure how that got in there... it was part of an issue I had with Resin loading servlets before filters, and with SOLR-350 initialization.




[jira] Commented: (SOLR-418) Editorial Query Boosting Component

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549538 ] 

Ryan McKinley commented on SOLR-418:
------------------------------------

I would like to commit most of this patch under SOLR-281.  I will leave out the QueryBoostingComponent stuff and just commit the changes to the component framework that make it possible to configure.



[jira] Updated: (SOLR-418) Editorial Query Boosting Component

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-418:
-------------------------------

    Attachment: SOLR-418-QueryBoosting.patch

Thanks Koji -- here is an updated patch

#1 - I changed "analyzer" to "queryFieldType"; this is the fieldType used to analyze the incoming query.

#2 - I changed it to call {{synchronized( elevationCache )}} when it checks a non-null entry.  It does not need to be synchronized for a null key, because in that case the cache is only built at startup.

To be safe, we could just use:
{code:java}
final Map<IndexReader,Map<String, ElevationObj>> elevationCache = 
    Collections.synchronizedMap( new WeakHashMap<IndexReader, Map<String,ElevationObj>>() );
{code}

but I'm not sure which is better.
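The two options can be compared in a small, self-contained sketch. A plain `Object` stands in for `IndexReader` and `List<String>` for `ElevationObj` (both assumptions). One variant guards a plain `WeakHashMap` with an explicit lock; the other relies on `Collections.synchronizedMap`. In both, the check-then-build step has to be atomic, otherwise two threads can both build the map for the same reader; `computeIfAbsent` on the synchronized wrapper runs under the wrapper's internal lock, so it covers that case.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

class ElevationCache {
  // Weak keys: an entry disappears once its reader is no longer referenced.

  // Option A: plain WeakHashMap, every access under one explicit lock.
  private final Map<Object, List<String>> locked = new WeakHashMap<>();

  List<String> getLocked(Object reader, Function<Object, List<String>> builder) {
    synchronized (locked) {
      return locked.computeIfAbsent(reader, builder);
    }
  }

  // Option B: synchronized wrapper; computeIfAbsent is atomic under its lock.
  private final Map<Object, List<String>> wrapped =
      Collections.synchronizedMap(new WeakHashMap<>());

  List<String> getWrapped(Object reader, Function<Object, List<String>> builder) {
    return wrapped.computeIfAbsent(reader, builder);
  }
}
```

Note that with option B, a separate `get` followed by `put` would not be atomic; only single calls on the wrapper are.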



[jira] Resolved: (SOLR-418) Editorial Query Boosting Component

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley resolved SOLR-418.
--------------------------------

    Resolution: Fixed
      Assignee: Ryan McKinley



[jira] Issue Comment Edited: (SOLR-418) Editorial Query Boosting Component

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12545939 ] 

ryantxu edited comment on SOLR-418 at 11/27/07 9:46 AM:
--------------------------------------------------------------

  
> Editorial Query Boosting Component
> ----------------------------------
>
>                 Key: SOLR-418
>                 URL: https://issues.apache.org/jira/browse/SOLR-418
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Ryan McKinley
>             Fix For: 1.3
>
>         Attachments: SOLR-418-QueryBoosting.patch
>
>
> For a given query string, a human editor can say what documents should be important.  This is related to a lucene discussion:
> http://www.nabble.com/Forced-Top-Document-tf4682070.html#a13408965
> Ideally, the position could be determined explicitly by the editor - otherwise increasing the boost is probably sufficient.
> This patch uses the Search Component framework to inject custom document boosting into the standard SearchHandler.



[jira] Updated: (SOLR-418) Editorial Query Boosting Component

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-418:
------------------------------

    Attachment: SOLR-418-QueryBoosting.patch

Looks good, Ryan!
I reviewed it and changed a few minor things (new patch attached):
- fixed a concurrency bug (accessing the map outside of the synchronized block can lead to a ConcurrentModificationException or other errors, even if that key/value pair will never change)
- changed the example example.xml a little, and switched the /elevate handler to load lazily
- updated code/configs to reflect the SearchHandler move
- fixed (pre-existing) bugs in the code moved to VersionedFile (multiple opens of the same file)
- dropped the seemingly unrelated changes in SolrServlet (part of another patch?)
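The concurrency fix in the first bullet comes down to one rule. With a plain HashMap, reads as well as writes must hold the lock, because an unsynchronized get() can run while another thread is resizing the table. A minimal sketch of that pattern follows. SyncedCache is a hypothetical name, and the loader stands in for whatever parses the elevation file for a given key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a HashMap shared between threads, where every access
// (including plain reads) happens while holding the same lock.
final class SyncedCache<K, V> {
  private final Map<K, V> map = new HashMap<>();

  V get(K key, Function<K, V> loader) {
    synchronized (map) {
      V value = map.get(key);
      if (value == null) {
        value = loader.apply(key); // e.g. parse the elevation file for this key
        map.put(key, value);
      }
      return value;
    }
  }

  int size() {
    synchronized (map) {
      return map.size();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    SyncedCache<Integer, String> cache = new SyncedCache<>();
    Thread[] workers = new Thread[8];
    for (int t = 0; t < workers.length; t++) {
      workers[t] = new Thread(() -> {
        for (int i = 0; i < 1000; i++) {
          cache.get(i, k -> "elevations-for-" + k);
        }
      });
      workers[t].start();
    }
    for (Thread w : workers) {
      w.join();
    }
    System.out.println(cache.size()); // 1000
  }
}
```

Holding the lock while loading serializes loads. That is usually acceptable for a config file that rarely changes. java.util.concurrent.ConcurrentHashMap.computeIfAbsent is a common alternative when that contention matters.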

> Editorial Query Boosting Component
> ----------------------------------
>
>                 Key: SOLR-418
>                 URL: https://issues.apache.org/jira/browse/SOLR-418
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Ryan McKinley
>             Fix For: 1.3
>
>         Attachments: SOLR-418-QueryBoosting.patch, SOLR-418-QueryBoosting.patch, SOLR-418-QueryBoosting.patch, SOLR-418-QueryBoosting.patch, SOLR-418-QueryBoosting.patch, SOLR-418-QueryBoosting.patch, SOLR-418-QueryBoosting.patch, SOLR-418-QueryBoosting.patch
>
>
> For a given query string, a human editor can say what documents should be important.  This is related to a lucene discussion:
> http://www.nabble.com/Forced-Top-Document-tf4682070.html#a13408965
> Ideally, the position could be determined explicitly by the editor - otherwise increasing the boost is probably sufficient.
> This patch uses the Search Component framework to inject custom document boosting into the standard SearchHandler.



[jira] Commented: (SOLR-418) Editorial Query Boosting Component

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548505 ] 

Hoss Man commented on SOLR-418:
-------------------------------

> Is there a way to specify that the file is in the index directory (so it can be replicated 
> out like the rest of the index?)

That definitely seems like a separate issue that we should attempt to solve as a whole for all types of config files down the road ... it also assumes that this component will reread the file on every newSearcher (I haven't read the patch, but I'm assuming it doesn't).

> Editorial Query Boosting Component
> ----------------------------------
>
>                 Key: SOLR-418
>                 URL: https://issues.apache.org/jira/browse/SOLR-418
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Ryan McKinley
>             Fix For: 1.3
>
>         Attachments: SOLR-418-QueryBoosting.patch, SOLR-418-QueryBoosting.patch
>
>
> For a given query string, a human editor can say what documents should be important.  This is related to a lucene discussion:
> http://www.nabble.com/Forced-Top-Document-tf4682070.html#a13408965
> Ideally, the position could be determined explicitly by the editor - otherwise increasing the boost is probably sufficient.
> This patch uses the Search Component framework to inject custom document boosting into the standard SearchHandler.



[jira] Commented: (SOLR-418) Editorial Query Boosting Component

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547633 ] 

Otis Gospodnetic commented on SOLR-418:
---------------------------------------

It seems like even this last bit would be great to make configurable:

"If the query specifies a sort, it will be respected. Only SCORE sorts are modified to boost the configured documents."

In other words, make it possible to force docs in boost.xml to show up in appropriate positions regardless of the sort type.

Also, perhaps references to 'boost(s)' should now be renamed to avoid confusion? Isn't the "industry standard" term for this type of stuff "one box"?
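One way to approach Otis's suggestion is to wrap whatever sort the request asks for. Elevation becomes the primary key, and the user's sort only breaks ties among non-elevated documents. The following Java sketch assumes an id-to-rank map and an in-memory list. ElevateAnySort and Product are hypothetical names, and this is not how the patch handles sorting.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: put elevated docs first no matter which sort was requested.
final class ElevateAnySort {
  static <T> Comparator<T> elevatedFirst(Map<String, Integer> rank,
                                         Function<T, String> idOf,
                                         Comparator<T> userSort) {
    // Non-elevated docs get Integer.MAX_VALUE, so they sort after every elevated doc
    // and are then ordered by the user's own sort.
    Comparator<T> byElevation = Comparator.comparingInt(
        t -> rank.getOrDefault(idOf.apply(t), Integer.MAX_VALUE));
    return byElevation.thenComparing(userSort);
  }

  record Product(String id, double price) {}

  public static void main(String[] args) {
    List<Product> docs = new ArrayList<>(List.of(
        new Product("a", 30.0), new Product("b", 10.0), new Product("c", 20.0)));
    // The user sorts by price ascending; the editor elevates "a".
    docs.sort(elevatedFirst(Map.of("a", 0), Product::id,
                            Comparator.comparingDouble(Product::price)));
    docs.forEach(p -> System.out.println(p.id())); // a, then b, then c
  }
}
```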


> Editorial Query Boosting Component
> ----------------------------------
>
>                 Key: SOLR-418
>                 URL: https://issues.apache.org/jira/browse/SOLR-418
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Ryan McKinley
>             Fix For: 1.3
>
>         Attachments: SOLR-418-QueryBoosting.patch, SOLR-418-QueryBoosting.patch
>
>
> For a given query string, a human editor can say what documents should be important.  This is related to a lucene discussion:
> http://www.nabble.com/Forced-Top-Document-tf4682070.html#a13408965
> Ideally, the position could be determined explicitly by the editor - otherwise increasing the boost is probably sufficient.
> This patch uses the Search Component framework to inject custom document boosting into the standard SearchHandler.



[jira] Updated: (SOLR-418) Editorial Query Boosting Component

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-418:
-------------------------------

    Attachment: SOLR-418-QueryBoosting.patch

Updated to accept a runtime query param, "enableElevation", which can be used to disable elevation.
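Here is a rough sketch of how such a switch might be read. It assumes elevation stays on unless the request explicitly passes a false value, e.g. http://localhost:8983/solr/boost?q=ZZZZ&enableElevation=false. Solr parses request parameters itself. The query-string parser and the ElevationParam class here are hypothetical stand-ins written with only the JDK.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of reading an on/off switch from request parameters.
final class ElevationParam {
  // Parse "a=1&b=2" into a map (last value wins; no multi-valued params).
  static Map<String, String> parseQuery(String query) {
    Map<String, String> params = new HashMap<>();
    for (String pair : query.split("&")) {
      if (pair.isEmpty()) {
        continue;
      }
      int eq = pair.indexOf('=');
      String key = eq < 0 ? pair : pair.substring(0, eq);
      String value = eq < 0 ? "" : pair.substring(eq + 1);
      params.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                 URLDecoder.decode(value, StandardCharsets.UTF_8));
    }
    return params;
  }

  // Elevation is on when the param is absent; otherwise only "true" (any case) enables it.
  static boolean elevationEnabled(Map<String, String> params) {
    String v = params.get("enableElevation");
    return v == null || Boolean.parseBoolean(v.trim());
  }

  public static void main(String[] args) {
    System.out.println(elevationEnabled(parseQuery("q=ZZZZ")));                         // true
    System.out.println(elevationEnabled(parseQuery("q=ZZZZ&enableElevation=false")));   // false
  }
}
```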

> Editorial Query Boosting Component
> ----------------------------------
>
>                 Key: SOLR-418
>                 URL: https://issues.apache.org/jira/browse/SOLR-418
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Ryan McKinley
>             Fix For: 1.3
>
>         Attachments: SOLR-418-QueryBoosting.patch, SOLR-418-QueryBoosting.patch, SOLR-418-QueryBoosting.patch, SOLR-418-QueryBoosting.patch, SOLR-418-QueryBoosting.patch, SOLR-418-QueryBoosting.patch
>
>
> For a given query string, a human editor can say what documents should be important.  This is related to a lucene discussion:
> http://www.nabble.com/Forced-Top-Document-tf4682070.html#a13408965
> Ideally, the position could be determined explicitly by the editor - otherwise increasing the boost is probably sufficient.
> This patch uses the Search Component framework to inject custom document boosting into the standard SearchHandler.



[jira] Updated: (SOLR-418) Editorial Query Boosting Component

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-418:
-------------------------------

    Attachment: SOLR-418-QueryBoosting.patch

Here is an updated patch that allows you to put the configuration in the data directory and have it reload for each IndexReader.

Assuming the component is initialized with:
<str name="config-file">elevate.xml</str>

If elevate.xml exists within the conf directory, it will be loaded once at startup. If it exists within the 'data' directory, it will be reloaded after each <commit/>.
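The conf-versus-data rule above can be sketched as a small decision function. The comment does not say what happens when the file exists in both places, so the sketch assumes that case is a configuration error. The class, enum, and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of choosing how to load elevate.xml.
final class ElevateConfigLocator {
  enum Mode { LOAD_ONCE, RELOAD_ON_COMMIT, MISSING }

  // Assumption: the file may live in conf/ or data/, but not both.
  static Mode decide(boolean inConf, boolean inData) {
    if (inConf && inData) {
      throw new IllegalStateException("config file found in both conf and data directories");
    }
    if (inData) {
      return Mode.RELOAD_ON_COMMIT; // re-read when a new reader is opened after <commit/>
    }
    if (inConf) {
      return Mode.LOAD_ONCE; // read once at startup
    }
    return Mode.MISSING;
  }

  static Mode locate(Path confDir, Path dataDir, String fileName) {
    return decide(Files.exists(confDir.resolve(fileName)),
                  Files.exists(dataDir.resolve(fileName)));
  }

  public static void main(String[] args) throws IOException {
    Path conf = Files.createTempDirectory("conf");
    Path data = Files.createTempDirectory("data");
    Files.writeString(data.resolve("elevate.xml"), "<elevate/>");
    System.out.println(locate(conf, data, "elevate.xml")); // RELOAD_ON_COMMIT
  }
}
```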

Check http://wiki.apache.org/solr/QueryElevationComponent for tentative docs.

This also refactored the 'getLatestFile' logic out of o.a.s.search.function.FileFloatSource and put it in a new class: o.a.s.util.VersionedFile.

> Editorial Query Boosting Component
> ----------------------------------
>
>                 Key: SOLR-418
>                 URL: https://issues.apache.org/jira/browse/SOLR-418
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Ryan McKinley
>             Fix For: 1.3
>
>         Attachments: SOLR-418-QueryBoosting.patch, SOLR-418-QueryBoosting.patch, SOLR-418-QueryBoosting.patch, SOLR-418-QueryBoosting.patch, SOLR-418-QueryBoosting.patch
>
>
> For a given query string, a human editor can say what documents should be important.  This is related to a lucene discussion:
> http://www.nabble.com/Forced-Top-Document-tf4682070.html#a13408965
> Ideally, the position could be determined explicitly by the editor - otherwise increasing the boost is probably sufficient.
> This patch uses the Search Component framework to inject custom document boosting into the standard SearchHandler.



[jira] Commented: (SOLR-418) Editorial Query Boosting Component

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548337 ] 

Ryan McKinley commented on SOLR-418:
------------------------------------

To be clear, this respects filter queries. For:
http://localhost:8983/solr/boost?q=ZZZZ&debugQuery=true&fq=id:2
only id:2 is returned, even though 1 and 3 are boosted.
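Put another way, elevation only reorders documents that survive the filters. It never lets a filtered-out document back in. Below is a minimal in-memory Java sketch of that behavior, with a Predicate standing in for the fq. FilteredElevation is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: elevated docs go first, but only if they pass the filter.
final class FilteredElevation {
  static List<String> results(List<String> elevated, List<String> organic,
                              Predicate<String> filter) {
    // LinkedHashSet keeps insertion order and drops duplicates.
    LinkedHashSet<String> out = new LinkedHashSet<>();
    for (String id : elevated) {
      if (filter.test(id)) {
        out.add(id);
      }
    }
    for (String id : organic) {
      if (filter.test(id)) {
        out.add(id);
      }
    }
    return new ArrayList<>(out);
  }

  public static void main(String[] args) {
    // q=ZZZZ elevates 1, 2, 3; fq=id:2 keeps only document 2.
    List<String> ids = results(List.of("1", "2", "3"), List.of(), id -> id.equals("2"));
    System.out.println(ids); // [2]
  }
}
```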

I suppose we could do something to make the intrinsic query include other fields.  Perhaps

{code:xml}
<boost>
 <query>
  <param name="q">string</param>
  <param name="fq">another</param>
 </query>
 <docs>
  <doc id="1" />
  <doc id="2" />
  <doc id="3" />
 </docs>
</boost>
{code}
or 
{code:xml}
<query params="q=string&amp;fq=another">
  <doc id="1" />
</query>
{code}

*but* I think this gets more complicated than necessary. For the cases I can think of where you would want different docs boosted, you could just register a different handler with different boosted docs / invariants. This kind of functionality only really makes sense with dismax-style user queries rather than standard Lucene query syntax; that is, "dog" rather than "name:dog^3 content:dog^1".

-----

Re terminology: maybe using the word "boost" will get too confusing. Perhaps "elevate", "promote", or "force top documents"?

Rather than the 'QueryBoostingComponent', this could be the DocumentElevationComponent:

{code:xml}
<elevate>
 <query text="XXXX">
  <doc id="1"/>
 </query>
 <query text="YYYY">
  <doc id="1" />
  <doc id="2" />
 </query>
</elevate>
{code}
 
The fastsearch glossary has a few terms that may be relevant:

*Absolute boosting*
{panel}
Absolute boosting enables a document to be consistently displayed at a given position in the result set when a user searches with a specific query.  It also prevents individual documents from being displayed when a user searches with a specific query.
{panel}

Under boosting, they have:
{panel}
Boosting may be applied in two ways:
    * Query independent (document boosting). This is used to boost high-quality pages for all queries that match the document
    * Query dependent (query boosting). In this case, specific documents may be boosted for given queries
{panel}

Their "Absolute boosting" description makes me wonder if we should add a flag to "bury" or "hide" a document for a given query. Maybe:
{code:xml}
 <doc id="2" hide="true"/>
{code}
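If such a hide flag were added, one way to apply it for a matched query is to drop hidden ids from the results and put the remaining configured ids first. This is a speculative Java sketch of the proposal, not existing behavior. The Rule record and the HideAndElevate class are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Speculative sketch of the proposed hide="true" flag.
final class HideAndElevate {
  record Rule(String id, boolean hide) {}

  static List<String> apply(List<String> organic, List<Rule> rules) {
    Set<String> hidden = new HashSet<>();
    List<String> elevated = new ArrayList<>();
    for (Rule r : rules) {
      if (r.hide()) {
        hidden.add(r.id());
      } else {
        elevated.add(r.id());
      }
    }
    // Elevated ids first (in config order), then organic results minus hidden ones.
    LinkedHashSet<String> out = new LinkedHashSet<>(elevated);
    for (String id : organic) {
      if (!hidden.contains(id)) {
        out.add(id);
      }
    }
    return new ArrayList<>(out);
  }

  public static void main(String[] args) {
    List<Rule> rules = List.of(new Rule("1", false), new Rule("2", true));
    System.out.println(apply(List.of("5", "2", "7"), rules)); // [1, 5, 7]
  }
}
```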


> Editorial Query Boosting Component
> ----------------------------------
>
>                 Key: SOLR-418
>                 URL: https://issues.apache.org/jira/browse/SOLR-418
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Ryan McKinley
>             Fix For: 1.3
>
>         Attachments: SOLR-418-QueryBoosting.patch, SOLR-418-QueryBoosting.patch
>
>
> For a given query string, a human editor can say what documents should be important.  This is related to a lucene discussion:
> http://www.nabble.com/Forced-Top-Document-tf4682070.html#a13408965
> Ideally, the position could be determined explicitly by the editor - otherwise increasing the boost is probably sufficient.
> This patch uses the Search Component framework to inject custom document boosting into the standard SearchHandler.
