Posted to solr-dev@lucene.apache.org by "Ryan McKinley (JIRA)" <ji...@apache.org> on 2007/11/27 18:46:43 UTC

[jira] Issue Comment Edited: (SOLR-418) Editorial Query Boosting Component

    [ https://issues.apache.org/jira/browse/SOLR-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12545939 ] 

ryantxu edited comment on SOLR-418 at 11/27/07 9:46 AM:
--------------------------------------------------------------

Here is a first draft that includes recent changes to SOLR-281.  This is incomplete and is posted to get early feedback and advice.

This component loads a file and builds a map of queries to special documents.  The format is:
{code:xml}
<boost>
 <query text="XXXX">
  <doc id="1" priority="1" />
 </query>
 <query text="YYYY">
  <doc id="1" priority="1" />
  <doc id="2" priority="3" />
 </query>
 <query text="ZZZZ">
  <doc id="1" priority="1" />
  <doc id="2" priority="3" />
  <doc id="3" priority="5" />
 </query>
</boost>
{code}

For the query "YYYY", document 1 should appear in position 1 and document 2 in position 3.
I considered a CSV-style format:
 id,priority,phrase
or
 phrase,[id,priority]+
but I think the XML equivalent will be easier to edit and maintain.
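
To make the file format concrete, here is a minimal sketch of reading it into a map from query text to boosted documents. It uses the JDK's DOM parser, which may not be what the patch itself does, and the names BoostFileSketch, BoostedDoc, and parse are hypothetical, chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not part of the attached patch.
class BoostFileSketch {

  /** One <doc> entry: the document id and the position the editor asked for. */
  static final class BoostedDoc {
    final String id;
    final int priority;

    BoostedDoc(String id, int priority) {
      this.id = id;
      this.priority = priority;
    }
  }

  /** Parses the <boost> XML into a map from query text to its boosted documents. */
  static Map<String, List<BoostedDoc>> parse(String xml) {
    Document dom;
    try {
      dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse boost file", e);
    }
    // LinkedHashMap keeps the queries in file order, which is handy when debugging.
    Map<String, List<BoostedDoc>> boosts = new LinkedHashMap<>();
    NodeList queries = dom.getDocumentElement().getElementsByTagName("query");
    for (int i = 0; i < queries.getLength(); i++) {
      Element query = (Element) queries.item(i);
      List<BoostedDoc> docs = new ArrayList<>();
      NodeList docNodes = query.getElementsByTagName("doc");
      for (int j = 0; j < docNodes.getLength(); j++) {
        Element doc = (Element) docNodes.item(j);
        docs.add(new BoostedDoc(doc.getAttribute("id"),
            Integer.parseInt(doc.getAttribute("priority"))));
      }
      boosts.put(query.getAttribute("text"), docs);
    }
    return boosts;
  }

  public static void main(String[] args) {
    String xml = "<boost><query text=\"YYYY\">"
        + "<doc id=\"1\" priority=\"1\" /><doc id=\"2\" priority=\"3\" />"
        + "</query></boost>";
    for (Map.Entry<String, List<BoostedDoc>> entry : parse(xml).entrySet()) {
      for (BoostedDoc doc : entry.getValue()) {
        System.out.println(entry.getKey() + ": doc " + doc.id + " at position " + doc.priority);
      }
    }
  }
}
```

Keeping the priority next to the id means the same map can serve both the query rewrite and a later position-aware sort.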

The search handler is configured with:

{code:xml}
<searchComponent name="boost" class="org.apache.solr.handler.component.QueryBoostingComponent">
  <str name="analyzer">string</str>
  <str name="boosts">boost.xml</str>
</searchComponent>

<requestHandler name="/boost" class="solr.SearchHandler">
  <arr name="last-components">
    <str>boost</str>
  </arr>
</requestHandler>
{code}

The <str name="analyzer">string</str> setting names a fieldType from schema.xml, which is then used to normalize input strings.  This lets us reuse the existing lowercase, trim, pattern, and similar filters.
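
The effect of that normalization can be illustrated without Solr. The sketch below uses plain string operations as a stand-in for the filters of the configured fieldType; the real component would run the text through that fieldType instead. QueryNormalizerSketch and its methods are hypothetical names, and the map values are placeholders.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for fieldType-based normalization; not the component's actual code.
class QueryNormalizerSketch {

  // Normalized query text -> placeholder label for the boost entry it maps to.
  private final Map<String, String> boosts = new HashMap<>();

  /** Trims, collapses runs of whitespace, and lowercases, mimicking trim/pattern/lowercase filters. */
  static String normalize(String input) {
    return input.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
  }

  /** Keys are normalized once, when the boost file is loaded. */
  void add(String queryText, String entry) {
    boosts.put(normalize(queryText), entry);
  }

  /** The incoming query is normalized the same way before the lookup. */
  String lookup(String userQuery) {
    return boosts.get(normalize(userQuery));
  }

  public static void main(String[] args) {
    QueryNormalizerSketch sketch = new QueryNormalizerSketch();
    sketch.add("Solr Boosting", "entry-1");
    System.out.println(sketch.lookup("  solr   BOOSTING "));
  }
}
```

Normalizing on both sides is what lets an editor write the query text once and still match user input that differs in case or spacing.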

For sorting, I think the best approach is to use a custom sort when sorting by score.  (This isn't implemented yet.)
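
Since the sort is not implemented, the following is only a sketch of the ordering it would need to produce. It is not a Lucene custom sort; it post-processes a list of ids that is already ranked by score, and it assumes each priority is a 1-based target position. PositionSketch and place are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the intended ordering; not part of the patch.
class PositionSketch {

  /**
   * ranked: document ids ordered by score, best first.
   * forced: document id -> 1-based position requested by the editor.
   */
  static List<String> place(List<String> ranked, Map<String, Integer> forced) {
    // Forced documents, ordered by the position they asked for.
    List<Map.Entry<String, Integer>> pending = new ArrayList<>(forced.entrySet());
    pending.sort(Map.Entry.comparingByValue());

    // Everything else keeps its score order.
    Deque<String> organic = new ArrayDeque<>();
    for (String id : ranked) {
      if (!forced.containsKey(id)) {
        organic.add(id);
      }
    }

    List<String> result = new ArrayList<>();
    int next = 0;
    for (int position = 1; next < pending.size() || !organic.isEmpty(); position++) {
      // A forced document is due once its position is reached, or earlier if
      // there are no organic results left to fill the gap before it.
      boolean due = next < pending.size()
          && (pending.get(next).getValue() <= position || organic.isEmpty());
      result.add(due ? pending.get(next++).getKey() : organic.poll());
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Integer> forced = new LinkedHashMap<>();
    forced.put("1", 1);
    forced.put("2", 3);
    System.out.println(place(Arrays.asList("a", "1", "b", "2", "c"), forced));
  }
}
```

Forced documents are always emitted, even when they are absent from the ranked list, which mirrors a query rewrite that ORs them into the result set. If two documents ask for the same position, the second simply takes the next slot.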

Currently, for a matching query, the component rewrites the query using:
{code:java}
      // Build a query to match the forced documents:
      // (id:1 id:2 id:3 id:4 id:5)^0
      BooleanQuery boosted = new BooleanQuery( true );
      for( Booster b : booster ) {
        TermQuery tq = new TermQuery( new Term( idField, b.id ) );
        boosted.add( tq, BooleanClause.Occur.SHOULD );
      }
      boosted.setBoost( 0 ); // don't affect the score
      
      // Change the query to insert forced documents
      BooleanQuery newq = new BooleanQuery( true );
      newq.add( query, BooleanClause.Occur.SHOULD );
      newq.add( boosted, BooleanClause.Occur.SHOULD );
      builder.setQuery( newq );
{code}

For debugging, check:
http://localhost:8983/solr/boost?q=ZZZZ&debugQuery=true

Any feedback would be great!







  
> Editorial Query Boosting Component
> ----------------------------------
>
>                 Key: SOLR-418
>                 URL: https://issues.apache.org/jira/browse/SOLR-418
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Ryan McKinley
>             Fix For: 1.3
>
>         Attachments: SOLR-418-QueryBoosting.patch
>
>
> For a given query string, a human editor can say what documents should be important.  This is related to a lucene discussion:
> http://www.nabble.com/Forced-Top-Document-tf4682070.html#a13408965
> Ideally, the position could be determined explicitly by the editor - otherwise increasing the boost is probably sufficient.
> This patch uses the Search Component framework to inject custom document boosting into the standard SearchHandler.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.