Posted to solr-dev@lucene.apache.org by "Ryan McKinley (JIRA)" <ji...@apache.org> on 2007/11/27 18:46:43 UTC
[jira] Issue Comment Edited: (SOLR-418) Editorial Query Boosting Component
[ https://issues.apache.org/jira/browse/SOLR-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12545939 ]
ryantxu edited comment on SOLR-418 at 11/27/07 9:46 AM:
--------------------------------------------------------------
Here is a first draft that includes recent changes to SOLR-281. This is incomplete and is posted to get early feedback and advice.
This component loads a file and builds a map of queries to special documents. The format is:
{code:xml}
<boost>
  <query text="XXXX">
    <doc id="1" priority="1" />
  </query>
  <query text="YYYY">
    <doc id="1" priority="1" />
    <doc id="2" priority="3" />
  </query>
  <query text="ZZZZ">
    <doc id="1" priority="1" />
    <doc id="2" priority="3" />
    <doc id="3" priority="5" />
  </query>
</boost>
{code}
For the query "YYYY", document 1 should be in position 1 and document 2 in position 3.
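As a rough sketch of how a file in this format could be read into a map of query text to documents (this is not the code from the patch; the class `BoostFileSketch`, the `Boost` holder, and the `parse` method are hypothetical names, and only the standard JDK DOM parser is used):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of the patch: reads the boost file format shown above.
class BoostFileSketch {

    // One <doc id="..." priority="..."/> entry.
    static final class Boost {
        final String id;
        final int priority;

        Boost(String id, int priority) {
            this.id = id;
            this.priority = priority;
        }
    }

    // Returns query text -> boosted documents, both in file order.
    static Map<String, List<Boost>> parse(String xml) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, List<Boost>> map = new LinkedHashMap<>();
            NodeList queries = dom.getDocumentElement().getElementsByTagName("query");
            for (int i = 0; i < queries.getLength(); i++) {
                Element q = (Element) queries.item(i);
                List<Boost> boosts = new ArrayList<>();
                NodeList docs = q.getElementsByTagName("doc");
                for (int j = 0; j < docs.getLength(); j++) {
                    Element d = (Element) docs.item(j);
                    boosts.add(new Boost(d.getAttribute("id"),
                                         Integer.parseInt(d.getAttribute("priority"))));
                }
                map.put(q.getAttribute("text"), boosts);
            }
            return map;
        } catch (Exception e) {
            // Malformed XML or a non-numeric priority ends up here.
            throw new IllegalArgumentException("Could not read boost file", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<boost><query text=\"YYYY\">"
            + "<doc id=\"1\" priority=\"1\"/><doc id=\"2\" priority=\"3\"/>"
            + "</query></boost>";
        for (Boost b : parse(xml).get("YYYY")) {
            System.out.println("doc " + b.id + " -> position " + b.priority);
        }
    }
}
```

The sketch keeps the `text` attribute as the raw map key; in the component the key would first be normalized by the configured analyzer.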
I considered a .csv style format:
id,priority,phrase
or
phrase,[id,priority]+
but I think the XML equivalent will be easier to edit/maintain.
The search handler is configured with:
{code:xml}
<searchComponent name="boost" class="org.apache.solr.handler.component.QueryBoostingComponent" >
  <str name="analyzer">string</str>
  <str name="boosts">boost.xml</str>
</searchComponent>
<requestHandler name="/boost" class="solr.SearchHandler">
  <arr name="last-components">
    <str>boost</str>
  </arr>
</requestHandler>
{code}
The <str name="analyzer">string</str> bit chooses a fieldType (from schema.xml) and uses that to normalize input strings. This lets us reuse existing lowercase/trim/pattern/etc filters.
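To illustrate why the normalization step matters, here is a minimal sketch in plain Java. It does not use a real Solr analyzer: `NORMALIZE` is a hypothetical stand-in that assumes the chosen fieldType trims, collapses whitespace, and lowercases, and `NormalizedLookupSketch` is an invented name. The point is only that the same normalizer is applied to the keys at load time and to the user query at request time.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the analysis chain of the configured fieldType.
class NormalizedLookupSketch {

    // Assumed normalization: trim, collapse whitespace runs, lowercase.
    static final UnaryOperator<String> NORMALIZE =
        s -> s.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);

    private final Map<String, String[]> boosts = new HashMap<>();

    // Keys are normalized when the boost file is loaded...
    void put(String queryText, String... docIds) {
        boosts.put(NORMALIZE.apply(queryText), docIds);
    }

    // ...and the user's query is normalized the same way at request time.
    String[] lookup(String userQuery) {
        return boosts.get(NORMALIZE.apply(userQuery));
    }

    public static void main(String[] args) {
        NormalizedLookupSketch sketch = new NormalizedLookupSketch();
        sketch.put("Apache Solr", "1", "2");
        // Differently cased and spaced input still finds the entry; prints 2.
        System.out.println(sketch.lookup("  apache   SOLR ").length);
    }
}
```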
For sorting, I think the best approach is to use a custom sort when sorting by score. (This isn't implemented yet.)
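Since the sort is not implemented, the following is only one possible reading of it, sketched outside Lucene: treat each priority as a 1-based target position, pin the forced documents there, and fill the remaining slots in score order. The class `ForcedPositionSketch` and its `merge` method are hypothetical, and the sketch assumes the score-ordered list has no duplicate ids and that forced documents always belong in the result (the rewritten query makes them match).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of position forcing; not the patch's sort.
class ForcedPositionSketch {

    // scoreOrder: doc ids as ranked by score; forced: doc id -> 1-based position.
    static List<String> merge(List<String> scoreOrder, Map<String, Integer> forced) {
        // Documents that keep their relative score order.
        List<String> rest = new ArrayList<>();
        for (String id : scoreOrder) {
            if (!forced.containsKey(id)) {
                rest.add(id);
            }
        }
        int n = rest.size() + forced.size();
        String[] slots = new String[n];

        // Place forced documents, lowest requested position first.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(forced.entrySet());
        entries.sort(Map.Entry.comparingByValue());
        for (Map.Entry<String, Integer> e : entries) {
            // Clamp the requested position into the result range.
            int i = Math.min(Math.max(e.getValue(), 1), n) - 1;
            // If the slot is taken, move to the next free one (wrapping around).
            while (slots[i] != null) {
                i = (i + 1) % n;
            }
            slots[i] = e.getKey();
        }

        // Fill the remaining slots with the other documents in score order.
        int r = 0;
        for (int i = 0; i < n; i++) {
            if (slots[i] == null) {
                slots[i] = rest.get(r++);
            }
        }
        return Arrays.asList(slots);
    }

    public static void main(String[] args) {
        Map<String, Integer> forced = new LinkedHashMap<>();
        forced.put("1", 1); // document 1 in position 1
        forced.put("2", 3); // document 2 in position 3
        // Prints [1, 7, 2, 9, 4]
        System.out.println(merge(Arrays.asList("7", "2", "9", "1", "4"), forced));
    }
}
```

How conflicting or out-of-range positions should be resolved is an open design question; the clamping and next-free-slot rules here are just one choice.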
Currently, for a matching query, the component rewrites the query using:
{code:java}
// Build a query to match the forced documents:
// (id:1 id:2 id:3 id:4 id:5)^0
BooleanQuery boosted = new BooleanQuery( true );
for( Booster b : booster ) {
  TermQuery tq = new TermQuery( new Term( idField, b.id ) );
  boosted.add( tq, BooleanClause.Occur.SHOULD );
}
boosted.setBoost( 0 ); // don't affect the score

// Change the query to insert forced documents
BooleanQuery newq = new BooleanQuery( true );
newq.add( query, BooleanClause.Occur.SHOULD );
newq.add( boosted, BooleanClause.Occur.SHOULD );
builder.setQuery( newq );
{code}
For debugging, check:
http://localhost:8983/solr/boost?q=ZZZZ&debugQuery=true
Any feedback would be great!
> Editorial Query Boosting Component
> ----------------------------------
>
> Key: SOLR-418
> URL: https://issues.apache.org/jira/browse/SOLR-418
> Project: Solr
> Issue Type: New Feature
> Components: search
> Reporter: Ryan McKinley
> Fix For: 1.3
>
> Attachments: SOLR-418-QueryBoosting.patch
>
>
> For a given query string, a human editor can say which documents should be important. This is related to a Lucene discussion:
> http://www.nabble.com/Forced-Top-Document-tf4682070.html#a13408965
> Ideally, the editor could set the position explicitly; otherwise, increasing the boost is probably sufficient.
> This patch uses the Search Component framework to inject custom document boosting into the standard SearchHandler.