Posted to solr-user@lucene.apache.org by David Ginzburg <da...@digitaltrowel.com> on 2009/11/08 15:06:03 UTC

synonym payload boosting

Hi,
I have a field and a weighted synonym map.
I have indexed the synonyms with their weights as payloads.
Here is a code snippet from my filter:

public Token next(final Token reusableToken) throws IOException {
    // ...
    Payload boostPayload;

    for (Synonym synonym : syns) {
        Token newTok = new Token(nToken.startOffset(), nToken.endOffset(), "SYNONYM");
        newTok.setTermBuffer(synonym.getToken().toCharArray(), 0,
                synonym.getToken().length());
        // Set the position increment to zero: this tells Lucene the synonym
        // is in the exact same location as the originating word.
        newTok.setPositionIncrement(0);
        boostPayload = new Payload(PayloadHelper.encodeFloat(synonym.getWieght()));
        newTok.setPayload(boostPayload);
        // ...
    }
    // ...
}
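Setting the position increment to 0 is what stacks each synonym on the position of the word it came from. As a dependency-free toy model (the `Tok` class and the sample names are hypothetical, not Lucene types), a token's position is the running sum of the increments, with the first token (increment 1) at position 0:

```java
import java.util.ArrayList;
import java.util.List;

public class PositionIncrementDemo {
    // Hypothetical stand-in for a token: its text and its position increment.
    static final class Tok {
        final String text;
        final int posInc;

        Tok(String text, int posInc) {
            this.text = text;
            this.posInc = posInc;
        }
    }

    // Positions are the running sum of increments; starting from -1 puts
    // the first token (increment 1) at position 0.
    static List<String> positions(List<Tok> tokens) {
        List<String> out = new ArrayList<>();
        int pos = -1;
        for (Tok t : tokens) {
            pos += t.posInc;
            out.add(t.text + "@" + pos);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> stream = List.of(
                new Tok("steve", 1),
                new Tok("steven", 0),   // synonym stacked on "steve"
                new Tok("stephen", 0),  // synonym stacked on "steve"
                new Tok("jobs", 1));
        System.out.println(positions(stream)); // prints [steve@0, steven@0, stephen@0, jobs@1]
    }
}
```

In this model every synonym shares position 0 with "steve", so a positional match such as "steven" followed by "jobs" still lines up.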
I have put it in the index-time analyzer. This is my field definition:

<fieldType name="PersonName" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="com.digitaltrowel.solr.DTSynonymFactory"
            FreskoFunction="names_with_scoresPipe23Columns.txt" ignoreCase="true"
            expand="false"/>

    <!--<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>-->
    <!--<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>-->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!--<filter class="com.digitaltrowel.solr.DTSynonymFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="false"/>-->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!--<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>-->
    <!--<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>-->
  </analyzer>
</fieldType>


My Similarity class is:

import org.apache.lucene.search.DefaultSimilarity;
import org.apache.lucene.search.payloads.PayloadHelper;

public class BoostingSymilarity extends DefaultSimilarity {

    public BoostingSymilarity() {
        super();
    }

    @Override
    public float scorePayload(String field, byte[] payload, int offset, int length) {
        // Decode the weight stored at index time, honoring the offset argument.
        return PayloadHelper.decodeFloat(payload, offset);
    }

    @Override
    public float coord(int overlap, int maxoverlap) {
        return 1.0f;
    }

    @Override
    public float idf(int docFreq, int numDocs) {
        return 1.0f;
    }

    @Override
    public float lengthNorm(String fieldName, int numTerms) {
        return 1.0f;
    }

    @Override
    public float tf(float freq) {
        return 1.0f;
    }
}
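The `offset` and `length` arguments to `scorePayload` say where the payload sits in the byte array; decoding at a fixed index 0 only works when the caller happens to pass offset 0. A dependency-free sketch, assuming `PayloadHelper` stores a float as the 4-byte big-endian form of `Float.floatToIntBits` (the two static methods below re-implement that layout for illustration and are not Lucene's code):

```java
public class PayloadFloatCodec {
    // Encode a float as the 4 big-endian bytes of its IEEE 754 bit pattern.
    static byte[] encodeFloat(float value) {
        int bits = Float.floatToIntBits(value);
        return new byte[] {
            (byte) (bits >>> 24), (byte) (bits >>> 16),
            (byte) (bits >>> 8), (byte) bits
        };
    }

    // Decode 4 big-endian bytes starting at offset back into a float.
    static float decodeFloat(byte[] bytes, int offset) {
        int bits = ((bytes[offset] & 0xFF) << 24)
                | ((bytes[offset + 1] & 0xFF) << 16)
                | ((bytes[offset + 2] & 0xFF) << 8)
                | (bytes[offset + 3] & 0xFF);
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        byte[] weight = encodeFloat(0.75f);
        // Simulate a shared buffer in which the payload starts at offset 3.
        byte[] buffer = new byte[8];
        System.arraycopy(weight, 0, buffer, 3, weight.length);
        System.out.println(decodeFloat(buffer, 3));          // prints 0.75
        System.out.println(decodeFloat(buffer, 0) == 0.75f); // prints false
    }
}
```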

My problem is that the scorePayload method is not called at search time,
unlike the other methods in my Similarity class. I verified this with
breakpoints. What am I doing wrong?
I am using Solr 1.3 and am considering the payload boost support in Solr 1.4.

Re: synonym payload boosting

Posted by David Ginzburg <da...@digitaltrowel.com>.
Hi,
I have succeeded in running and querying with PayloadTermQueryPlugin.
When I ran my test against an embedded SolrJ server it ran fine; I'm using
the Maven Solr 1.4 artifacts.
When I deployed it into my servlet container, the plugin didn't load. The
WAR in the servlet container came from the standard Solr 1.4 tar.gz download.
When I replaced the jars in the WEB-INF/lib folder with the jars from the
Maven repository, the plugin loaded.
I don't know whether a bug should be opened in JIRA for this, but the
distributions should be updated.
On Wed, Nov 11, 2009 at 18:15, David Ginzburg <da...@digitaltrowel.com> wrote:

> [...]



-- 
Regards

_____________________
David Ginzburg
Developer, Digital Trowel
1 Hayarden St., Airport City
[POB 169, NATBAG]
Lod, 70151, Israel
http://www.digitaltrowel.com/
Office: +972 73 240 522
Mobile: +972 50 496 0595

CHECK OUT OUR NEW TEXT MINING BLOG:
http://mineyourbusiness.wordpress.com/

Re: synonym payload boosting

Posted by David Ginzburg <da...@digitaltrowel.com>.
Hi,
I have added a PayloadTermQueryPlugin after reading
https://issues.apache.org/jira/browse/SOLR-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

My class is:

import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.common.SolrException;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.payloads.*;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.index.Term;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;
import org.apache.solr.search.QueryParsing;

public class PayloadTermQueryPlugin extends QParserPlugin {
    private MinPayloadFunction payloadFunc;

    @Override
    public void init(NamedList args) {
        this.payloadFunc = new MinPayloadFunction();
    }

    @Override
    public QParser createParser(String qstr, SolrParams localParams,
            SolrParams params, SolrQueryRequest req) {
        return new QParser(qstr, localParams, params, req) {
            @Override
            public Query parse() throws ParseException {
                // Field and value are both read from the local params.
                Term term = new Term(localParams.get(QueryParsing.F),
                        localParams.get(QueryParsing.V));
                return new PayloadTermQuery(term, payloadFunc, false);
            }
        };
    }
}


I tested it using SolrJ:

    @Override
    protected void setUp() throws Exception {
        super.setUp();
        System.setProperty("solr.solr.home", "C:\\temp\\solr_home1.4");
        CoreContainer.Initializer initializer = new CoreContainer.Initializer();

        try {
            coreContainer = initializer.initialize();
        } catch (IOException ex) {
            Logger.getLogger(BoostingSymilarityTest.class.getName()).log(Level.SEVERE, null, ex);
        } catch (ParserConfigurationException ex) {
            Logger.getLogger(BoostingSymilarityTest.class.getName()).log(Level.SEVERE, null, ex);
        } catch (SAXException ex) {
            Logger.getLogger(BoostingSymilarityTest.class.getName()).log(Level.SEVERE, null, ex);
        }
        server = new EmbeddedSolrServer(coreContainer, "");
    }

    public void testSeacrhAndBoost() {
        SolrQuery query = new SolrQuery();
        query.setQuery("PFirstName:steve");
        query.setParam("hl.fl", "PFirstName");
        query.setParam("defType", "payload");
        query.setIncludeScore(true);
        query.setRows(10);
        query.setFacet(false);

        try {
            QueryResponse qr = server.query(query);
            List<PersonDoc> l = qr.getBeans(PersonDoc.class);
            for (PersonDoc personDoc : l) {
                System.out.println(personDoc);
            }
        } catch (SolrServerException ex) {
            Logger.getLogger(BoostingSymilarityTest.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}


I get an NPE when accessing localParams in the
public QParser createParser(String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req)
method; more precisely, the NPE is thrown inside the
public Query parse() throws ParseException method.

I could not find documentation about the parse method. How can I pass
the localParams? What is the difference between localParams and params?
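In Solr 1.4, `localParams` holds only the key/value pairs given in a `{!...}` prefix on the query string, while `params` holds the ordinary request parameters (`q`, `defType`, `rows`, and so on). With `defType=payload` and `q=PFirstName:steve` there is no `{!...}` prefix, so `localParams` is null, which would explain the NPE. Two likely fixes: send the query as `q={!payload f=PFirstName}steve` (the text after the closing brace becomes the `v` local param), or make `parse()` fall back to the raw query string. The dependency-free sketch below models the second option; `Map` stands in for `SolrParams`, and `resolveTerm` is a hypothetical helper, not a Solr API:

```java
import java.util.Map;

public class PayloadParamResolver {
    // Hypothetical helper returning {field, value} for the payload term.
    // localParams models the pairs from a {!payload f=... v=...} prefix and
    // may be null; qstr is the raw query string.
    static String[] resolveTerm(Map<String, String> localParams, String qstr) {
        if (localParams != null && localParams.get("f") != null) {
            // Local params form: field from f, value from v (or the query text).
            String value = localParams.get("v") != null ? localParams.get("v") : qstr;
            return new String[] {localParams.get("f"), value};
        }
        // No local params: fall back to parsing "field:value" from qstr.
        int colon = (qstr == null) ? -1 : qstr.indexOf(':');
        if (colon <= 0 || colon == qstr.length() - 1) {
            throw new IllegalArgumentException("expected field:value, got " + qstr);
        }
        return new String[] {qstr.substring(0, colon), qstr.substring(colon + 1)};
    }

    public static void main(String[] args) {
        // q={!payload f=PFirstName}steve: the local params carry f and v.
        String[] a = resolveTerm(Map.of("f", "PFirstName", "v", "steve"), "steve");
        // defType=payload, q=PFirstName:steve: no local params at all.
        String[] b = resolveTerm(null, "PFirstName:steve");
        System.out.println(a[0] + "/" + a[1]); // prints PFirstName/steve
        System.out.println(b[0] + "/" + b[1]); // prints PFirstName/steve
    }
}
```

In the real plugin, the same logic would read `localParams` and the raw query text (assuming the Solr 1.4 `QParser.getString()` accessor) before building the `PayloadTermQuery`. No analysis is applied to the value, so it must already match the indexed (lowercased) form.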


I would be happy to write a case study on the wiki, but I'm not sure
exactly what you mean: the resolution I eventually come to, or the
process of finding it?
I'm still trying to figure out exactly what to do. I have purchased the
Solr 1.4 book, but it doesn't seem to have much information about my needs.

On Tue, Nov 10, 2009 at 10:09, David Ginzburg <da...@digitaltrowel.com> wrote:

> [...]

Re: synonym payload boosting

Posted by Lance Norskog <go...@gmail.com>.
David, when you get this working would you consider writing a case
study on the wiki? Nothing complex, just something that describes how
you did several customizations to create a new feature.

On Mon, Nov 9, 2009 at 4:10 AM, Grant Ingersoll <gs...@apache.org> wrote:
>
> On Nov 9, 2009, at 4:41 AM, David Ginzburg wrote:
>
>> I have found this patch:
>> https://issues.apache.org/jira/browse/SOLR-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>> But I don't want to use any function, just the normal scoring and the
>> Similarity class I have written.
>> Can you point me to the modifications I need (if any)?
>>
>>
>
> Ahmet's point is that you need some query that will actually invoke the
> payload in scoring. PayloadTermQuery and PayloadNearQuery are the two that
> do this in Lucene. You can certainly write your own, as well.
>
> -Grant
>
>>
>> On Sun, Nov 8, 2009 at 16:33, AHMET ARSLAN <io...@yahoo.com> wrote:
>>
>>> Additionally, you need to modify your query parser to return
>>> BoostingTermQuery, PayloadTermQuery, PayloadNearQuery, etc.
>>>
>>> With these types of queries, the scorePayload method is invoked.
>>>
>>> Hope this helps.
>>>
>>> --- On Sun, 11/8/09, David Ginzburg <da...@digitaltrowel.com> wrote:
>>> [...]
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>



-- 
Lance Norskog
goksron@gmail.com

Re: synonym payload boosting

Posted by Grant Ingersoll <gs...@apache.org>.
On Nov 9, 2009, at 4:41 AM, David Ginzburg wrote:

> I have found this
> https://issues.apache.org/jira/browse/SOLR-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
> patch
> But i don't want to use any function, just the normal scoring and the
> similarity class  I have written.
> Can you point me to  modifications I need (if any) ?
>
>

Ahmet's point is that you need a query that actually uses the payload
in scoring. PayloadTermQuery and PayloadNearQuery are the two queries
in Lucene that do this. You can certainly write your own as well.
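Here is a minimal, self-contained sketch of this point in plain Java, with no Lucene dependency. It models a plain term query and a payload-aware query as two scoring functions. The names PayloadScoringModel and PayloadHook, and all of their methods, are hypothetical and are not Lucene's API. Only the payload-aware path calls the hook, which matches the symptom of a breakpoint in scorePayload never firing under an ordinary term query. The decoder reads from its offset argument instead of a hard-coded 0. It assumes the same big-endian 4-byte layout that PayloadHelper.encodeFloat writes.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model, NOT Lucene code: illustrates why a Similarity's payload hook
 * is never reached unless the query itself reads payloads.
 * All names here are hypothetical stand-ins.
 */
public class PayloadScoringModel {

    /** Hypothetical stand-in for the scorePayload hook on a Similarity. */
    interface PayloadHook {
        float scorePayload(byte[] payload, int offset, int length);
    }

    /** Big-endian float encoding (assumed to match PayloadHelper.encodeFloat's layout). */
    static byte[] encodeFloat(float value) {
        int bits = Float.floatToIntBits(value);
        return new byte[] {
            (byte) (bits >>> 24), (byte) (bits >>> 16), (byte) (bits >>> 8), (byte) bits
        };
    }

    /** Decodes four big-endian bytes starting at offset, not at a hard-coded 0. */
    static float decodeFloat(byte[] bytes, int offset) {
        int bits = ((bytes[offset] & 0xFF) << 24)
                | ((bytes[offset + 1] & 0xFF) << 16)
                | ((bytes[offset + 2] & 0xFF) << 8)
                | (bytes[offset + 3] & 0xFF);
        return Float.intBitsToFloat(bits);
    }

    /** Plain term-query stand-in: looks only at whether the term matched, never at payloads. */
    static float plainTermScore(List<byte[]> payloadsAtMatches, PayloadHook hook) {
        // With tf, idf, coord and lengthNorm all forced to 1.0f, any match scores 1.
        return payloadsAtMatches.isEmpty() ? 0f : 1f;
    }

    /** Payload-aware stand-in: calls the hook once per matching position and averages. */
    static float payloadTermScore(List<byte[]> payloadsAtMatches, PayloadHook hook) {
        if (payloadsAtMatches.isEmpty()) {
            return 0f;
        }
        float sum = 0f;
        for (byte[] p : payloadsAtMatches) {
            sum += hook.scorePayload(p, 0, p.length);
        }
        return sum / payloadsAtMatches.size();
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        // Counts how often the hook runs, much like a breakpoint would.
        PayloadHook hook = (payload, offset, length) -> {
            calls[0]++;
            return decodeFloat(payload, offset);
        };
        List<byte[]> matches = new ArrayList<>();
        matches.add(encodeFloat(0.5f));
        matches.add(encodeFloat(1.0f));

        System.out.println("plain score = " + plainTermScore(matches, hook) + ", hook calls = " + calls[0]);
        System.out.println("payload score = " + payloadTermScore(matches, hook) + ", hook calls = " + calls[0]);
    }
}
```

Running main should print a plain score of 1.0 with zero hook calls. It should then print a payload score of 0.75 after two hook calls.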

-Grant

>
> On Sun, Nov 8, 2009 at 16:33, AHMET ARSLAN <io...@yahoo.com> wrote:
>
>> Additionaly you need to modify your queryparser to return
>> BoostingTermQuery, PayloadTermQuery, PayloadNearQuery etc.
>>
>> With these types of Queries scorePayload method invoked.
>>
>> Hope this helps.
>>


Re: synonym payload boosting

Posted by David Ginzburg <da...@digitaltrowel.com>.
I found this patch:
https://issues.apache.org/jira/browse/SOLR-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
But I don't want to use any function. I just want the normal scoring plus the
similarity class I have written.
Can you point me to the modifications I need (if any)?



On Sun, Nov 8, 2009 at 16:33, AHMET ARSLAN <io...@yahoo.com> wrote:

> Additionaly you need to modify your queryparser to return
> BoostingTermQuery, PayloadTermQuery, PayloadNearQuery etc.
>
> With these types of Queries scorePayload method invoked.
>
> Hope this helps.
>




Re: synonym payload boosting

Posted by AHMET ARSLAN <io...@yahoo.com>.
Additionally, you need to modify your query parser to return BoostingTermQuery, PayloadTermQuery, PayloadNearQuery, etc.

The scorePayload method is invoked with these types of queries.
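A payload-aware query that calls scorePayload still has to combine the per-position values into one score per document. The toy model below is plain Java with hypothetical names. It shows two combining rules, an average and a maximum, modeled loosely on Lucene 2.9's AveragePayloadFunction and MaxPayloadFunction. It also shows an optional multiplication by the ordinary span score, which is assumed here to mirror PayloadTermQuery's includeSpanScore flag. When a match carries no payload (for example, a token indexed without a weight), the model falls back to 1.0. That fallback is also an assumption, so check it against the javadocs for your Lucene version.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Toy model, NOT Lucene code: folds the values returned by a scorePayload-like
 * hook into a single per-document score. Combining rules are modeled loosely
 * on Lucene's payload functions; the 1.0 fallback is an assumption.
 */
public class PayloadFunctionModel {

    enum Combine { AVERAGE, MAX }

    /** Combines per-position payload scores for one document. */
    static float payloadScore(List<Float> perPositionScores, Combine combine) {
        if (perPositionScores.isEmpty()) {
            return 1f; // assumed neutral value: an unweighted match is left unchanged
        }
        float acc = (combine == Combine.MAX) ? Float.NEGATIVE_INFINITY : 0f;
        for (float s : perPositionScores) {
            acc = (combine == Combine.MAX) ? Math.max(acc, s) : acc + s;
        }
        return (combine == Combine.MAX) ? acc : acc / perPositionScores.size();
    }

    /** When includeSpanScore is true, the ordinary span score is multiplied by the payload score. */
    static float documentScore(float spanScore, float payloadScore, boolean includeSpanScore) {
        return includeSpanScore ? spanScore * payloadScore : payloadScore;
    }

    public static void main(String[] args) {
        List<Float> synonymWeights = Arrays.asList(0.5f, 1.0f);
        System.out.println(payloadScore(synonymWeights, Combine.AVERAGE)); // 0.75
        System.out.println(payloadScore(synonymWeights, Combine.MAX));     // 1.0
        System.out.println(documentScore(2.0f, 0.75f, true));              // 1.5
    }
}
```

With synonym weights of 0.5 and 1.0, averaging gives 0.75 and taking the maximum gives 1.0. With a span score of 2.0 and the span score included, the document score becomes 1.5.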

Hope this helps.
