You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Naama Kraus <na...@gmail.com> on 2010/06/23 12:38:35 UTC

Overriding Lucene's term weights computation

Hi,

Is there a way for an application to index a document along with its "term
weighted vector" (Lucene's TermFreqVector). I.e., override the term
frequencies computed by Lucene, with an application's computed term weights
(non frequency based) ?
I don't think I want to use Scorer#score() for applying score changes as
this one is activated at search time which won't work for me.

Thanks for any insight,
Naama

Re: Overriding Lucene's term weights computation

Posted by Naama Kraus <na...@gmail.com>.
Great ! Thanks Dionisis.
Naama

On Thu, Jun 24, 2010 at 10:27 AM, Dionisis Koumouras <ku...@gmail.com>wrote:

> Naama,
> I recently faced a similar problem. Overriding the way Lucene uses
> TermVectors seemed quite complex for me. I used the payloads mechanism
> instead so that I could store a float payload with each word. Then, I
> overloaded the similarity class to change the way results are scored, based
> on the payloads.
> This is the example I was hinted with through this list:
>
> http://www.lucidimagination.com/blog/2010/04/18/refresh-getting-started-with-payloads/
> I hope it helps you too
>
> Dionisis
>
> On Thu, Jun 24, 2010 at 9:19 AM, Naama Kraus <na...@gmail.com> wrote:
>
> > ok, thanks Yuval. I'll take a look.
> > Could you (or anyone) please elaborate why payloads "seem like a worse
> fit"
> > ?
> >
> > TX, Naama
> >
> > On Wed, Jun 23, 2010 at 11:00 PM, Yuval Feinstein <yuvalf@answers.com
> > >wrote:
> >
> > > Naama, Maybe you could use the new flexible indexing mechanism.
> > > Some information is in this lecture:
> > >
> > >
> >
> http://lucene-eurocon.org/slides/Lucene-Forecast-Version-Unicode-Flex-and-Mod_Willnauer&Schindler.pdf
> > > Alternatively, you may use payloads, but they seem like a worse fit.
> > > Good Luck,
> > > Yuval
> > >
> > > ________________________________________
> > > From: Naama Kraus [naamakraus@gmail.com]
> > > Sent: Wednesday, June 23, 2010 1:38 PM
> > > To: java-user@lucene.apache.org
> > > Subject: Overriding Lucene's term weights computation
> > >
> > > Hi,
> > >
> > > Is there a way for an application to index a document along with its
> > "term
> > > weighted vector" (Lucene's TermFreqVector). I.e., override the term
> > > frequencies computed by Lucene, with an application's computed term
> > weights
> > > (non frequency based) ?
> > > I don't think I want to use Scorer#score() for applying score changes
> as
> > > this one is activated at search time which won't work for me.
> > >
> > > Thanks for any insight,
> > > Naama
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> >
>

Re: Overriding Lucene's term weights computation

Posted by Dionisis Koumouras <ku...@gmail.com>.
Naama,
I recently faced a similar problem. Overriding the way Lucene uses
TermVectors seemed quite complex for me. I used the payloads mechanism
instead so that I could store a float payload with each word. Then, I
overloaded the similarity class to change the way results are scored, based
on the payloads.
This is the example I was hinted with through this list:
http://www.lucidimagination.com/blog/2010/04/18/refresh-getting-started-with-payloads/
I hope it helps you too

Dionisis

On Thu, Jun 24, 2010 at 9:19 AM, Naama Kraus <na...@gmail.com> wrote:

> ok, thanks Yuval. I'll take a look.
> Could you (or anyone) please elaborate why payloads "seem like a worse fit"
> ?
>
> TX, Naama
>
> On Wed, Jun 23, 2010 at 11:00 PM, Yuval Feinstein <yuvalf@answers.com
> >wrote:
>
> > Naama, Maybe you could use the new flexible indexing mechanism.
> > Some information is in this lecture:
> >
> >
> http://lucene-eurocon.org/slides/Lucene-Forecast-Version-Unicode-Flex-and-Mod_Willnauer&Schindler.pdf
> > Alternatively, you may use payloads, but they seem like a worse fit.
> > Good Luck,
> > Yuval
> >
> > ________________________________________
> > From: Naama Kraus [naamakraus@gmail.com]
> > Sent: Wednesday, June 23, 2010 1:38 PM
> > To: java-user@lucene.apache.org
> > Subject: Overriding Lucene's term weights computation
> >
> > Hi,
> >
> > Is there a way for an application to index a document along with its
> "term
> > weighted vector" (Lucene's TermFreqVector). I.e., override the term
> > frequencies computed by Lucene, with an application's computed term
> weights
> > (non frequency based) ?
> > I don't think I want to use Scorer#score() for applying score changes as
> > this one is activated at search time which won't work for me.
> >
> > Thanks for any insight,
> > Naama
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>

Re: Overriding Lucene's term weights computation

Posted by Naama Kraus <na...@gmail.com>.
OK, got it. Thanks Yuval.

Naama

On Thu, Jun 24, 2010 at 10:44 AM, Yuval Feinstein <yu...@answers.com>wrote:

> Naama,
> AFAIK, payloads store an arbitrary byte array per position
> (see
>
> http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/
> and
>
> http://www.lucidimagination.com/blog/2010/04/18/refresh-getting-started-with-payloads/
> )
> You define what you put in the payload during indexing, and how to use it
> during retrieval.
> It seems like you want to replace the default term vectors.
> Payloads are an additional mechanism, on top of the term vectors.
> This means that implementing payloads will have a memory and a run time
> cost.
> If you have no use for the original term vectors it makes more sense to
> replace them using flex indexing,
> Because the data structures and algorithms for handling term vectors are
> more near the core of Lucene.
> Hope this makes sense,
> Yuval
>
>
>
> -----Original Message-----
> From: Naama Kraus [mailto:naamakraus@gmail.com]
> Sent: Thursday, June 24, 2010 9:19 AM
> To: java-user@lucene.apache.org
> Subject: Re: Overriding Lucene's term weights computation
>
> ok, thanks Yuval. I'll take a look.
> Could you (or anyone) please elaborate why payloads "seem like a worse fit"
> ?
>
> TX, Naama
>
> On Wed, Jun 23, 2010 at 11:00 PM, Yuval Feinstein <yuvalf@answers.com
> >wrote:
>
> > Naama, Maybe you could use the new flexible indexing mechanism.
> > Some information is in this lecture:
> >
> >
> http://lucene-eurocon.org/slides/Lucene-Forecast-Version-Unicode-Flex-and-Mod_Willnauer&Schindler.pdf
> > Alternatively, you may use payloads, but they seem like a worse fit.
> > Good Luck,
> > Yuval
> >
> > ________________________________________
> > From: Naama Kraus [naamakraus@gmail.com]
> > Sent: Wednesday, June 23, 2010 1:38 PM
> > To: java-user@lucene.apache.org
> > Subject: Overriding Lucene's term weights computation
> >
> > Hi,
> >
> > Is there a way for an application to index a document along with its
> "term
> > weighted vector" (Lucene's TermFreqVector). I.e., override the term
> > frequencies computed by Lucene, with an application's computed term
> weights
> > (non frequency based) ?
> > I don't think I want to use Scorer#score() for applying score changes as
> > this one is activated at search time which won't work for me.
> >
> > Thanks for any insight,
> > Naama
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

RE: Overriding Lucene's term weights computation

Posted by Yuval Feinstein <yu...@answers.com>.
Naama,
AFAIK, payloads store an arbitrary byte array per position
(see 
http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/
and 
http://www.lucidimagination.com/blog/2010/04/18/refresh-getting-started-with-payloads/)
You define what you put in the payload during indexing, and how to use it during retrieval.
It seems like you want to replace the default term vectors.
Payloads are an additional mechanism, on top of the term vectors.
This means that implementing payloads will have a memory and a run time cost.
If you have no use for the original term vectors it makes more sense to replace them using flex indexing,
Because the data structures and algorithms for handling term vectors are more near the core of Lucene.
Hope this makes sense,
Yuval



-----Original Message-----
From: Naama Kraus [mailto:naamakraus@gmail.com] 
Sent: Thursday, June 24, 2010 9:19 AM
To: java-user@lucene.apache.org
Subject: Re: Overriding Lucene's term weights computation

ok, thanks Yuval. I'll take a look.
Could you (or anyone) please elaborate why payloads "seem like a worse fit"
?

TX, Naama

On Wed, Jun 23, 2010 at 11:00 PM, Yuval Feinstein <yu...@answers.com>wrote:

> Naama, Maybe you could use the new flexible indexing mechanism.
> Some information is in this lecture:
>
> http://lucene-eurocon.org/slides/Lucene-Forecast-Version-Unicode-Flex-and-Mod_Willnauer&Schindler.pdf
> Alternatively, you may use payloads, but they seem like a worse fit.
> Good Luck,
> Yuval
>
> ________________________________________
> From: Naama Kraus [naamakraus@gmail.com]
> Sent: Wednesday, June 23, 2010 1:38 PM
> To: java-user@lucene.apache.org
> Subject: Overriding Lucene's term weights computation
>
> Hi,
>
> Is there a way for an application to index a document along with its "term
> weighted vector" (Lucene's TermFreqVector). I.e., override the term
> frequencies computed by Lucene, with an application's computed term weights
> (non frequency based) ?
> I don't think I want to use Scorer#score() for applying score changes as
> this one is activated at search time which won't work for me.
>
> Thanks for any insight,
> Naama
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Overriding Lucene's term weights computation

Posted by Naama Kraus <na...@gmail.com>.
ok, thanks Yuval. I'll take a look.
Could you (or anyone) please elaborate why payloads "seem like a worse fit"
?

TX, Naama

On Wed, Jun 23, 2010 at 11:00 PM, Yuval Feinstein <yu...@answers.com>wrote:

> Naama, Maybe you could use the new flexible indexing mechanism.
> Some information is in this lecture:
>
> http://lucene-eurocon.org/slides/Lucene-Forecast-Version-Unicode-Flex-and-Mod_Willnauer&Schindler.pdf
> Alternatively, you may use payloads, but they seem like a worse fit.
> Good Luck,
> Yuval
>
> ________________________________________
> From: Naama Kraus [naamakraus@gmail.com]
> Sent: Wednesday, June 23, 2010 1:38 PM
> To: java-user@lucene.apache.org
> Subject: Overriding Lucene's term weights computation
>
> Hi,
>
> Is there a way for an application to index a document along with its "term
> weighted vector" (Lucene's TermFreqVector). I.e., override the term
> frequencies computed by Lucene, with an application's computed term weights
> (non frequency based) ?
> I don't think I want to use Scorer#score() for applying score changes as
> this one is activated at search time which won't work for me.
>
> Thanks for any insight,
> Naama
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

RE: Overriding Lucene's term weights computation

Posted by Yuval Feinstein <yu...@answers.com>.
Naama, Maybe you could use the new flexible indexing mechanism.
Some information is in this lecture:
http://lucene-eurocon.org/slides/Lucene-Forecast-Version-Unicode-Flex-and-Mod_Willnauer&Schindler.pdf
Alternatively, you may use payloads, but they seem like a worse fit.
Good Luck,
Yuval

________________________________________
From: Naama Kraus [naamakraus@gmail.com]
Sent: Wednesday, June 23, 2010 1:38 PM
To: java-user@lucene.apache.org
Subject: Overriding Lucene's term weights computation

Hi,

Is there a way for an application to index a document along with its "term
weighted vector" (Lucene's TermFreqVector). I.e., override the term
frequencies computed by Lucene, with an application's computed term weights
(non frequency based) ?
I don't think I want to use Scorer#score() for applying score changes as
this one is activated at search time which won't work for me.

Thanks for any insight,
Naama
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org