Posted to java-user@lucene.apache.org by Dawid Weiss <da...@cs.put.poznan.pl> on 2005/09/28 16:24:45 UTC

A very technical question.

Hi.

I have a very technical question. I need to alter document scores (or, in 
fact, document boosts) for an existing index, but on a per-query basis. In 
other words, I'd like to have pseudo-queries of the form:

1. civil war PREFER:shorter
2. civil war PREFER:longer

For these two queries, option 1 would score shorter documents higher than 
option 2, which would in turn score longer documents higher. Note that 
these preferences can be expressed at query time, so static document 
boosts are of little help.

I'd appreciate it if those familiar with the internals of Lucene gave me 
brief instructions on how this could be achieved. My rough guess is that 
I'll need to build my own Scorer, but I don't know how to access document 
length or where to plug that scorer in, and I'd rather hear it from 
somebody with more expertise.

Thanks,
D.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: A very technical question.

Posted by Doug Cutting <cu...@apache.org>.
Dawid Weiss wrote:
> I have a very technical question. I need to alter document scores (or, in 
> fact, document boosts) for an existing index, but on a per-query basis. In 
> other words, I'd like to have pseudo-queries of the form:
> 
> 1. civil war PREFER:shorter
> 2. civil war PREFER:longer
> 
> For these two queries, option 1 would score shorter documents higher than 
> option 2, which would in turn score longer documents higher. Note that 
> these preferences can be expressed at query time, so static document 
> boosts are of little help.

You could subclass FilterIndexReader and override the norms() method to 
cook the values in the returned array, caching the cooked copy.  Then 
use different IndexReaders for different queries, each cooking the norms 
differently.  Cooking could be fast: compute a 256 byte table and use it 
to map each byte to its cooked value.  Does this make sense?
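
The table idea above can be sketched without any Lucene dependency. This is only an illustration under stated assumptions: the class name NormCooker, the PREFER_LONGER rule (which simply inverts the decoded norm, and so also inverts any index-time boost folded into it), and the local encode/decode helpers are all hypothetical. The helpers reimplement the one-byte norm format (3 mantissa bits, 5 exponent bits); in real code you would call Similarity.encodeNorm and Similarity.decodeNorm, and return the cooked array from an overridden FilterIndexReader.norms(String).

```java
import java.util.function.DoubleUnaryOperator;

/** Sketch: "cook" norm bytes through a precomputed 256-entry lookup table. */
class NormCooker {

    // Decode a norm byte (3 mantissa bits, 5 exponent bits, zero exponent 15).
    // Reimplemented here only to keep the sketch self-contained.
    static float decodeNorm(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xFF) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Encode a float into the same one-byte format, clamping out-of-range values.
    static byte encodeNorm(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);
        if (small <= ((63 - 15) << 3)) return (bits <= 0) ? (byte) 0 : (byte) 1;
        if (small >= ((63 - 15) << 3) + 0x100) return (byte) -1;
        return (byte) (small - ((63 - 15) << 3));
    }

    // Hypothetical "prefer longer" rule: invert the norm, leaving zero untouched.
    static final DoubleUnaryOperator PREFER_LONGER = n -> n == 0 ? 0 : 1.0 / n;

    // Build the table once per preference: one cooked byte per possible norm byte.
    static byte[] buildTable(DoubleUnaryOperator cook) {
        byte[] table = new byte[256];
        for (int i = 0; i < 256; i++) {
            float norm = decodeNorm((byte) i);
            table[i] = encodeNorm((float) cook.applyAsDouble(norm));
        }
        return table;
    }

    // Map a whole norms array through the table; the result is what you would cache.
    static byte[] cook(byte[] norms, byte[] table) {
        byte[] cooked = new byte[norms.length];
        for (int i = 0; i < norms.length; i++) {
            cooked[i] = table[norms[i] & 0xFF];
        }
        return cooked;
    }

    public static void main(String[] args) {
        byte[] table = buildTable(PREFER_LONGER);
        byte[] norms = { encodeNorm(1.0f), encodeNorm(0.5f) };
        byte[] cooked = cook(norms, table);
        for (int i = 0; i < norms.length; i++) {
            System.out.println(decodeNorm(norms[i]) + " -> " + decodeNorm(cooked[i]));
        }
    }
}
```

Cooking an array this way costs one table lookup per document, so the expensive decode/encode work happens only 256 times per preference.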

Doug



Re: A very technical question.

Posted by Yonik Seeley <ys...@gmail.com>.
Field length isn't stored... It gets folded into the norm (see
Similarity.lengthNorm) along with the boost at indexing time.

A couple of approaches:
a) index the field twice with two different Similarity implementations
b) store term vectors, derive the length from them and store in the
FieldCache, implement your own Query/Scorer to factor that in.
c) store a separate length field and use my soon-to-be-finished
FunctionQuery
http://www.mail-archive.com/java-dev@lucene.apache.org/msg02173.html
(blech, the indentation is messed up in that archive)
d) use FunctionQuery with a custom source that derives field length from
term vectors


(a) is your best bet right now for a quick solution, IMO.
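
For approach (a), a rough sketch of the two length-normalization rules might look like the following. It is plain Java with no Lucene dependency: the class name, the field names "body_shorter" and "body_longer", and the square-root rule for the second field are assumptions made for illustration. The first rule matches what the default similarity computes (1/sqrt(numTerms)). In Lucene this logic would sit in an override of Similarity.lengthNorm(String fieldName, int numTerms), with the same text indexed under both field names.

```java
/** Sketch: one length-norm rule per copy of the field, chosen by field name. */
class PerFieldLengthNorm {

    // Hypothetical field names: the same text is indexed under both.
    static final String SHORTER_FIELD = "body_shorter";
    static final String LONGER_FIELD = "body_longer";

    static float lengthNorm(String fieldName, int numTerms) {
        if (LONGER_FIELD.equals(fieldName)) {
            // Assumed rule: the norm grows with field length, favoring longer documents.
            return (float) Math.sqrt(numTerms);
        }
        // Default-style rule: the norm shrinks with field length, favoring shorter documents.
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        for (int n : new int[] {4, 100, 10000}) {
            System.out.println(n + " terms: shorter=" + lengthNorm(SHORTER_FIELD, n)
                    + " longer=" + lengthNorm(LONGER_FIELD, n));
        }
    }
}
```

At query time, PREFER:shorter would then search the first field and PREFER:longer the second. Keep in mind that norms are squeezed into a single byte, so only coarse differences in length survive.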


-Yonik




Re: A very technical question.

Posted by Dawid Weiss <da...@cs.put.poznan.pl>.
Thanks for all the responses, guys. I'll analyze them and post my 
results if any. Doug's suggestion was closest to what I tentatively felt 
it could look like. I'll see if I can make it work.

D.



Re: A very technical question.

Posted by Andy Liu <an...@gmail.com>.
While you're indexing, you can give each doc a field that records how
long the document is. So, for example, you can add a field named
"docLength" to each document, and assign it discrete values such as
"veryshort", "short", "medium", "long", "verylong", depending on how
granular you need it. Then at query time you can add clauses on that field
with a given boost value, e.g.

civil war docLength:verylong^5 docLength:long^3
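
A minimal sketch of that bucketing, assuming length is measured in terms and using made-up thresholds (the class name, the cut-off values, and the helper names are all hypothetical; only the field name, the bucket labels, and the boosts come from the example above):

```java
/** Sketch: discrete length buckets at index time, boosted clauses at query time. */
class DocLengthBuckets {

    // Value to store in the "docLength" field; the thresholds are arbitrary assumptions.
    static String bucket(int numTerms) {
        if (numTerms < 100) return "veryshort";
        if (numTerms < 500) return "short";
        if (numTerms < 2000) return "medium";
        if (numTerms < 10000) return "long";
        return "verylong";
    }

    // Append optional boosted clauses that favor the long buckets.
    static String preferLonger(String userQuery) {
        return userQuery + " docLength:verylong^5 docLength:long^3";
    }

    // Mirror image for the short buckets.
    static String preferShorter(String userQuery) {
        return userQuery + " docLength:veryshort^5 docLength:short^3";
    }

    public static void main(String[] args) {
        System.out.println(bucket(250));
        System.out.println(preferLonger("civil war"));
        System.out.println(preferShorter("civil war"));
    }
}
```

With the query parser's default OR operator the extra clauses are optional, so they also match documents that do not contain the query words at all; the user's terms may need to be marked as required.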

Andy
