Posted to java-user@lucene.apache.org by Dominik Safaric <do...@gmail.com> on 2017/10/06 14:32:49 UTC

Custom Query & reading plongs used by a custom Scorer

I've implemented a custom Query with two responsibilities. First, it uses
a PointValues.IntersectVisitor to classify documents as hits or non-hits
based on a plong value. Second, it calculates a custom score from another
document field, specified in the mapping as plongs. The latter holds an
array of 46 longs from which the custom score is calculated.

The problem I am having is performance-wise. To calculate the custom
score I retrieve the values of the field using
LeafReader.document(docId()), which is a costly process. What alternatives
are there for reading plongs using a LeafReader and DocIdSetIterator within
a custom Scorer implementation?

Thanks in advance.
Dominik

Re: Custom Query & reading plongs used by a custom Scorer

Posted by Erick Erickson <er...@gmail.com>.
docValues are the first thing I'd look at. What you've done is an
anti-pattern for scoring because it reads the stored data from disk
and decompresses it to get the value; as you say, costly.

Getting it from a docValues field, OTOH, will read the value(s)
directly from MMapDirectory space, i.e. the OS's memory space. As an
aside, this is why Streaming only works with DV fields.

Two cautions, though, about how DV differs from stored when a field
has more than one value. The underlying structure is a sorted set,
therefore:
1> the contents are ordered by "natural" order rather than insertion order.
2> multiple identical values are collapsed into a single value.

So storing 1, 3, 99, 4, 4, 4, 4, 2, 3 will be returned as 1, 2, 3, 4, 99
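
To make the two cautions concrete, here is a minimal sketch that mimics
the sorted-set behavior described above with a plain java.util.TreeSet.
It does not touch Lucene or Solr at all; the class and method names are
made up for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for sorted-set storage; not a Lucene class.
final class SortedSetSemantics {

    // Returns the values the way a sorted set would hand them back:
    // natural (ascending) order, with duplicates collapsed.
    static List<Long> collapse(long[] indexed) {
        TreeSet<Long> set = new TreeSet<>();
        for (long v : indexed) {
            set.add(v); // adding an already-present value is a no-op
        }
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        long[] indexed = {1, 3, 99, 4, 4, 4, 4, 2, 3};
        System.out.println(collapse(indexed)); // prints [1, 2, 3, 4, 99]
    }
}
```

If the scoring logic depends on the position of each of the 46 values, or
on repeated values, this reordering matters. Whether duplicates are really
collapsed may depend on the docValues type backing the field, so it is
worth checking on a small test index first.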

And another caution: you'll have to re-index completely when you add
docValues=true to your field definition. I'd start with a new
collection.

Best,
Erick
