
Posted to solr-user@lucene.apache.org by Aashish Dattani <aa...@zettata.com> on 2016/06/22 09:37:35 UTC

In-memory docValues speedup

Hi everyone,

I have 2 million documents in my Solr index. I have enabled docValues on one
of the integer fields, and set its docValuesFormat to Memory. This is
because I want to have very quick forward lookups on this field in my custom
component. 

I am running my Solr installation on a 35GB RAM machine, so memory is not a
big resource constraint.

I expected that setting the docValuesFormat to Memory would make docValues
access as fast as a simple array lookup, but it did not. Upon digging deeper
and inspecting the MemoryDocValuesProducer class, it seems that Lucene
delta-compresses the values and stores them in a PackedInts data structure,
which makes the lookups slower.
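To make the delta step concrete, here is a minimal sketch of the general idea. It is not Lucene's code: the class name DeltaEncodedValues is hypothetical, the deltas are kept in a plain int[] rather than a PackedInts structure, and it assumes the spread between the smallest and largest value fits in an int.

```java
// Hypothetical illustration of delta encoding; not MemoryDocValuesProducer.
final class DeltaEncodedValues {
    private final long min;      // smallest value in the column
    private final int[] deltas;  // (value - min) for each document

    DeltaEncodedValues(long[] values) {
        long lo = Long.MAX_VALUE;
        for (long v : values) {
            lo = Math.min(lo, v);
        }
        // An empty column has no minimum; use 0 as a placeholder.
        min = values.length == 0 ? 0L : lo;
        deltas = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            // Assumption: the range of values fits in an int.
            // toIntExact throws ArithmeticException otherwise.
            deltas[i] = Math.toIntExact(values[i] - min);
        }
    }

    // A lookup is one array read plus one addition.
    long get(int docId) {
        return min + deltas[docId];
    }

    public static void main(String[] args) {
        DeltaEncodedValues dv =
            new DeltaEncodedValues(new long[] {1000L, 1003L, 1001L});
        for (int doc = 0; doc < 3; doc++) {
            System.out.println("doc " + doc + " = " + dv.get(doc));
        }
    }
}
```

Each lookup adds only one addition on top of the array read, so any extra cost would have to come from how the deltas themselves are stored and decoded.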

Is there any configuration/implementation that I can use to make sure that
the docValues are faster than they are right now - essentially any way to
ensure that it is stored as an in-memory array? Do you see any downsides to
using this approach?

I'd appreciate any help on this.

Thanks!



--
View this message in context: http://lucene.472066.n3.nabble.com/In-memory-docValues-speedup-tp4283783.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: In-memory docValues speedup

Posted by Aashish Dattani <aa...@zettata.com>.

Could it be because of how the DirectDocValuesProducer populates the entire
bytes array for each request? i.e. for each call of getNumericDocValues(),
it reads all the bytes into an array first. The get() method itself seems to
be a simple array lookup. 

Link to the loadNumeric() of DirectDocValuesProducer:
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-codecs/4.9.1/org/apache/lucene/codecs/memory/DirectDocValuesProducer.java#DirectDocValuesProducer.loadNumeric%28org.apache.lucene.codecs.memory.DirectDocValuesProducer.NumericEntry%29






Re: In-memory docValues speedup

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.

Joel Bernstein <jo...@gmail.com> wrote:
> I've tested with the Direct docValuesFormat which is uncompressed
> in-memory. But I haven't seen any noticeable performance gain. I've been
> meaning to dig into exactly why I wasn't seeing a performance gain, but
> haven't had the chance to do this yet.

If this is about Direct PackedInts (1 entry in an underlying array = 1 value) vs. Packed PackedInts (bits must be shifted and masked out from the underlying array), then there is little time difference when retrieving values. I think this is caused by memory access being slow and computation being fast.
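To illustrate the two layouts, here is a rough sketch of what each lookup has to do. It is not Lucene's PackedInts implementation: the class and method names are hypothetical, and it assumes non-negative values that fit in bitsPerValue bits, with bitsPerValue between 1 and 63.

```java
// Hypothetical illustration of direct vs. bit-packed lookups.
final class PackedVsDirect {

    // Direct layout: one array slot per value, so a lookup is one read.
    static long getDirect(long[] values, int index) {
        return values[index];
    }

    // Packed layout: values are stored back to back, bitsPerValue bits
    // each, inside 64-bit blocks. A value may straddle two blocks.
    static long[] pack(long[] values, int bitsPerValue) {
        long totalBits = (long) values.length * bitsPerValue;
        long[] blocks = new long[(int) ((totalBits + 63) / 64)];
        for (int i = 0; i < values.length; i++) {
            long bitPos = (long) i * bitsPerValue;
            int block = (int) (bitPos >>> 6);   // bitPos / 64
            int shift = (int) (bitPos & 63);    // bitPos % 64
            blocks[block] |= values[i] << shift;
            if (shift + bitsPerValue > 64) {
                // The high bits spill into the next block.
                blocks[block + 1] |= values[i] >>> (64 - shift);
            }
        }
        return blocks;
    }

    // Packed lookup: locate the block, then shift and mask the value out.
    static long getPacked(long[] blocks, int bitsPerValue, int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long mask = (1L << bitsPerValue) - 1;
        long value = blocks[block] >>> shift;
        if (shift + bitsPerValue > 64) {
            // Fetch the remaining high bits from the next block.
            value |= blocks[block + 1] << (64 - shift);
        }
        return value & mask;
    }

    public static void main(String[] args) {
        long[] values = {3, 31, 0, 17, 8};
        long[] blocks = pack(values, 5);
        for (int i = 0; i < values.length; i++) {
            System.out.println(getDirect(values, i) + " / "
                + getPacked(blocks, 5, i));
        }
    }
}
```

The packed lookup adds a few shifts and a mask per value, but touches the same or fewer memory words than the direct one, which fits the observation that the two end up close in practice.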

It was discussed in Solr-4096. There are still some old and very poorly labeled charts at http://ekot.dk/misc/packedints/

Thinking out loud, I seem to remember some talk about the current PackedInts-using docValues code being hard for the JVM to inline? Could the problem be the number of indirections for a lookup, rather than the underlying data structure?

- Toke Eskildsen

Re: In-memory docValues speedup

Posted by Joel Bernstein <jo...@gmail.com>.

I've tested with the Direct docValuesFormat which is uncompressed
in-memory. But I haven't seen any noticeable performance gain. I've been
meaning to dig into exactly why I wasn't seeing a performance gain, but
haven't had the chance to do this yet.

If you test out the Direct docValuesFormat, I'd be interested in hearing
your findings.



Joel Bernstein
http://joelsolr.blogspot.com/
