Posted to solr-user@lucene.apache.org by Shawn Heisey <ap...@elyograg.org> on 2018/01/02 15:39:52 UTC

Re: Confusing DocValues documentation

On 12/22/2017 12:45 PM, Tech Id wrote:
> It seems that stored="false" docValues="true" is the default in Solr's
> github and the recommended way to go.

Like most things in Solr, there's no simple answer.  It depends.

For the purposes of information retrieval (not facets, grouping, or
sorting), whether you want stored or docValues will depend on a number
of factors.

Stored field data is compressed in the index.  This means that it takes
additional CPU processing to get the data from the index, but less data
must be read from disk.  DocValues data is laid out very differently.
With docValues, the data does NOT go through that kind of block
compression, and all of the values for one field for all documents across
the entire index segment are written in one place, separately from any
other field's docValues data.

If you are returning all fields for a document and there are more than a
few fields, then accessing stored data and decompressing it is probably
going to be faster than accessing docValues data.  For one thing, all
the stored data for a single document is compressed and written
together.  With docValues, each field is in a different place, so
multiple parts of the disk will need to be accessed to get results for
multiple fields of a single document.
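
To make the layout difference concrete, here is a toy model in Python.  It
is only a sketch and not how Lucene actually stores anything: the sample
documents, the field names, and both fetch functions are made up for
illustration, and real stored fields compress blocks of several documents
rather than one document at a time.  It just counts how many separate
storage regions must be touched to return every field of one document.

```python
import json
import zlib

# Hypothetical sample data, not taken from any real index.
docs = [
    {"id": 1, "title": "alpha", "price": 10},
    {"id": 2, "title": "beta", "price": 20},
    {"id": 3, "title": "gamma", "price": 30},
]
fields = ["id", "title", "price"]

# Row-oriented, like stored fields: all fields of a document are
# compressed and written together (simplified to one blob per document).
stored = [zlib.compress(json.dumps(d).encode("utf-8")) for d in docs]

# Column-oriented, like docValues: one array per field, indexed by
# document position, kept apart from every other field's array.
doc_values = {f: [d[f] for d in docs] for f in fields}


def fetch_stored(doc_pos, wanted):
    """One read plus one decompression yields every field of the document."""
    row = json.loads(zlib.decompress(stored[doc_pos]).decode("utf-8"))
    return {f: row[f] for f in wanted}, 1  # 1 region touched


def fetch_doc_values(doc_pos, wanted):
    """One lookup per requested field, each in a different per-field array."""
    values = {f: doc_values[f][doc_pos] for f in wanted}
    return values, len(wanted)  # one region per field


if __name__ == "__main__":
    print(fetch_stored(1, fields))
    print(fetch_doc_values(1, fields))
```

In this model the stored path pays for decompression but touches one
region no matter how many fields are requested, while the docValues path
skips decompression but touches one region per field.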

If the index is small enough that it can easily be cached by the OS,
then docValues will probably be faster, because accessing the data will
be lightning fast and no decompression step is necessary.  But if the
index is too big to be fully cached, then only experimentation will
allow you to know which is better.
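
For that kind of experimentation, a small timing helper along these lines
might be a starting point.  This is a sketch under assumptions: the name
median_seconds and its parameters are made up here, and run_query stands
for whatever callable you write to send one request to your own Solr
setup (for example, the same query against a stored-only and a
docValues-only copy of the index).

```python
import statistics
import time


def median_seconds(run_query, repeats=20, warmup=3):
    """Return the median wall-clock time of run_query() over several runs.

    run_query is a hypothetical zero-argument callable supplied by the
    caller.  Warmup runs are executed first and not measured.
    """
    for _ in range(warmup):
        run_query()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

Note that the warmup runs mean this measures the case where the relevant
parts of the index are probably already in the OS disk cache.  An index
too big to be fully cached will also see cold reads, so it is worth
testing with realistic, varied queries as well.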

For facets, grouping, and/or sorting, using docValues instead of indexed
data (indexed="true") will generally offer better performance, and WILL
use less heap memory, because indexed data has to be un-inverted into heap
memory before it can be used for those operations.  Even so, deciding
which way performs better frequently requires experimentation; using
indexed data and a larger heap could perform better in some situations.

For information retrieval, stored is *usually* better than docValues,
but not always.

Thanks,
Shawn