Posted to solr-user@lucene.apache.org by S G <sg...@gmail.com> on 2017/12/21 01:09:03 UTC

DocValues for multivalued strings and boolean fields

Hi,

One of our Solr users is trying to set docValues="true" for multivalued
string fields and boolean-type fields.

I am not sure what the performance impact of that would be.
Can docValues negatively affect performance in any way?

We are using Solr 6.5.1 and are also experimenting with 7.1.0.

Thanks
SG

Re: DocValues for multivalued strings and boolean fields

Posted by Shawn Heisey <ap...@elyograg.org>.
On 12/20/2017 6:09 PM, S G wrote:
> One of our Solr users is trying to set docValues="true" for multivalued
> string fields and boolean-type fields.
> 
> I am not sure what the performance impact of that would be.
> Can docValues negatively affect performance in any way?

Adding to what Emir said:

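The docValues data will contain the same values as the stored data, but it 
is not block-compressed the way stored fields are, and it is written column 
by column (per field) instead of document by document.  That lets Lucene 
read all values for one field simply by reading data off the disk, with 
little computation and no seeking from one document's record to the next.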
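A toy sketch of the row-versus-column difference described above, assuming a handful of hypothetical documents with made-up field names. It is not Lucene's on-disk format; it only shows why pulling every value of one field is cheaper from a per-field array than from per-document records.

```python
from typing import Dict, List, Optional

# Hypothetical toy documents; the field names are illustrative only.
DOCS: List[Dict[str, str]] = [
    {"id": "1", "color": "red", "size": "L"},
    {"id": "2", "color": "blue", "size": "M"},
    {"id": "3", "color": "red"},  # no "size" value
]


def to_rows(docs: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Stored-field style: one record per document holding all of its fields."""
    return [dict(d) for d in docs]


def to_columns(docs: List[Dict[str, str]], fields: List[str]) -> Dict[str, List[Optional[str]]]:
    """DocValues style: one array per field, indexed by document number."""
    return {f: [d.get(f) for d in docs] for f in fields}


def field_from_rows(rows: List[Dict[str, str]], field: str) -> List[Optional[str]]:
    # Every document record has to be visited to pull out one field.
    return [row.get(field) for row in rows]


def field_from_columns(columns: Dict[str, List[Optional[str]]], field: str) -> List[Optional[str]]:
    # The whole field is already a single array; no per-document work.
    return columns[field]


rows = to_rows(DOCS)
columns = to_columns(DOCS, ["id", "color", "size"])
print(field_from_rows(rows, "size"))        # ['L', 'M', None]
print(field_from_columns(columns, "size"))  # ['L', 'M', None]
```

Both calls return the same values; the difference is how much data has to be touched to get them.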

If the field is indexed and stored, docValues will not be accessed 
during normal queries unless a sort parameter or a facet parameter 
mentions a field with docValues.  When present, docValues data is used 
for sorting and facets; otherwise, the indexed values are uninverted into 
an on-heap structure and used instead.  Sorting or faceting with 
docValues usually uses less memory and performs faster than the same 
operation without docValues.  If the machine does not have enough system 
RAM to cache the index data effectively, performance may not improve.
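For the multivalued string fields in the original question, here is a minimal sketch of facet counting against a SORTED_SET-style column: a sorted term dictionary plus per-document ordinals. The class name and data are hypothetical and this is a simplification, not Solr's faceting code; it shows why counting can work on plain integer ordinals and only map back to strings at the end.

```python
from bisect import bisect_left
from typing import Dict, List, Sequence


class SortedSetColumn:
    """Toy model of a SORTED_SET-style docValues column for a multivalued
    string field: a sorted term dictionary plus, per document, the sorted and
    de-duplicated ordinals of that document's values."""

    def __init__(self, per_doc_values: Sequence[Sequence[str]]) -> None:
        self.terms: List[str] = sorted({v for vals in per_doc_values for v in vals})
        self.doc_ords: List[List[int]] = [
            sorted({bisect_left(self.terms, v) for v in vals}) for vals in per_doc_values
        ]

    def facet_counts(self, matching_docs: Sequence[int]) -> Dict[str, int]:
        # Count hits per ordinal in a plain int array, then resolve each
        # ordinal back to its term only once at the end.
        counts = [0] * len(self.terms)
        for doc in matching_docs:
            for ord_ in self.doc_ords[doc]:
                counts[ord_] += 1
        return {self.terms[o]: c for o, c in enumerate(counts) if c > 0}


col = SortedSetColumn([["red", "blue"], ["red"], [], ["green", "red", "red"]])
print(col.facet_counts([0, 1, 3]))  # {'blue': 1, 'green': 1, 'red': 3}
```

Note that document 3 lists "red" twice but is counted once, because each document's ordinals are de-duplicated.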

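When docValues is added to a field, a complete reindex is required; 
otherwise, sorting and faceting on that field will not work properly, 
because documents indexed before the change have no docValues data.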
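A toy illustration of why a reindex is needed after turning docValues on, assuming a hypothetical segment type that only carries docValues for fields that had them enabled when the segment was written. In Lucene this situation can surface as an exception about an unexpected docValues type; the sketch simply raises its own error instead of guessing values for the old segment.

```python
from typing import Dict, List


class ToySegment:
    """Hypothetical segment: maps field name -> per-document values, but only
    for fields that had docValues enabled when the segment was written."""

    def __init__(self, doc_values: Dict[str, List[str]]) -> None:
        self.doc_values = doc_values


def read_column(segments: List[ToySegment], field: str) -> List[str]:
    """Concatenate a field's docValues across segments, refusing to guess
    for segments written before docValues was turned on."""
    values: List[str] = []
    for i, seg in enumerate(segments):
        if field not in seg.doc_values:
            raise LookupError(f"segment {i} has no docValues for {field!r}; reindex needed")
        values.extend(seg.doc_values[field])
    return values


old = ToySegment({})                          # indexed before the schema change
new = ToySegment({"color": ["red", "blue"]})  # indexed after docValues="true"
print(read_column([new], "color"))            # ['red', 'blue']
try:
    read_column([old, new], "color")
except LookupError as err:
    print(err)  # segment 0 has no docValues for 'color'; reindex needed
```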

If the multiValued setting changes on a field that already has docValues, 
a reindex is required, and you must also take another step: completely 
wipe the index directory before reloading or restarting.  (For a string 
field, the change switches the docValues type between SORTED and 
SORTED_SET, and Lucene will not mix the two for one field in the same 
index.)  If the wipe doesn't happen, the core will break and throw 
exceptions.
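A minimal model of the multiValued pitfall, assuming a hypothetical index class that records the docValues type each field was first written with. The error text imitates, but is not claimed to match, Lucene's real message. It shows why reindexing into the same directory still fails while a wiped directory accepts the new type.

```python
from enum import Enum
from typing import Dict


class DocValuesType(Enum):
    SORTED = "SORTED"          # single-valued string field
    SORTED_SET = "SORTED_SET"  # multivalued string field


class ToyIndex:
    """Hypothetical model of one index: it remembers the docValues type each
    field was first written with and rejects a different type afterwards."""

    def __init__(self) -> None:
        self.field_types: Dict[str, DocValuesType] = {}

    def add_document(self, field: str, multi_valued: bool) -> None:
        dv_type = DocValuesType.SORTED_SET if multi_valued else DocValuesType.SORTED
        existing = self.field_types.setdefault(field, dv_type)
        if existing is not dv_type:
            raise ValueError(
                f"cannot change docValues type from {existing.value} "
                f"to {dv_type.value} for field {field!r}"
            )

    def wipe(self) -> None:
        # Deleting the index directory also discards the recorded field types.
        self.field_types.clear()


idx = ToyIndex()
idx.add_document("tags", multi_valued=False)     # schema had multiValued="false"
try:
    idx.add_document("tags", multi_valued=True)  # schema changed, no wipe
except ValueError as err:
    print(err)  # cannot change docValues type from SORTED to SORTED_SET for field 'tags'
idx.wipe()
idx.add_document("tags", multi_valued=True)      # works after a wipe
```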

Thanks,
Shawn

Re: DocValues for multivalued strings and boolean fields

Posted by Emir Arnautović <em...@sematext.com>.
Hi SG,
Doc values are another file to write, so indexing performance will suffer. In theory, query performance will also suffer, because the alternative is an in-memory structure (fieldCache and fieldValueCache). In practice, it will not: the in-memory structure requires a larger heap and takes time and resources to build after each commit or on the first query, and the doc values files are likely to be cached by the OS, so access will not be at “disk speed”.
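To make the trade-off concrete, here is a toy version of the in-memory alternative: uninverting an inverted index (term to document ids) into a per-document array, roughly what a FieldCache-style structure does for a single-valued field. The function names and data are hypothetical. The visited counter stands in for the work that must be repeated whenever a new searcher opens, which docValues avoid because the column is already on disk and can sit in the OS page cache.

```python
from typing import Dict, List, Optional, Tuple


def uninvert(postings: Dict[str, List[int]], max_doc: int) -> Tuple[List[Optional[str]], int]:
    """Toy FieldCache-style uninversion of a single-valued field: walk every
    term's postings list and write the term into a per-document array that
    lives on the heap. Returns the array and the number of postings visited,
    which is the work that has to be redone for each new searcher."""
    values: List[Optional[str]] = [None] * max_doc
    visited = 0
    for term, docs in postings.items():
        for doc in docs:
            values[doc] = term
            visited += 1
    return values, visited


def sort_docs(values: List[Optional[str]]) -> List[int]:
    # Documents without a value sort last.
    return sorted(range(len(values)), key=lambda d: (values[d] is None, values[d] or ""))


postings = {"red": [0, 2], "blue": [1]}  # term -> matching doc ids
values, visited = uninvert(postings, max_doc=4)
print(values)             # ['red', 'blue', 'red', None]
print(visited)            # 3
print(sort_docs(values))  # [1, 0, 2, 3]
```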

HTH,
Emir
