Posted to dev@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2015/06/24 19:17:11 UTC
replacing stored fields with docValues to leverage updatable doc
values -- was: Re: Version field as DV
: I don’t know if it’s worth it in terms of the trade-offs, but there’s
: something to be said about having *both* indexed=true & docValues=true on
Yes that's true -- and: again ... not at all what Ishan is asking
about.
(The tradeoffs between DV and indexed are known and the question of how
they might apply to _version_ field best practices is something already
being discussed/tracked in the issue i mentioned before.)
The question Ishan was trying to ask, and what this thread keeps diverging
from (so i just changed the subject to try and make it more clear) is
about eliminating *stored* values from use with a particular field using
docValues in their place.
Whether or not *indexed* values should/could also be used for performance
of searches is a completely orthogonal question -- the current discussion
is about the possibility of using the "updatable" feature of DocValues to
change some field values (in solr's case, one of those fields would *have*
to be the version field, hence the original poor subject of this thread)
and then relying *only* on the docValues to "return" the current field
values to the client.
So for a concrete example...
id: indexed + stored + DV
title: indexed/tokenized + stored
_version_: DV
price: DV
...so if i want to change the "title" of a book, i have to completely
re-index it, but if i only want to change the *price* of a book, I use
updatable doc values to change the price field (and in solr's case, for
correct optimistic concurrency, i also update the _version_ field).
But if/when users do paginated searches of books, and get ~100 results, we
use stored fields to get the id & title of each result, but we use DV to
return the current "price" (and version).
Make sense?
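
To make the example above concrete, here is a minimal sketch in plain Java. It is a toy in-memory model, not Lucene or Solr code: the class `BookIndexSketch`, its methods, and the `versionClock` counter are all hypothetical names invented for illustration. The only thing it tries to show is the split described above: id and title live in a per-document "stored" row, price and _version_ live in per-field columns, and a price change rewrites only the columns after an optimistic concurrency check.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory model of the book example. This is NOT Lucene or Solr code. */
final class BookIndexSketch {
    // Row-oriented "stored fields": all stored values of one doc sit together.
    private final List<Map<String, String>> storedRows = new ArrayList<>();
    // Column-oriented "doc values": one list per field, indexed by doc number.
    private final List<Long> priceColumn = new ArrayList<>();
    private final List<Long> versionColumn = new ArrayList<>();
    private final Map<String, Integer> docNumById = new HashMap<>();
    // Hypothetical stand-in for however the real system generates versions.
    private long versionClock = 0;

    /** Full (re)index: the only way to change "title" in this model. */
    void index(String id, String title, long price) {
        Map<String, String> row = new HashMap<>();
        row.put("id", id);
        row.put("title", title);
        Integer existing = docNumById.get(id);
        if (existing == null) {
            docNumById.put(id, storedRows.size());
            storedRows.add(row);
            priceColumn.add(price);
            versionColumn.add(++versionClock);
        } else {
            // Simplification: overwrite in place; a real index would delete and re-add.
            int doc = existing;
            storedRows.set(doc, row);
            priceColumn.set(doc, price);
            versionColumn.set(doc, ++versionClock);
        }
    }

    /** Price-only change: touches the two columns, never the stored row. */
    boolean updatePrice(String id, long expectedVersion, long newPrice) {
        Integer existing = docNumById.get(id);
        if (existing == null) return false;
        int doc = existing;
        // Optimistic concurrency: reject the update if the caller's version is stale.
        if (versionColumn.get(doc) != expectedVersion) return false;
        priceColumn.set(doc, newPrice);
        versionColumn.set(doc, ++versionClock);
        return true;
    }

    /** Current version of a doc, read from the column (assumes the id exists). */
    long version(String id) {
        return versionColumn.get(docNumById.get(id));
    }

    /** One result line: id and title from the row, price and version from the columns. */
    String render(String id) {
        int doc = docNumById.get(id);
        Map<String, String> row = storedRows.get(doc);
        return row.get("id") + " | " + row.get("title")
                + " | price=" + priceColumn.get(doc)
                + " | _version_=" + versionColumn.get(doc);
    }

    public static void main(String[] args) {
        BookIndexSketch idx = new BookIndexSketch();
        idx.index("b1", "Example Book", 4000);
        long v = idx.version("b1");
        System.out.println(idx.updatePrice("b1", v, 3500)); // true
        System.out.println(idx.updatePrice("b1", v, 3000)); // false: v is now stale
        System.out.println(idx.render("b1"));
    }
}
```

Note that `render` has to consult both the row and the columns for every result, which is exactly the access pattern the performance question is about.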
Which brings us back to the question: are there any serious performance
downsides to "abusing" doc values in this way instead of using stored
fields? My recollection is that back in the early days of doc values
someone did some fairly serious performance testing and decided that trying
to use docvalues for this purpose was in fact a lot slower than stored
fields because of the random disk seeks (as opposed to all storedfields
for a single doc being co-located)
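
The seek argument can be put as back-of-the-envelope arithmetic. The sketch below is only a worst-case counting model under loud assumptions: every stored-field lookup costs one random access per document, every doc-values lookup costs one random access per document per field, and nothing is cached. It is not a measurement, and the class and method names are hypothetical.

```java
/** Worst-case access counting; a toy model, not a benchmark of Lucene or Solr. */
final class SeekCountSketch {
    /** All returned fields are stored: one access per doc, since a doc's fields are co-located. */
    static long storedOnlyAccesses(int docsPerPage) {
        return docsPerPage;
    }

    /** Some fields come from doc values: one access per doc per doc-values field. */
    static long withDocValuesAccesses(int docsPerPage, int docValuesFields, boolean alsoReadsStored) {
        long columnReads = (long) docsPerPage * docValuesFields;
        return columnReads + (alsoReadsStored ? docsPerPage : 0);
    }

    public static void main(String[] args) {
        // 100 results per page, everything stored.
        System.out.println(storedOnlyAccesses(100)); // 100
        // 100 results per page: id + title stored, price + _version_ from doc values.
        System.out.println(withDocValuesAccesses(100, 2, true)); // 300
    }
}
```

Under those assumptions the book example goes from 100 to 300 accesses per page. How much of that gap survives the OS page cache and the actual on-disk formats is the open question.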
: > The key bit of context of Ishan's question is updateable docValues
: > (SOLR-5944) and if/how it might be usable in Solr for the version field --
: > but one key aspect of doing that would be in ensuring that we can *return*
: > the correct version value to user (for optimistic concurrency). Currently
: > that's done with stored fields, but that wouldn't be feasible if we go
: > down the route of updateable docValues, which means we would have to
: > "return" the version field from the docValues.
: >
: > that's where ishan's question about docvalues and performance and disk
: > seeks comes from...
: >
: > What are the downsides in saying "instead of using docvalues and stored
: > fields for this single valued int per doc, we're only going to use
: > docvalues & when doing pagination we will return the current value of the
: > field to the user from the docvalues" what kind of performance impacts
: > come up in that case when you have 100 docs per page(ination)
-Hoss
http://www.lucidworks.com/