Posted to dev@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2015/06/24 19:17:11 UTC

replacing stored fields with docValues to leverage updatable doc values -- was: Re: Version field as DV

: I don’t know if it’s worth it in terms of the trade-offs, but there’s
: something to be said about having *both* indexed=true & docValues=true on

Yes that's true -- and: again ... not at all what Ishan is asking 
about.

(The tradeoffs between DV and indexed are known and the question of how 
they might apply to _version_ field best practices is something already 
being discussed/tracked in the issue i mentioned before.)

The question Ishan was trying to ask, and what this thread keeps diverging 
from (so i just changed the subject to try and make it more clear) is 
about eliminating *stored* values for a particular field and using 
docValues in their place.

Whether or not *indexed* values should/could also be used for performance 
of searches is a completely orthogonal question -- the current discussion 
is about the possibility of using the "updatable" feature of DocValues to 
change some field values (in solr's case, one of those fields would *have* 
to be the version field, hence the original poor subject of this thread) 
and then relying *only* on the docValues to "return" the current field 
values to the client.

So for a concrete example...

   id: indexed + stored + DV
   title: indexed/tokenized + stored
   _version_: DV
   price: DV

...so if i want to change the "title" of a book, i have to completely 
re-index it, but if i only want to change the *price* of a book, I use 
updatable doc values to change the price field (and in solr's case, for 
correct optimistic concurrency, i also update the _version_ field).

But if/when users do paginated searches of books, and get ~100 results, we 
use stored fields to get the id & title of each result, but we use DV to 
return the current "price" (and version).
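
As a rough illustration of that split, here is a toy in-memory model in 
Java. It is not Lucene or Solr code: the class, its methods, and the 
"versions start at 1 and increment by 1" rule are all made up for this 
sketch. It only shows which structure each field is read from and which 
structures a price-only update touches.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy in-memory model; NOT Lucene or Solr code. All names are hypothetical. */
final class BookIndexSketch {
    // "Stored fields": row-oriented, the id and title of a doc live together.
    private final Map<Integer, String[]> stored = new HashMap<>();
    // "Doc values": column-oriented, one structure per field, keyed by doc number.
    private final Map<Integer, Long> priceDv = new HashMap<>();
    private final Map<Integer, Long> versionDv = new HashMap<>();

    void add(int doc, String id, String title, long price) {
        stored.put(doc, new String[] {id, title});
        priceDv.put(doc, price);
        versionDv.put(doc, 1L); // assumption: versions simply start at 1 here
    }

    /** Price-only change: touches the two DV columns, never the stored record. */
    boolean updatePrice(int doc, long newPrice, long expectedVersion) {
        Long current = versionDv.get(doc);
        if (current == null || current != expectedVersion) {
            return false; // optimistic concurrency: the caller saw a stale version
        }
        priceDv.put(doc, newPrice);
        versionDv.put(doc, current + 1);
        return true;
    }

    /** Builds one search result: id/title from stored fields, price/version from DV. */
    Map<String, Object> fetch(int doc) {
        String[] row = stored.get(doc);
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("id", row[0]);
        out.put("title", row[1]);
        out.put("price", priceDv.get(doc));
        out.put("_version_", versionDv.get(doc));
        return out;
    }
}
```

In this model a second updatePrice call that still passes the old version 
is rejected, which is the reason the version has to be updated (and 
returned) alongside the price.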

Make sense?

Which brings us back to the question:  are there any serious performance 
downsides to "abusing" doc values in this way instead of using stored 
fields?  My recollection is that back in the early days of doc values 
someone did some fairly serious performance testing and decided that trying 
to use docvalues for this purpose was in fact a lot slower than stored 
fields because of the random disk seeks (as opposed to all stored fields 
for a single doc being co-located).
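
To put a rough number on that concern, here is a back-of-envelope sketch 
in Java. It is only a worst-case counting model under stated assumptions 
(one seek per doc for co-located stored fields, one extra seek per doc per 
docValues field, nothing cached); it is not a benchmark, and the class and 
method names are hypothetical.

```java
/** Back-of-envelope model only; hypothetical names, not a benchmark and not Lucene code. */
final class SeekEstimate {
    /** Stored fields only: a doc's stored values are co-located, so assume one seek per doc. */
    static long storedOnly(int docsPerPage) {
        return docsPerPage;
    }

    /** Hybrid: one stored-fields seek per doc plus, at worst, one seek per doc per DV field. */
    static long storedPlusDocValues(int docsPerPage, int dvFields) {
        return (long) docsPerPage * (1 + dvFields);
    }

    public static void main(String[] args) {
        System.out.println(storedOnly(100));             // prints 100
        System.out.println(storedPlusDocValues(100, 2)); // prints 300
    }
}
```

For the example above (100 results per page, price and _version_ read 
from docValues) this model gives 300 seeks instead of 100. How close real 
behavior comes to that worst case is exactly the open question.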

: > The key bit of context of Ishan's question is updateable docValues
: > (SOLR-5944) and if/how it might be usable in Solr for the version field --
: > but one key aspect of doing that would be in ensuring that we can *return*
: > the correct version value to user (for optimistic concurrency).  Currently
: > that's done with stored fields, but that wouldn't be feasible if we go
: > down the route of updateable docValues, which means we would have to
: > "return" the version field from the docValues.
: >
: > that's where ishan's question about docvalues and performance and disk
: > seeks comes from...
: >
: > What are the downsides in saying "instead of using docvalues and stored
: > fields for this single valued int per doc, we're only going to use
: > docvalues & when doing pagination we will return the current value of the
: > field to the user from the docvalues" what kind of performance impacts
: > come up in that case when you have 100 docs per page(ination)


-Hoss
http://www.lucidworks.com/