Posted to solr-user@lucene.apache.org by Jack Krupansky <ja...@basetechnology.com> on 2013/03/29 16:31:17 UTC

DocValues vs stored fields?

I’m still a little fuzzy on DocValues (maybe because I’m still grappling with how it does or doesn’t still relate to “Column Stride Fields”), so can anybody clue me in as to how useful DocValues is/are?

Are DocValues simply an alternative to “stored fields”?

If so, and if DocValues are so great, why aren’t we just switching Solr over to DocValues under the hood for all fields?

And if there are “issues” with DocValues that would make such a complete switchover less than absolutely desired, what are those issues?

In short, when should a user use DocValues over stored fields, and vice versa?

As things stand, all we’ve done is make Solr more confusing than it was before, without improving its OOBE. OOBE should be job one in Solr.

Thanks.

P.S., And if I actually want to do Column Stride Fields, is there a way to do that?

-- Jack Krupansky

Re: DocValues vs stored fields?

Posted by Marcin Rzewucki <mr...@gmail.com>.
Hi Otis,

Currently, the whole record has to be stored on disk in order to update a single
field. Are you trying to say that won't be necessary with DocValues? Sounds
great!

Regards.


On 29 March 2013 20:51, Otis Gospodnetic <ot...@gmail.com> wrote:

> Hi,
>
> The current field update mechanism is not really a field update
> mechanism.  It just looks like that from the outside.  DocValues
> should make true field updates implementable.
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/

Re: DocValues vs stored fields?

Posted by "David Smiley (@MITRE.org)" <DS...@mitre.org>.
Otis,

DocValues are quite insufficient for true field updates.  DocValues is a
per-document value storage (hence the name); it's not inverted/indexed.
If you needed to search based on these values (e.g. find all docs that have
this value or between these values), then that's not going to work.  The most
promising field update work going on right now is
https://issues.apache.org/jira/browse/LUCENE-4258 "Incremental Field Updates
through Stacked Segments".  In my opinion, that's the most exciting thing
happening in Lucene right now, but it appears to have stalled a little.
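
To illustrate the distinction, here is a toy sketch using plain Python structures (hypothetical names, not Lucene's API). An inverted index maps a value to the documents containing it; a DocValues-style column maps each document id to its value. Reading one document's value from the column is a direct lookup, but finding documents by value with only the column requires scanning every document, which is why DocValues alone can't serve searches.

```python
from collections import defaultdict

# Toy single-valued "price" field for four documents (doc id -> value).
docs = {0: 30, 1: 10, 2: 30, 3: 20}

# Inverted index: value -> ascending doc ids. This is what searching uses.
inverted = defaultdict(list)
for doc_id in sorted(docs):
    inverted[docs[doc_id]].append(doc_id)

# DocValues-style column: one slot per doc id, in doc id order.
column = [docs[doc_id] for doc_id in range(len(docs))]


def value_for_doc(doc_id):
    """Forward lookup (doc -> value): a direct index into the column."""
    return column[doc_id]


def search_inverted(value):
    """Value -> docs via the inverted index: a single dictionary lookup."""
    return list(inverted.get(value, []))


def search_column(value):
    """Value -> docs using only the column: every document must be scanned."""
    return [doc_id for doc_id, v in enumerate(column) if v == value]
```

Both searches return the same doc ids. The difference is cost: the column scan grows with the number of documents, while the inverted lookup does not.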

I do think a DocValues-based hack could make a better replacement for Solr's
ExternalFileField.  It's for use in FunctionQueries.

Another questioner asked essentially why a field that has DocValues won't
have its value shown when the field is marked stored="false" since the value
is stored per-document after all.  True, the disparity here is a bit
confusing.  DocValues are not intended as a replacement for stored fields in
places where you are using stored fields now.  It's basically to improve the
performance and memory use of function queries, sorting, and faceting.  It's
the new FieldCache under a different name, but hasn't strictly replaced the
FC (yet).  It's not enabled by default because it creates new data on disk
and Solr doesn't know that you want to use it.
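
Here is a rough sketch of what "the new FieldCache" means in practice, under the same toy assumptions (hypothetical names, not Solr or Lucene code). Sorting and faceting only need to read one value per matching document. The FieldCache builds that per-document column at runtime by uninverting the indexed terms. DocValues writes the column to disk at index time instead.

```python
from collections import Counter

# Toy DocValues-style columns for five documents, indexed by doc id.
category = ["books", "music", "books", "video", "music"]
price = [12.5, 9.0, 30.0, 15.0, 9.0]


def uninvert(inverted, num_docs):
    """FieldCache-style: rebuild a per-doc column from value -> doc ids."""
    column = [None] * num_docs
    for value, doc_ids in inverted.items():
        for doc_id in doc_ids:
            column[doc_id] = value
    return column


def sort_by(column, doc_ids, reverse=False):
    """Order matching docs by their column value (stable for ties)."""
    return sorted(doc_ids, key=lambda d: column[d], reverse=reverse)


def facet_counts(column, doc_ids):
    """Field faceting: count each value among the matching docs."""
    return dict(Counter(column[d] for d in doc_ids))


matches = [0, 1, 2, 4]  # pretend these doc ids matched a query
```

Sorting `matches` by `price` yields `[1, 4, 0, 2]`, and faceting on `category` counts two "books" and two "music". Either result could come from an uninverted column or from one written at index time. The difference is when and where the column gets built.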

As of Solr 4.2, DocValues also supports multi-valued fields -- awesome!

All this said, I do think there's room for a proposed Solr DocTransformer to
expose the DocValues value as if it were a stored field in your search
results.  Actually... I wish that if you explicitly asked for a field that
isn't stored, Solr would just use docValues automatically.  That'd be
cool!
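
The fallback wished for here could look roughly like the sketch below. It is hypothetical. The schema flags mirror the stored and docValues field attributes, but the function and data are invented for illustration, and this does not describe what Solr 4.2 does. When a requested field isn't stored but has docValues, the value is read from the per-document column instead.

```python
# Hypothetical schema flags mirroring the stored / docValues field attributes.
SCHEMA = {
    "title": {"stored": True, "docValues": False},
    "popularity": {"stored": False, "docValues": True},
}

# Toy storage for a single document (doc id 0).
STORED_FIELDS = {0: {"title": "hello world"}}
DOC_VALUES = {"popularity": [42]}


def retrieve(doc_id, field, fallback_to_doc_values=False):
    """Return a field's value for a search result, or None if unavailable."""
    flags = SCHEMA[field]
    if flags["stored"]:
        return STORED_FIELDS[doc_id].get(field)
    if fallback_to_doc_values and flags["docValues"]:
        # The wished-for behaviour: read the per-document column instead.
        return DOC_VALUES[field][doc_id]
    # Behaviour described in the thread: not stored, so nothing comes back.
    return None
```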

~ David


Otis Gospodnetic-5 wrote
> Hi,
> 
> The current field update mechanism is not really a field update
> mechanism.  It just looks like that from the outside.  DocValues
> should make true field updates implementable.
> 
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/

-----
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book

Re: DocValues vs stored fields?

Posted by Marcin Rzewucki <mr...@gmail.com>.
By the way: even if a field has DocValues with the "on disk" option enabled, it
has to have stored="true" to be retrievable. Why?


On 29 March 2013 20:51, Otis Gospodnetic <ot...@gmail.com> wrote:

> Hi,
>
> The current field update mechanism is not really a field update
> mechanism.  It just looks like that from the outside.  DocValues
> should make true field updates implementable.
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/

Re: DocValues vs stored fields?

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi,

The current field update mechanism is not really a field update
mechanism.  It just looks like that from the outside.  DocValues
should make true field updates implementable.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Fri, Mar 29, 2013 at 3:30 PM, Marcin Rzewucki <mr...@gmail.com> wrote:
> Hi,
> Atomic updates (single field updates) do not depend on DocValues. They were
> implemented in Solr 4.0 and work fine (but all fields have to be
> retrievable). DocValues are supposed to be more efficient than FieldCache.
> Why are they not enabled by default? Maybe because they are not for all
> fields, and because of their limitations (a field has to be single-valued,
> and either required or have a default value).
> Regards.

Re: DocValues vs stored fields?

Posted by Marcin Rzewucki <mr...@gmail.com>.
Hi,
Atomic updates (single field updates) do not depend on DocValues. They were
implemented in Solr 4.0 and work fine (but all fields have to be
retrievable). DocValues are supposed to be more efficient than FieldCache.
Why are they not enabled by default? Maybe because they are not for all
fields, and because of their limitations (a field has to be single-valued,
and either required or have a default value).
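
Here is a simplified model of why atomic updates need every field to be retrievable. It assumes a hypothetical schema in which `body` is indexed but not stored. The "update" rebuilds the document from its stored fields, changes one field, and re-indexes the whole document. Anything that was not stored is lost. This is a sketch of the idea, not Solr's actual update code.

```python
# Hypothetical schema: "body" is indexed for search but not stored.
STORED = {"id", "title", "price"}


def index_document(doc):
    """Return (indexed, stored): every field is searchable, but only fields
    in STORED can be read back from the index later."""
    indexed = dict(doc)
    stored = {k: v for k, v in doc.items() if k in STORED}
    return indexed, stored


def atomic_set(entry, field, value):
    """Simplified atomic "set": rebuild the doc from its stored fields,
    change one field, then re-index the whole document."""
    _, stored = entry
    rebuilt = dict(stored)
    rebuilt[field] = value
    return index_document(rebuilt)


original = index_document(
    {"id": "1", "title": "DocValues", "price": 10, "body": "full text"}
)
updated = atomic_set(original, "price", 12)
```

After the update, `body` is no longer searchable because it could not be recovered from the stored fields.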
Regards.



On 29 March 2013 17:20, Timothy Potter <th...@gmail.com> wrote:

> Hi Jack,
>
> I've just started to dig into this as well, so I'm sharing what I know, but
> there are still some holes in my knowledge too.
>
> DocValues == Column Stride Fields (best resource I know of so far is
> Simon's preso from Lucene Rev 2011 -
>
> http://www.slideshare.net/LucidImagination/column-stride-fields-aka-docvalues
> ).
> It's pretty dense but some nuggets I've gleaned from this are:
>
> 1) DocValues are more efficient in terms of memory usage and I/O
> performance for building an alternative to FieldCache (slide 27 is very
> impressive)
> 2) DocValues has a more efficient way to store primitive types, such as
> packed ints
> 3) Faster random access to stored values
>
> In terms of switch-over, you have to re-index to change your fields to use
> DocValues on disk, which is why they are not enabled by default.
>
> Lastly, another goal of DocValues is to allow updates to a single field w/o
> re-indexing the entire doc. That's not implemented yet but I think still
> planned.
>
> Cheers,
>  Tim

Re: DocValues vs stored fields?

Posted by Timothy Potter <th...@gmail.com>.
Hi Jack,

I've just started to dig into this as well, so I'm sharing what I know, but
there are still some holes in my knowledge too.

DocValues == Column Stride Fields (best resource I know of so far is
Simon's preso from Lucene Rev 2011 -
http://www.slideshare.net/LucidImagination/column-stride-fields-aka-docvalues).
It's pretty dense but some nuggets I've gleaned from this are:

1) DocValues are more efficient in terms of memory usage and I/O
performance for building an alternative to FieldCache (slide 27 is very
impressive)
2) DocValues has a more efficient way to store primitive types, such as
packed ints
3) Faster random access to stored values
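
To make "column stride" and "packed ints" concrete, here is a toy sketch. It assumes a single-valued integer field with a value for every document. Each document gets a fixed-width slot, and each slot stores value - min in only as many bits as the largest delta needs. Lucene's real implementation (org.apache.lucene.util.packed) is more elaborate. The class below is hypothetical, and a Python int stands in for the packed block purely for brevity.

```python
class PackedColumn:
    """Toy column-stride storage for one single-valued integer field.

    Each doc id owns a fixed-width slot holding (value - min_value) in the
    minimum number of bits; a Python int stands in for the packed block.
    """

    def __init__(self, values):
        if not values:
            raise ValueError("need at least one document")
        self.min_value = min(values)
        deltas = [v - self.min_value for v in values]
        # At least 1 bit per slot, even when every value is identical.
        self.bits = max(1, max(deltas).bit_length())
        self.size = len(values)
        self.mask = (1 << self.bits) - 1
        packed = 0
        for doc_id, delta in enumerate(deltas):
            packed |= delta << (doc_id * self.bits)
        self.packed = packed

    def get(self, doc_id):
        """Random access by doc id: shift to the slot, mask, add min back."""
        if not 0 <= doc_id < self.size:
            raise IndexError(doc_id)
        return ((self.packed >> (doc_id * self.bits)) & self.mask) + self.min_value


values = [1000, 1003, 1001, 1007]
col = PackedColumn(values)
```

Here the four values lie in the range 1000-1007, so they fit in 3 bits each. That is 12 bits in total instead of four 64-bit longs, and any document's value is still reachable directly from its doc id.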

In terms of switch-over, you have to re-index to change your fields to use
DocValues on disk, which is why they are not enabled by default.

Lastly, another goal of DocValues is to allow updates to a single field w/o
re-indexing the entire doc. That's not implemented yet but I think still
planned.

Cheers,
 Tim



On Fri, Mar 29, 2013 at 9:31 AM, Jack Krupansky <ja...@basetechnology.com> wrote:

> I’m still a little fuzzy on DocValues (maybe because I’m still grappling
> with how it does or doesn’t still relate to “Column Stride Fields”), so can
> anybody clue me in as to how useful DocValues is/are?
>
> Are DocValues simply an alternative to “stored fields”?
>
> If so, and if DocValues are so great, why aren’t we just switching Solr
> over to DocValues under the hood for all fields?
>
> And if there are “issues” with DocValues that would make such a complete
> switchover less than absolutely desired, what are those issues?
>
> In short, when should a user use DocValues over stored fields, and vice
> versa?
>
> As things stand, all we’ve done is make Solr more confusing than it was
> before, without improving its OOBE. OOBE should be job one in Solr.
>
> Thanks.
>
> P.S., And if I actually want to do Column Stride Fields, is there a way to
> do that?
>
> -- Jack Krupansky