Posted to java-user@lucene.apache.org by "David Smiley (@MITRE.org)" <DS...@mitre.org> on 2013/03/08 20:02:30 UTC

DiskDocValues vs Lucene42Codec

DiskDocValues is a codec (or part of a codec, apparently) for accessing
DocValues from disk, with minimal RAM usage for things like offsets.
Lucene42Codec, by contrast, puts all of the DocValues in RAM.  Is the actual
disk-resident data format the same between them?  And how do you pick and
choose the formats?  I.e., can I use Lucene42Codec for all the non-DocValues
stuff but then use DiskDocValues, so that I can let the OS's cache govern
access to DV data while lowering my Java heap and giving the GC a break?  OK,
I'm going to answer the 2nd question myself, as I just discovered
Lucene42Codec.getDocValuesFormatForField, which I can customize.  But that
still leaves the 1st question.  It would be nice to not have to re-index.

~ David



-----
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: http://lucene.472066.n3.nabble.com/DiskDocValues-vs-Lucene42Codec-tp4044061p4045871.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: DiskDocValues vs Lucene42Codec

Posted by "David Smiley (@MITRE.org)" <DS...@mitre.org>.
Thanks Robert; that's very helpful.


Re: DiskDocValues vs Lucene42Codec

Posted by Robert Muir <rc...@gmail.com>.
The underlying data formats are different. For example, because
Lucene42Codec loads terms into RAM, it uses an FST. But DiskDV
uses a simpler storage for the terms that's more suitable for
being disk-resident.

There are also different compression block sizes and so on in use.

You can pick and choose the formats on a per-field basis, just as you
mentioned. In Solr it's also hooked into schema.xml, so you can set
docValuesFormat="Disk" as an attribute on the field type (similar to
postingsFormat).
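
To make the per-field choice concrete, here is a minimal sketch of the
override pattern in plain Java. It does not use Lucene itself: the class
DefaultCodecModel is a hypothetical stand-in for Lucene42Codec, and format
names are plain strings rather than DocValuesFormat objects. The name "Disk"
is the one used above; "Lucene42" as the default format name is an
assumption. Only the method name getDocValuesFormatForField comes from the
thread.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-ins: these classes only model the per-field override
// pattern. They are NOT Lucene's Lucene42Codec or DocValuesFormat.
final class PerFieldDocValuesSketch {

    // Stand-in for the default codec: every field gets the default format name.
    static class DefaultCodecModel {
        public String getDocValuesFormatForField(String field) {
            return "Lucene42"; // assumed name of the default (in-RAM) format
        }
    }

    // Returns a codec model that answers "Disk" for the named fields and
    // defers to the default for every other field.
    static DefaultCodecModel diskFor(String... diskFields) {
        final Set<String> onDisk = new HashSet<>(Arrays.asList(diskFields));
        return new DefaultCodecModel() {
            @Override
            public String getDocValuesFormatForField(String field) {
                return onDisk.contains(field)
                        ? "Disk"
                        : super.getDocValuesFormatForField(field);
            }
        };
    }

    public static void main(String[] args) {
        // "price" and "timestamp" are made-up field names for illustration.
        DefaultCodecModel codec = diskFor("price", "timestamp");
        System.out.println(codec.getDocValuesFormatForField("price")); // Disk
        System.out.println(codec.getDocValuesFormatForField("title")); // Lucene42
    }
}
```

With the real library the same shape would presumably be a subclass of
Lucene42Codec handed to the index writer configuration. Since the on-disk
formats differ, the override only affects how new segments are written.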

On Fri, Mar 8, 2013 at 2:02 PM, David Smiley (@MITRE.org)
<DS...@mitre.org> wrote:
> DiskDocValues is a codec (or part of a codec, apparently) for accessing the
> DocValues from disk, with minimal RAM usage for things like offsets.
> Lucene42Codec alternatively puts all of DocValues in RAM.  Is the actual
> disk resident data format the same between them?  And how do you pick &
> choose the formats?  i.e. can I use Lucene42Codec for all the non-DocValues
> stuff but then use DiskDocValues so that I can let the OS's cache govern
> access to DV data while lowering my Java heap and giving the GC a break.  Ok
> I'm going to answer the 2nd question as I just discovered
> Lucene42Codec.getDocValuesFormatForField which I can customize.  But that
> still leaves the 1st question.  It would be nice to not have to re-index.
>
> ~ David