Posted to solr-user@lucene.apache.org by Jie Sun <js...@yahoo.com> on 2012/12/11 19:00:52 UTC

suggestion howto handle highly repetitive valued field

Hi -
Our indexed documents currently store Solr fields like 'digest' or 'type',
for which most of our documents end up with the same value (such as 'sha1'
for the field 'digest', or 'message' for the field 'type').

On each Solr server, we usually have hundreds of millions of documents
indexed, most with the same value in these fields (the fields are both
stored and indexed).

We suspect this is very inefficient in terms of disk space. Is it? If so,
any suggestions on the best approach?

thanks!
Jie



--
View this message in context: http://lucene.472066.n3.nabble.com/suggestion-howto-handle-highly-repetitive-valued-field-tp4026104.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: suggestion howto handle highly repetitive valued field

Posted by Jie Sun <js...@yahoo.com>.
thank you David!




Re: suggestion howto handle highly repetitive valued field

Posted by "David Smiley (@MITRE.org)" <DS...@mitre.org>.
The indexed="true" side is quite efficient, since each distinct term is
stored only once in the index.  The stored="true" side -- not so much, but
the strings you have here are pretty small and I wouldn't worry about it.
Solr 4.1 (unreleased) does a great job here and compresses the stored field
data across documents.

~ David
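
A minimal sketch of the point above, in Python rather than anything
Solr-specific. It assumes hypothetical documents that all share the same
'digest' and 'type' values, uses JSON as a stand-in for a stored-field
record, and uses zlib from the standard library as a stand-in compressor
(Lucene's own on-disk format and codec differ). It only illustrates why an
inverted index handles a repeated value cheaply, and why compressing stored
data across documents helps where per-document storage does not.

```python
import json
import zlib

# Hypothetical documents: every one carries the same 'digest' and 'type'.
docs = [
    {"id": str(i), "digest": "sha1", "type": "message"}
    for i in range(10_000)
]

# Indexed side: an inverted index keeps each distinct term once,
# followed by the list of documents that contain it.
postings = {}
for doc_num, doc in enumerate(docs):
    postings.setdefault(doc["type"], []).append(doc_num)


def serialize(doc):
    """Stand-in for a stored-field record (not Lucene's real format)."""
    return json.dumps(doc, sort_keys=True).encode("utf-8")


records = [serialize(doc) for doc in docs]

# Stored side, uncompressed: each document's values are written in full.
raw_size = sum(len(r) for r in records)

# Compressing each record on its own cannot exploit repetition between
# documents, so these tiny records gain little or nothing.
per_doc_size = sum(len(zlib.compress(r)) for r in records)

# Compressing many records together lets the repeated 'sha1' and
# 'message' values be encoded as short back-references.
block_size = len(zlib.compress(b"".join(records)))

print("distinct terms in 'type':", len(postings))
print("stored, raw bytes:       ", raw_size)
print("stored, per-doc zlib:    ", per_doc_size)
print("stored, one zlib block:  ", block_size)
```

The exact byte counts depend on the compressor and record layout, so treat
them as relative rather than as a prediction of real index sizes.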


Jie Sun wrote
> Hi -
> our indexed documents currently store solr fields like 'digest' or 'type',
> which most of our documents will end up with same value (such as 'sha1'
> for field 'digest', or 'message' for field 'type' etc).
> 
> on each solr server, we usually have 100 of millions of documents indexed
> and with the same value on these fields (fields are stored and indexed).
> 
> any suggestion what is the  best approach if we suspect this will be very
> inefficient on disk space usage, or is it?
> 
> thanks!
> Jie





-----
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book