Posted to solr-user@lucene.apache.org by Avi Steiner <as...@varonis.com> on 2017/04/25 09:33:14 UTC

Huge cfs files

Hi

We have a customer with Solr 5.3.1.
The index contains fewer than 3.5 million docs, and the index folder size is about 240GB.
I found that the largest files are .cfs files (compound files) that were created recently, although only a few documents were added.
The useCompoundFile parameter is commented out in solrconfig.xml.
As far as I understand, the default in Solr is false and in Lucene is true, which means this feature should be disabled.
I would like to understand why those files were created and why they are so huge.

Regards,

Avi



Re: Huge cfs files

Posted by Shawn Heisey <ap...@elyograg.org>.
On 4/25/2017 3:33 AM, Avi Steiner wrote:
> We have a customer with Solr 5.3.1.
> The index contains fewer than 3.5 million docs, and the index folder size is about 240GB.

If 3.5 million documents create a 240GB index, then this is a very
atypical index.  The documents must be HUGE, or else you are using
copyField a LOT to create different ways to search the same data.  My
largest index shards have nearly 40 million documents in them and are
only 55GB in size.  The entire distributed index is almost 400 million
docs and about 550GB in size.
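
To put those numbers side by side, here is a rough per-document
calculation.  It is only a sketch: it assumes decimal gigabytes
(1 GB = 10^9 bytes) and uses the approximate counts quoted above, so the
results are ballpark figures, not measurements.

```python
GB = 10**9  # assumption: decimal gigabytes


def bytes_per_doc(index_size_gb, doc_count):
    """Average on-disk bytes per document for an index of the given size."""
    return index_size_gb * GB / doc_count


# Figures taken from the thread; all of them are approximate.
customer = bytes_per_doc(240, 3_500_000)   # the 240GB index in question
shard = bytes_per_doc(55, 40_000_000)      # one of the 55GB shards
whole = bytes_per_doc(550, 400_000_000)    # the entire distributed index

print(f"customer index: {customer / 1000:.1f} KB per doc")
print(f"single shard:   {shard / 1000:.1f} KB per doc")
print(f"whole index:    {whole / 1000:.1f} KB per doc")
print(f"customer vs. shard: about {customer / shard:.0f}x")
```

On these figures the 240GB index averages roughly 69 KB per document,
against about 1.4 KB per document for the shards, a difference of about
50 times.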

> I found that the largest files are .cfs files (compound files) that were created recently, although only a few documents were added.
> The useCompoundFile parameter is commented out in solrconfig.xml.
> As far as I understand, the default in Solr is false and in Lucene is true, which means this feature should be disabled.
> I would like to understand why those files were created and why they are so huge.

The cfs files are not an indication of a problem.  Solr (Lucene really)
has decided that some threshold has been crossed for those segments, and
that it should consolidate the files instead of keeping them separate.

The reason they are so big is that your index is big.  No other
reason.  The total disk space consumed would be pretty much identical
even if the files were separate instead of combined into a .cfs file.
Think of the .cfs file as a little bit like a .tar file, or a .zip file
without compression.  Because segment files are never changed after
creation, there's very little difference between accessing part of a
large file and accessing an individual file.
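
The .tar analogy can be illustrated with a toy container.  This is a
minimal sketch and not Lucene's actual on-disk format: the pack() and
read_entry() helpers and the file contents are made up for
illustration.  It shows that the combined blob is the same size as the
sum of its parts, and that reading one entry is just a slice at a known
offset.

```python
import io


def pack(files):
    """Concatenate byte strings into one blob; record (offset, length) per name."""
    blob = io.BytesIO()
    table = {}
    for name, data in files.items():
        table[name] = (blob.tell(), len(data))
        blob.write(data)
    return blob.getvalue(), table


def read_entry(blob, table, name):
    """Return one entry's bytes by slicing the blob at its recorded offset."""
    offset, length = table[name]
    return blob[offset:offset + length]


# Hypothetical per-segment files; the contents are placeholders.
files = {
    "_0.fdt": b"stored fields",
    "_0.tim": b"term dictionary",
    "_0.doc": b"postings",
}

blob, table = pack(files)

# No compression: the container holds exactly the bytes of its parts.
print(len(blob), sum(len(data) for data in files.values()))

# Reading a single entry never touches the other entries.
print(read_entry(blob, table, "_0.tim"))
```

Because the entries never change once written, the offset table stays
valid for the life of the container, which is why a lookup inside it
costs about the same as opening a separate file.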

Thanks,
Shawn