Posted to solr-user@lucene.apache.org by vivek sar <vi...@gmail.com> on 2009/04/23 19:08:20 UTC

Control segment size

Hi,

  Is there any configuration to control the segments' file size in
Solr? Currently, I have an index (70 GB) with 80 segment files, and one
of the files is 24 GB. We noticed that in some cases a commit takes over
2 hours to complete (committing 50K records), whereas usually it
finishes in 20 seconds. After further investigation, it turns out the
system was doing a lot of paging: the file system buffer was trying to
write the big segment back to disk. I have 20 GB of memory on the
system, with 6 GB assigned to Solr (running 2 instances).

It seems that if I can cap the segment size at 4-5 GB, I'll be
OK. Is there any way to do so?

I have a merge factor of 100; does that affect the size too? Why do
different segments have different sizes?

Thanks,
-vivek

Re: Control segment size

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Tue, May 12, 2009 at 2:30 AM, vivek sar <vi...@gmail.com> wrote:

> Here is what I've read on maxMergeDocs,
>
>  "While merging segments, Lucene will ensure that no segment with more
> than maxMergeDocs is created."
>
>  Wouldn't that mean that no index file should contain more than
> maxMergeDocs documents? I guess the index files could also contain
> index information that isn't limited by any property; is that true?
>

Yes, an individual segment will not contain more than maxMergeDocs number of
documents. But the size of the segment may still vary because some documents
may have more unique tokens than others.

What you saw originally must have been a segment merge, which is normal and
happens in the course of indexing. I don't think there's a way to avoid that
other than to have a ridiculously high mergeFactor (which will affect search
performance).
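To illustrate why segments end up with different sizes, and how mergeFactor and a document cap interact, here is a toy Java simulation. It is a sketch under stated assumptions, not Lucene's actual merge policy: every flush is assumed to write a segment with the same number of documents, the newest mergeFactor same-sized segments are merged into one (cascading upward), and a merge is skipped if its result would exceed maxMergeDocs. The class MergeSim, its simulate method, and the input numbers are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of log-style merging; NOT Lucene's real MergePolicy.
// Assumptions: fixed-size flushes, a merge whenever the newest mergeFactor
// segments have equal doc counts, and no merge whose result would exceed maxMergeDocs.
public class MergeSim {

    // Returns the doc counts of the segments left after indexing totalDocs.
    static List<Long> simulate(long totalDocs, long flushDocs, int mergeFactor, long maxMergeDocs) {
        if (flushDocs <= 0 || mergeFactor < 2) {
            throw new IllegalArgumentException("flushDocs must be > 0 and mergeFactor >= 2");
        }
        List<Long> segs = new ArrayList<>();
        for (long done = 0; done < totalDocs; done += flushDocs) {
            segs.add(Math.min(flushDocs, totalDocs - done)); // flush a new segment
            // Cascade merges upward while the newest mergeFactor segments match in size.
            while (segs.size() >= mergeFactor) {
                int n = segs.size();
                long newest = segs.get(n - 1);
                boolean sameSize = true;
                for (int i = n - mergeFactor; i < n; i++) {
                    if (segs.get(i) != newest) { sameSize = false; break; }
                }
                long mergedDocs = newest * mergeFactor;
                if (!sameSize || mergedDocs > maxMergeDocs) break; // nothing to merge, or over the cap
                segs.subList(n - mergeFactor, n).clear(); // remove the merge inputs
                segs.add(mergedDocs);                      // append the merged segment
            }
        }
        return segs;
    }

    public static void main(String[] args) {
        // Made-up inputs: 1,234,000 docs, flushed 1,000 docs at a time.
        List<Long> mf10 = simulate(1_234_000, 1_000, 10, Long.MAX_VALUE);
        List<Long> mf100 = simulate(1_234_000, 1_000, 100, Long.MAX_VALUE);
        System.out.println("mergeFactor=10:  " + mf10.size() + " segments " + mf10);
        System.out.println("mergeFactor=100: " + mf100.size() + " segments");
    }
}
```

With these made-up inputs, mergeFactor=10 leaves 10 segments of four different sizes (1M, 100K, 10K and 1K docs), while mergeFactor=100 leaves 46 segments (twelve of 100K docs plus thirty-four of 1K). In this model a higher mergeFactor means fewer merges but more segments to search, which matches the trade-off described above.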

-- 
Regards,
Shalin Shekhar Mangar.

Re: Control segment size

Posted by vivek sar <vi...@gmail.com>.
Shalin,

 Here is what I've read on maxMergeDocs,

 "While merging segments, Lucene will ensure that no segment with more
than maxMergeDocs is created."

 Wouldn't that mean that no index file should contain more than
maxMergeDocs documents? I guess the index files could also contain
index information that isn't limited by any property; is that true?

Is there any workaround to limit the index file size, besides limiting
the size of the index itself?

Thanks,
-vivek

On Fri, May 8, 2009 at 10:02 PM, Shalin Shekhar Mangar
<sh...@gmail.com> wrote:

Re: Control segment size

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Fri, May 8, 2009 at 1:30 AM, vivek sar <vi...@gmail.com> wrote:

>
> I did set maxMergeDocs to 10M, but I still see a couple of index
> files over 30 GB, far bigger than 10M documents should produce. Here
> are some numbers:
>
> 1) My total index size = 66 GB
> 2) Number of total documents = 200M
> 3) 1M docs = 300 MB
> 4) 10M docs should be roughly 3-4 GB.
>
> As you can see, a couple of files are huge. Are those documents or
> index files? How can I control the file size so that no single file
> grows beyond 10 GB?
>

No, there is no way to limit an individual file to a specific size.

-- 
Regards,
Shalin Shekhar Mangar.

Re: Control segment size

Posted by vivek sar <vi...@gmail.com>.
Thanks, Otis.

I did set maxMergeDocs to 10M, but I still see a couple of index
files over 30 GB, far bigger than 10M documents should produce. Here
are some numbers:

1) My total index size = 66 GB
2) Number of total documents = 200M
3) 1M docs = 300 MB
4) 10M docs should be roughly 3-4 GB.

Under the index I see,

-rw-r--r--   1 dssearch  staff  31771545312 May  6 14:15 _2tp.cfs
-rw-r--r--   1 dssearch  staff  31932190573 May  7 08:13 _5ne.cfs
-rw-r--r--   1 dssearch  staff    543118747 May  7 08:32 _5p2.cfs
-rw-r--r--   1 dssearch  staff    543124452 May  7 08:53 _5qr.cfs
-rw-r--r--   1 dssearch  staff    543100201 May  7 09:18 _5sg.cfs
..
..

As you can see, a couple of files are huge. Are those documents or index
files? How can I control the file size so that no single file grows beyond
10 GB?

Thanks,
-vivek



On Thu, Apr 23, 2009 at 10:26 AM, Otis Gospodnetic
<ot...@yahoo.com> wrote:

Re: Control segment size

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi,

You are looking for maxMergeDocs, I believe.
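Since maxMergeDocs is a document count while the goal in this thread is a size in bytes (4-5 GB per segment), one rough way to pick a value is to divide the byte target by an average bytes-per-document figure measured on your own index. The helper below is hypothetical, and the 300-bytes-per-document figure is an assumption taken from the numbers reported elsewhere in this thread. Because documents vary in size, the result is an estimate, not a hard byte limit.

```java
// Hypothetical helper: converts a per-segment byte target into a maxMergeDocs estimate.
public class MaxMergeDocsEstimate {

    // Largest doc count whose estimated size stays within targetBytes.
    // avgBytesPerDoc should be measured on your own index; docs vary, so this is approximate.
    static long maxMergeDocsFor(long targetBytes, double avgBytesPerDoc) {
        if (targetBytes <= 0 || avgBytesPerDoc <= 0) {
            throw new IllegalArgumentException("targetBytes and avgBytesPerDoc must be positive");
        }
        return (long) Math.floor(targetBytes / avgBytesPerDoc);
    }

    public static void main(String[] args) {
        double avgBytesPerDoc = 300.0; // assumption: 300 MB per 1M docs, as reported in this thread
        System.out.println(maxMergeDocsFor(4_000_000_000L, avgBytesPerDoc)); // 4 GB target
        System.out.println(maxMergeDocsFor(5_000_000_000L, avgBytesPerDoc)); // 5 GB target
    }
}
```

For a 4-5 GB target at 300 bytes per document this gives roughly 13.3M to 16.7M documents. Plugging in a higher-than-average per-document size leaves headroom for documents with many unique tokens.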

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
