Posted to solr-user@lucene.apache.org by Romi <ro...@gmail.com> on 2011/07/01 12:43:41 UTC

MergerFactor and MaxMergerDocs effecting num of segments created

These are my index files. I want to see the effect of mergeFactor and
maxMergeDocs on this index. How can I do that?

_0.fdt  3310 KB
_0.fdx  23 KB
_0.fnm  1 KB
_0.frq  857 KB
_0.nrm  31 KB
_0.prx  1748 KB
_0.tii  5 KB
_0.tis  350 KB

That is, what test cases can I run for mergeFactor and maxMergeDocs to see
their effect on the index files? My current configuration is:

<mergeFactor>2</mergeFactor>
<maxMergeDocs>10</maxMergeDocs>



-----
Thanks & Regards
Romi
--
View this message in context: http://lucene.472066.n3.nabble.com/MergerFactor-and-MaxMergerDocs-effecting-num-of-segments-created-tp3128897p3128897.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: MergerFactor and MaxMergerDocs effecting num of segments created

Posted by Shawn Heisey <so...@elyograg.org>.
On 7/4/2011 12:51 AM, Romi wrote:
> Shawn, when I reindex the data using full-import I get:
>
> _0.fdt        3310
> _0.fdx        23
> _0.frq        857
> _0.nrm        31
> _0.prx        1748
> _0.tis        350
> _1.fdt        3310
> _1.fdx        23
> _1.fnm        1
> _1.frq        857
> _1.nrm        31
> _1.prx        1748
> _1.tii        5
> _1.tis        350
> segments.gen  1
> segments_3    1
>
> All of the _1 files are marked as archived (A).
>
> When I run full-import again (for testing), I get _1 and _2 files, where
> all of the _2 files are marked as archived. What does that mean?
> What I don't understand is this: full-import deletes the old index and
> creates a new one, so why am I still seeing the old files?

Since you mention the Archive bit, it sounds like you are running on
Windows.  I've only run Solr on Linux, but I understand from reading
messages on this list that Windows has a lot of problems with
deleting old files whenever you do anything that makes old segments
go away -- reindex, optimize, replication, normal segment merging,
etc.  The current Solr version is 3.3; the previous versions are 3.2, 3.1,
and then 1.4.1.  Others will have to comment on whether things have
improved in more recent releases.

The archive bit is simply a DOS/Windows attribute that says "this file 
needs to be backed up."  When you create or modify a file in a normal 
way, it is turned on.  Normally the only thing that turns that bit off 
is backup software, but Solr might be programmed to clear it on files 
that are no longer needed, in case the delete fails, so there's a way to 
detect that they should not be backed up.  I don't know if this is
right; it's just speculation.

Thanks,
Shawn


Re: MergerFactor and MaxMergerDocs effecting num of segments created

Posted by Romi <ro...@gmail.com>.
Shawn, when I reindex the data using full-import I get:

_0.fdt        3310
_0.fdx        23
_0.frq        857
_0.nrm        31
_0.prx        1748
_0.tis        350
_1.fdt        3310
_1.fdx        23
_1.fnm        1
_1.frq        857
_1.nrm        31
_1.prx        1748
_1.tii        5
_1.tis        350
segments.gen  1
segments_3    1

All of the _1 files are marked as archived (A).

When I run full-import again (for testing), I get _1 and _2 files, where
all of the _2 files are marked as archived. What does that mean?
What I don't understand is this: full-import deletes the old index and
creates a new one, so why am I still seeing the old files?




-----
Thanks & Regards
Romi

Re: MergerFactor and MaxMergerDocs effecting num of segments created

Posted by Shawn Heisey <so...@elyograg.org>.
On 7/1/2011 4:43 AM, Romi wrote:
> These are my index files. I want to see the effect of mergeFactor and
> maxMergeDocs on this index. How can I do that?
>
> _0.fdt  3310 KB
> _0.fdx  23 KB
> _0.fnm  1 KB
> _0.frq  857 KB
> _0.nrm  31 KB
> _0.prx  1748 KB
> _0.tii  5 KB
> _0.tis  350 KB
>
> That is, what test cases can I run for mergeFactor and maxMergeDocs to see
> their effect on the index files? My current configuration is:
>
> <mergeFactor>2</mergeFactor>
> <maxMergeDocs>10</maxMergeDocs>

That is a single index segment, and as it's the initial segment (_0), no
optimization or merging has taken place.  Further segments would have
the same file extensions with prefixes like _1, _2, etc.  Segment names
count in base 36, so once you reach _z, the next segment would be _10.
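
To make the naming scheme concrete, here is a small Python sketch (not part of Solr or Lucene) that generates segment prefixes from a counter and groups a directory listing by segment.  The function names segment_name and segments_in are hypothetical, made up for this illustration; the only thing taken from above is that prefixes run _0 ... _9, _a ... _z, _10, and so on.

```python
import string

# 0-9 followed by a-z gives the 36 symbols used in segment prefixes.
DIGITS = string.digits + string.ascii_lowercase


def segment_name(counter):
    """Return the segment prefix for a zero-based segment counter (hypothetical helper)."""
    if counter < 0:
        raise ValueError("counter must be non-negative")
    if counter == 0:
        return "_0"
    out = []
    while counter:
        counter, rem = divmod(counter, 36)
        out.append(DIGITS[rem])
    return "_" + "".join(reversed(out))


def segments_in(filenames):
    """Group index file names by segment prefix.

    Files that do not start with "_" (segments.gen, segments_N) are skipped.
    """
    groups = {}
    for name in filenames:
        if not name.startswith("_"):
            continue
        stem = name.split(".", 1)[0]       # "_lf_o.del" -> "_lf_o"
        prefix = "_" + stem.split("_")[1]  # "_lf_o"     -> "_lf"
        groups.setdefault(prefix, []).append(name)
    return groups
```

Running segments_in over the file names in an index directory is a quick way to count how many segments are present, for example before and after a change to mergeFactor.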

Your index is very small, so small that it only needs one segment when 
it is built all at once.  If you were to add new documents to the index 
(rather than do a full reindex), those new documents would go into a new 
segment.  If you continue to add segments in this way, this is when 
mergeFactor comes into play -- when the number of original segments 
reaches this value, they are merged into a single larger segment.  When 
this continues and you have enough merged segments, they are merged into 
an even larger segment.  I believe that a mergeFactor of 2 is special, 
designed to keep a large starting segment untouched while merging all 
the rest, but I have not confirmed that myself.
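
The cascade described above can be sketched as a toy simulation in Python.  This is a deliberately simplified model, not Lucene's actual merge policy: it assumes every flush creates one segment of equal size and that a merge fires exactly when mergeFactor segments sit at the same level, and it ignores maxMergeDocs, segment sizes and any special handling of a mergeFactor of 2.  The name simulate_merges is hypothetical.

```python
def simulate_merges(num_flushes, merge_factor):
    """Toy model of level-based merging.

    Returns a list where entry i is the number of segments at level i
    after num_flushes new segments have been added one at a time.
    Level 0 holds freshly flushed segments; a level i+1 segment is the
    result of merging merge_factor segments from level i.
    """
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    levels = [0]
    for _ in range(num_flushes):
        levels[0] += 1
        level = 0
        # Cascade: a full level collapses into one segment on the next level.
        while levels[level] == merge_factor:
            levels[level] = 0
            if level + 1 == len(levels):
                levels.append(0)
            levels[level + 1] += 1
            level += 1
    return levels


if __name__ == "__main__":
    for factor in (2, 10):
        levels = simulate_merges(25, factor)
        print(factor, levels, "total segments:", sum(levels))
```

In this model a low mergeFactor keeps the segment count small at the cost of merging more often, while a high one leaves more segments on disk between merges.  To see the real effect, add documents in several small batches with a commit after each (rather than one full reindex) and watch the file listing.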

I don't know why maxMergeDocs is not taking effect.  It could be that 
during initial indexing, other factors (like ramBufferSizeMB) are 
involved, and maxMergeDocs only takes effect when merging existing segments.
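
As a rough illustration of that guess: Lucene's log merge policy documents maxMergeDocs as the largest segment, measured by document count, that may be merged with other segments, which is a limit on merge candidates rather than on what a flush can write.  The Python sketch below models only that rule.  The function name and the document counts are hypothetical, and whether the boundary comparison is strict or inclusive is an assumption here.

```python
def split_by_max_merge_docs(doc_counts, max_merge_docs):
    """Separate segments into merge candidates and segments left alone.

    doc_counts is a list of per-segment document counts.  Segments at or
    above max_merge_docs are treated as too large to merge (the exact
    boundary is an assumption of this sketch).
    """
    eligible = [n for n in doc_counts if n < max_merge_docs]
    left_alone = [n for n in doc_counts if n >= max_merge_docs]
    return eligible, left_alone


if __name__ == "__main__":
    # Hypothetical index: one large initial segment plus two small additions.
    print(split_by_max_merge_docs([3000, 4, 6], 10))
```

Under this model, an initial segment written in one go can hold far more than maxMergeDocs documents; the setting would only show up later, as large segments that never get merged away.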

For comparison purposes, here are the first three segments from one of 
my indexes:

-rw-r--r-- 1 ncindex ncindex 6323043528 Jun 30 00:57 _lf.fdt
-rw-r--r-- 1 ncindex ncindex   75766484 Jun 30 00:57 _lf.fdx
-rw-r--r-- 1 ncindex ncindex        382 Jun 30 00:55 _lf.fnm
-rw-r--r-- 1 ncindex ncindex 2833619259 Jun 30 01:04 _lf.frq
-rw-r--r-- 1 ncindex ncindex   28412434 Jun 30 01:05 _lf.nrm
-rw-r--r-- 1 ncindex ncindex    1183860 Jun 30 15:41 _lf_o.del
-rw-r--r-- 1 ncindex ncindex 2455819068 Jun 30 01:04 _lf.prx
-rw-r--r-- 1 ncindex ncindex   23759599 Jun 30 01:04 _lf.tii
-rw-r--r-- 1 ncindex ncindex  926422435 Jun 30 01:04 _lf.tis
-rw-r--r-- 1 ncindex ncindex   18940740 Jun 30 01:06 _lf.tvd
-rw-r--r-- 1 ncindex ncindex 5883186438 Jun 30 01:06 _lf.tvf
-rw-r--r-- 1 ncindex ncindex  151532964 Jun 30 01:06 _lf.tvx
-rw-r--r-- 1 ncindex ncindex  868769283 Jul  1 09:07 _mf.fdt
-rw-r--r-- 1 ncindex ncindex   11279356 Jul  1 09:07 _mf.fdx
-rw-r--r-- 1 ncindex ncindex        372 Jul  1 09:06 _mf.fnm
-rw-r--r-- 1 ncindex ncindex  347906214 Jul  1 09:08 _mf.frq
-rw-r--r-- 1 ncindex ncindex    4229761 Jul  1 09:08 _mf.nrm
-rw-r--r-- 1 ncindex ncindex  284701250 Jul  1 09:08 _mf.prx
-rw-r--r-- 1 ncindex ncindex     960052 Jul  1 09:08 _mf.tii
-rw-r--r-- 1 ncindex ncindex  141775812 Jul  1 09:08 _mf.tis
-rw-r--r-- 1 ncindex ncindex    2818958 Jul  1 09:08 _mf.tvd
-rw-r--r-- 1 ncindex ncindex  735319599 Jul  1 09:08 _mf.tvf
-rw-r--r-- 1 ncindex ncindex   22558708 Jul  1 09:08 _mf.tvx
-rw-r--r-- 1 ncindex ncindex   30888748 Jul  1 09:07 _mg.fdt
-rw-r--r-- 1 ncindex ncindex     385700 Jul  1 09:07 _mg.fdx
-rw-r--r-- 1 ncindex ncindex        372 Jul  1 09:07 _mg.fnm
-rw-r--r-- 1 ncindex ncindex   13709508 Jul  1 09:07 _mg.frq
-rw-r--r-- 1 ncindex ncindex     144640 Jul  1 09:07 _mg.nrm
-rw-r--r-- 1 ncindex ncindex   12683152 Jul  1 09:07 _mg.prx
-rw-r--r-- 1 ncindex ncindex      51848 Jul  1 09:07 _mg.tii
-rw-r--r-- 1 ncindex ncindex    7409698 Jul  1 09:07 _mg.tis
-rw-r--r-- 1 ncindex ncindex      96428 Jul  1 09:07 _mg.tvd
-rw-r--r-- 1 ncindex ncindex   31790084 Jul  1 09:07 _mg.tvf
-rw-r--r-- 1 ncindex ncindex     771396 Jul  1 09:07 _mg.tvx

Shawn