Posted to solr-user@lucene.apache.org by Romi <ro...@gmail.com> on 2011/07/01 12:43:41 UTC
MergerFactor and MaxMergerDocs effecting num of segments created
These are my index files. I want to see the effect of mergeFactor and
maxMergeDocs on them. How can I do that?
*
_0.fdt 3310 KB
_0.fdx 23 KB
_0.fnm 1 KB
_0.frq 857 KB
_0.nrm 31 KB
_0.prx 1748 KB
_0.tii 5 KB
_0.tis 350 KB*
I mean, what test cases for mergeFactor and maxMergeDocs can I run to see the
effect on the index files? The current configuration is:
*
<mergeFactor>2</mergeFactor>
<maxMergeDocs>10</maxMergeDocs>*
-----
Thanks & Regards
Romi
--
View this message in context: http://lucene.472066.n3.nabble.com/MergerFactor-and-MaxMergerDocs-effecting-num-of-segments-created-tp3128897p3128897.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: MergerFactor and MaxMergerDocs effecting num of segments created
Posted by Shawn Heisey <so...@elyograg.org>.
On 7/4/2011 12:51 AM, Romi wrote:
> Shawn, when I reindex the data using full-import I get:
> *_0.fdt 3310
> _0.fdx 23
> _0.frq 857
> _0.nrm 31
> _0.prx 1748
> _0.tis 350
> _1.fdt 3310
> _1.fdx 23
> _1.fnm 1
> _1.frq 857
> _1.nrm 31
> _1.prx 1748
> _1.tii 5
> _1.tis 350
> segments.gen 1
> segments_3 1*
>
> All of the _1 files are marked as archived (A).
>
> When I run the full import again (for testing), I get _1 and _2 files, where
> all of the _2 files are marked as archived. What does that mean?
> What I don't understand is this: a full import deletes the old index
> and creates a new one, so why do I still see the old files?
By mentioning the Archive bit, it sounds like you are running on
Windows. I've only run it on Linux, but I understand from reading
messages on this list that there are a lot of problems on Windows with
deleting old files whenever you do anything that results in old segments
going away -- reindex, optimize, replication, normal segment merging,
etc. The current Solr version is 3.3; previous versions are 3.2, 3.1,
and 1.4.1. Others will have to comment about whether things have
improved in more recent releases.
The archive bit is simply a DOS/Windows attribute that says "this file
needs to be backed up." When you create or modify a file in a normal
way, it is turned on. Normally the only thing that turns that bit off
is backup software, but Solr might be programmed to clear it on files
that are no longer needed, in case the delete fails, so there's a way to
detect that they should not be backed up. I don't know if this is
right; it's just speculation.
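If you want to check the archive bit from a script rather than from Explorer, here is a minimal sketch in Python. It assumes a Python interpreter is available on the machine; the function names are made up for illustration. The st_file_attributes field only exists on Windows, so the helper returns None elsewhere.

```python
import os
import stat


def has_archive_bit(attributes: int) -> bool:
    """Return True if the Windows 'archive' attribute is set in the bit mask."""
    return bool(attributes & stat.FILE_ATTRIBUTE_ARCHIVE)


def archive_bit_of(path):
    """Return True/False on Windows, or None where the attribute is unavailable."""
    # st_file_attributes is only present in os.stat() results on Windows.
    attributes = getattr(os.stat(path), "st_file_attributes", None)
    if attributes is None:
        return None
    return has_archive_bit(attributes)
```

Calling archive_bit_of() on each file in the index directory would show which segment files carry the attribute. It does not tell you who set or cleared it.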
Thanks,
Shawn
Re: MergerFactor and MaxMergerDocs effecting num of segments created
Posted by Romi <ro...@gmail.com>.
Shawn, when I reindex the data using full-import I get:
*_0.fdt 3310
_0.fdx 23
_0.frq 857
_0.nrm 31
_0.prx 1748
_0.tis 350
_1.fdt 3310
_1.fdx 23
_1.fnm 1
_1.frq 857
_1.nrm 31
_1.prx 1748
_1.tii 5
_1.tis 350
segments.gen 1
segments_3 1*
All of the _1 files are marked as archived (A).
When I run the full import again (for testing), I get _1 and _2 files, where
all of the _2 files are marked as archived. What does that mean?
What I don't understand is this: a full import deletes the old index
and creates a new one, so why do I still see the old files?
-----
Thanks & Regards
Romi
--
View this message in context: http://lucene.472066.n3.nabble.com/MergerFactor-and-MaxMergerDocs-effecting-num-of-segments-created-tp3128897p3136664.html
Re: MergerFactor and MaxMergerDocs effecting num of segments created
Posted by Shawn Heisey <so...@elyograg.org>.
On 7/1/2011 4:43 AM, Romi wrote:
> These are my index files. I want to see the effect of mergeFactor and
> maxMergeDocs on them. How can I do that?
> *
> _0.fdt 3310 KB
> _0.fdx 23 KB
> _0.fnm 1 KB
> _0.frq 857 KB
> _0.nrm 31 KB
> _0.prx 1748 KB
> _0.tii 5 KB
> _0.tis 350 KB*
>
> I mean, what test cases for mergeFactor and maxMergeDocs can I run to see the
> effect on the index files? The current configuration is:
> *
> <mergeFactor>2</mergeFactor>
> <maxMergeDocs>10</maxMergeDocs>*
That is a single index segment, and as it's the initial segment (_0), no
optimization or merging has taken place. Further segments would have
the same file extensions with prefixes like _1, _2, etc. Once you
reach _z, the next segment would be _10.
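The jump from _z to _10 happens because the segment counter is written in base 36 (digits 0-9, then letters a-z). Here is a small Python sketch of that numbering. The function name is made up for illustration and is not part of Solr or Lucene.

```python
import string

# 0-9 followed by a-z gives the 36 symbols used in segment names.
DIGITS = string.digits + string.ascii_lowercase


def segment_name(counter: int) -> str:
    """Return the segment prefix for a zero-based segment counter."""
    if counter < 0:
        raise ValueError("counter must be non-negative")
    if counter == 0:
        return "_0"
    chars = []
    while counter:
        counter, remainder = divmod(counter, 36)
        chars.append(DIGITS[remainder])
    return "_" + "".join(reversed(chars))


for n in (0, 1, 9, 10, 35, 36):
    print(n, segment_name(n))
```

Going the other way, int("z", 36) gives 35 and int("10", 36) gives 36, so a segment prefix tells you roughly how many segments have been created over the life of the index.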
Your index is very small, so small that it only needs one segment when
it is built all at once. If you were to add new documents to the index
(rather than do a full reindex), those new documents would go into a new
segment. If you continue to add segments in this way, this is when
mergeFactor comes into play -- when the number of original segments
reaches this value, they are merged into a single larger segment. When
this continues and you have enough merged segments, they are merged into
an even larger segment. I believe that a mergeFactor of 2 is special,
designed to keep a large starting segment untouched while merging all
the rest, but I have not confirmed that myself.
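To make the cascading behaviour concrete, here is a rough simulation in Python. It is only a simplified model of what I described above, not Lucene's actual merge policy, which also looks at segment sizes. It assumes every newly added segment starts at "level 0" and that mergeFactor segments of one level are replaced by a single segment of the next level. The function name is made up.

```python
def simulate_merges(num_flushes: int, merge_factor: int) -> list:
    """Return the level of each segment left after num_flushes new segments.

    Level 0 is a freshly written segment; a level-k segment stands for the
    result of merging merge_factor segments of level k-1 (simplified model).
    """
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    counts = []  # counts[k] = number of segments currently at level k
    for _ in range(num_flushes):
        level = 0
        while True:
            if level == len(counts):
                counts.append(0)
            counts[level] += 1
            if counts[level] < merge_factor:
                break
            counts[level] = 0  # these segments are merged away...
            level += 1         # ...into one segment at the next level
    segments = []
    for level in range(len(counts) - 1, -1, -1):
        segments.extend([level] * counts[level])
    return segments


print(simulate_merges(25, 10))
print(simulate_merges(5, 2))
```

In this model, 25 added segments with a mergeFactor of 10 leave seven segments behind: two merged ones and five small ones. A lower mergeFactor keeps the segment count down at the cost of merging more often.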
I don't know why maxMergeDocs is not taking effect. It could be that
during initial indexing, other factors (like ramBufferSizeMB) are
involved, and maxMergeDocs only takes effect when merging existing segments.
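Here is a sketch of that guess in Python. It assumes maxMergeDocs is only a cap on which existing segments may take part in a merge (by document count), and that it has no say in how large a freshly written segment can be. That is my reading, not something I have verified in the code. The function name and the document counts are hypothetical.

```python
def mergeable_segments(doc_counts, max_merge_docs):
    """Split segments into (eligible, too_large) by document count.

    Assumption: a segment holding max_merge_docs documents or more is
    simply left out of merges; nothing splits or shrinks it.
    """
    eligible = [n for n in doc_counts if n < max_merge_docs]
    too_large = [n for n in doc_counts if n >= max_merge_docs]
    return eligible, too_large


# Hypothetical document counts for three segments, with maxMergeDocs = 10.
eligible, too_large = mergeable_segments([25000, 4, 7], 10)
print("eligible for merging:", eligible)
print("left alone:", too_large)
```

If that assumption holds, a single segment written by a full import would stay as it is no matter how low maxMergeDocs is set, which would match what you are seeing.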
For comparison purposes, here are the first three segments from one of
my indexes:
-rw-r--r-- 1 ncindex ncindex 6323043528 Jun 30 00:57 _lf.fdt
-rw-r--r-- 1 ncindex ncindex 75766484 Jun 30 00:57 _lf.fdx
-rw-r--r-- 1 ncindex ncindex 382 Jun 30 00:55 _lf.fnm
-rw-r--r-- 1 ncindex ncindex 2833619259 Jun 30 01:04 _lf.frq
-rw-r--r-- 1 ncindex ncindex 28412434 Jun 30 01:05 _lf.nrm
-rw-r--r-- 1 ncindex ncindex 1183860 Jun 30 15:41 _lf_o.del
-rw-r--r-- 1 ncindex ncindex 2455819068 Jun 30 01:04 _lf.prx
-rw-r--r-- 1 ncindex ncindex 23759599 Jun 30 01:04 _lf.tii
-rw-r--r-- 1 ncindex ncindex 926422435 Jun 30 01:04 _lf.tis
-rw-r--r-- 1 ncindex ncindex 18940740 Jun 30 01:06 _lf.tvd
-rw-r--r-- 1 ncindex ncindex 5883186438 Jun 30 01:06 _lf.tvf
-rw-r--r-- 1 ncindex ncindex 151532964 Jun 30 01:06 _lf.tvx
-rw-r--r-- 1 ncindex ncindex 868769283 Jul 1 09:07 _mf.fdt
-rw-r--r-- 1 ncindex ncindex 11279356 Jul 1 09:07 _mf.fdx
-rw-r--r-- 1 ncindex ncindex 372 Jul 1 09:06 _mf.fnm
-rw-r--r-- 1 ncindex ncindex 347906214 Jul 1 09:08 _mf.frq
-rw-r--r-- 1 ncindex ncindex 4229761 Jul 1 09:08 _mf.nrm
-rw-r--r-- 1 ncindex ncindex 284701250 Jul 1 09:08 _mf.prx
-rw-r--r-- 1 ncindex ncindex 960052 Jul 1 09:08 _mf.tii
-rw-r--r-- 1 ncindex ncindex 141775812 Jul 1 09:08 _mf.tis
-rw-r--r-- 1 ncindex ncindex 2818958 Jul 1 09:08 _mf.tvd
-rw-r--r-- 1 ncindex ncindex 735319599 Jul 1 09:08 _mf.tvf
-rw-r--r-- 1 ncindex ncindex 22558708 Jul 1 09:08 _mf.tvx
-rw-r--r-- 1 ncindex ncindex 30888748 Jul 1 09:07 _mg.fdt
-rw-r--r-- 1 ncindex ncindex 385700 Jul 1 09:07 _mg.fdx
-rw-r--r-- 1 ncindex ncindex 372 Jul 1 09:07 _mg.fnm
-rw-r--r-- 1 ncindex ncindex 13709508 Jul 1 09:07 _mg.frq
-rw-r--r-- 1 ncindex ncindex 144640 Jul 1 09:07 _mg.nrm
-rw-r--r-- 1 ncindex ncindex 12683152 Jul 1 09:07 _mg.prx
-rw-r--r-- 1 ncindex ncindex 51848 Jul 1 09:07 _mg.tii
-rw-r--r-- 1 ncindex ncindex 7409698 Jul 1 09:07 _mg.tis
-rw-r--r-- 1 ncindex ncindex 96428 Jul 1 09:07 _mg.tvd
-rw-r--r-- 1 ncindex ncindex 31790084 Jul 1 09:07 _mg.tvf
-rw-r--r-- 1 ncindex ncindex 771396 Jul 1 09:07 _mg.tvx
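If you want to count segments after each test run instead of reading the listing by eye, something like this Python sketch would do it. It assumes the file naming shown in this thread (an underscore, the segment name, then either an extension or a suffix like _o.del). The function names are made up, and the index directory path is yours to supply.

```python
import os
from collections import defaultdict


def group_by_segment(file_names):
    """Map each segment prefix (e.g. '_lf') to the sorted list of its files."""
    segments = defaultdict(list)
    for name in file_names:
        if not name.startswith("_"):
            continue  # skip segments.gen, segments_N and anything else
        stem = name.split(".", 1)[0]              # '_lf_o.del' -> '_lf_o'
        prefix = "_" + stem[1:].split("_", 1)[0]  # '_lf_o' -> '_lf'
        segments[prefix].append(name)
    return {prefix: sorted(files) for prefix, files in segments.items()}


def count_segments(index_dir):
    """Count the distinct segment prefixes found in an index directory."""
    return len(group_by_segment(os.listdir(index_dir)))
```

Run count_segments() on the index directory before and after adding documents with different mergeFactor settings, and compare the numbers.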
Shawn