Posted to solr-user@lucene.apache.org by Jordan Drake <jo...@exterro.com> on 2017/02/24 03:01:35 UTC

Index Segments not Merging

We run Solr with the index stored in HDFS. We build the index with MapReduce
jobs, using the MapReduceIndexerTool from Cloudera with the go-live option to
merge the results into our live index.

We are seeing an issue where the number of segments in the index never
decreases. It keeps growing until we manually run an optimize.
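
For anyone who wants to watch the same symptom, here is a minimal sketch of reading the segment count over HTTP. It assumes the core exposes the Luke request handler at /admin/luke and that the JSON response carries index.segmentCount; the base URL and the core name "collection1" are placeholders, not values from our setup.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def luke_url(base_url, core):
    """Build the Luke handler URL for one core (numTerms=0 keeps the response small)."""
    query = urlencode({"wt": "json", "numTerms": 0})
    return f"{base_url.rstrip('/')}/{core}/admin/luke?{query}"


def segment_count(luke_response):
    """Return index.segmentCount from a parsed Luke response, or None if it is absent."""
    return luke_response.get("index", {}).get("segmentCount")


def fetch_segment_count(base_url, core, timeout=10):
    """Query a running Solr node and return the current segment count of the core."""
    with urlopen(luke_url(base_url, core), timeout=timeout) as resp:
        return segment_count(json.load(resp))


# Example call against a placeholder node (needs a reachable Solr instance):
#   fetch_segment_count("http://localhost:8983/solr", "collection1")
```

Logging this value after each go-live run would show whether the count only ever goes up between manual optimizes.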

We are using the following Solr config for the merge policy:

    <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
      <int name="maxMergeAtOnce">10</int>
      <int name="segmentsPerTier">10</int>
    </mergePolicy>
    <!--<mergeFactor>10</mergeFactor>-->
    <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
      <int name="maxThreadCount">1</int>
      <int name="maxMergeCount">6</int>
    </mergeScheduler>

If we add documents to Solr without using MapReduce, the segments merge
as expected.

Any ideas on why we see this behavior? Does the Solr index merge prevent
the segments from merging?


Thanks,
Jordan

Re: Index Segments not Merging

Posted by Mike Thomsen <mi...@gmail.com>.
I only skimmed the documentation, but it looks like the tool generates
its own shards and pushes them into the collection by manipulating the
configuration of the cluster.

https://www.cloudera.com/documentation/enterprise/5-8-x/topics/search_mapreduceindexertool.html

If that reading is correct, it would stand to reason that Solr (at least as
of Solr 4.10, which is what CDH ships) would not do the periodic cleanup it
normally does when shards are built through its APIs.
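
If that turns out to be the case, one stopgap is to script the manual optimize after each go-live run. The following is only a sketch under assumptions: it uses the update handler's optimize=true and maxSegments parameters, and the base URL, core name, and threshold are hypothetical placeholders rather than anything taken from the original setup.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def optimize_url(base_url, core, max_segments):
    """Build an update-handler URL that asks Solr to merge down to max_segments."""
    query = urlencode({"optimize": "true", "maxSegments": max_segments})
    return f"{base_url.rstrip('/')}/{core}/update?{query}"


def needs_optimize(segment_count, threshold):
    """Hypothetical policy: optimize only once the count exceeds the threshold."""
    return segment_count > threshold


def run_optimize(base_url, core, max_segments, timeout=3600):
    """Send the optimize request; the call can block for a long time on a large index."""
    with urlopen(optimize_url(base_url, core, max_segments), timeout=timeout) as resp:
        return resp.status


# Example call against a placeholder node (needs a reachable Solr instance):
#   if needs_optimize(current_count, 30):
#       run_optimize("http://localhost:8983/solr", "collection1", 10)
```

Capping with maxSegments instead of forcing a single segment keeps the merge cheaper, but this treats the symptom and does not explain why the merge policy is not kicking in.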

On Thu, Feb 23, 2017 at 10:01 PM, Jordan Drake <jo...@exterro.com>
wrote:

> We have solr with the index stored in HDFS. We are running MapReduce jobs
> to build the index using the MapReduceIndexerTool from Cloudera with the
> go-live option to merge into our live index.
>
> We are seeing an issue where the number of segments in the index never
> reduces. It continues to grow until we manually do an optimize.
>
> We are using the following solr config for merge policy
>
>     <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>       <int name="maxMergeAtOnce">10</int>
>       <int name="segmentsPerTier">10</int>
>     </mergePolicy>
>     <!--<mergeFactor>10</mergeFactor>-->
>     <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
>       <int name="maxThreadCount">1</int>
>       <int name="maxMergeCount">6</int>
>     </mergeScheduler>
>
> If we add documents into solr without using MapReduce the segments merge
> properly as expected.
>
> Any ideas on why we see this behavior? Does the solr index merge prevent
> the segments from merging?
>
>
> Thanks,
> Jordan
>