Posted to solr-user@lucene.apache.org by kshitij tyagi <ks...@gmail.com> on 2016/08/16 17:47:12 UTC

Need to understand solr merging and commit relationship

I need to understand clearly whether there is any relationship between Solr
merging and Solr commits.

If there is, what is it?

I also need to understand how each of them affects indexing speed on the
core.

Re: Need to understand solr merging and commit relationship

Posted by kshitij tyagi <ks...@gmail.com>.
I have 2 Solr cores on one machine with the same configuration.

The problem is that indexing is faster on core1 and slower on core2.

Both cores have the same index size and configuration.

On Tue, Aug 16, 2016 at 11:34 PM, Erick Erickson <er...@gmail.com>
wrote:

> Why? What is the problem you're facing that you hope
> understanding more about these will help?
>
> Here are two places to start:
> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
> https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> In general, every time you do a hard commit the Lucene index is checked
> to see if there are segments that should be merged. If so, then a
> background thread is kicked off to start merging selected segments. Which
> segments are merged is decided by the MergePolicy in effect
> (TieredMergePolicy is the default).
>
> Best,
> Erick
>

Re: Need to understand solr merging and commit relationship

Posted by Erick Erickson <er...@gmail.com>.
Why? What is the problem you're facing that you hope
understanding more about these will help?

Here are two places to start:
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

In general, every time you do a hard commit the Lucene index is checked
to see if there are segments that should be merged. If so, then a background
thread is kicked off to start merging selected segments. Which segments
are merged is decided by the MergePolicy in effect (TieredMergePolicy is
the default).

Best,
Erick

On Tue, Aug 16, 2016 at 10:47 AM, kshitij tyagi
<ks...@gmail.com> wrote:
> I need to understand clearly that is there any relationship between solr
> merging and solr commit?
>
> If there is then what is it?
>
> Also i need to understand how both of these affect indexing speed on the
> core?

Re: Need to understand solr merging and commit relationship

Posted by kshitij tyagi <ks...@gmail.com>.
Thanks Shawn, that was really helpful.

On Sat, Aug 20, 2016 at 3:17 AM, Shawn Heisey <ap...@elyograg.org> wrote:

> On 8/16/2016 11:47 AM, kshitij tyagi wrote:
> > I need to understand clearly that is there any relationship between solr
> > merging and solr commit?
> >
> > If there is then what is it?
> >
> > Also i need to understand how both of these affect indexing speed on the
> > core?
>
> Whenever a new segment is written, the merge policy is checked to see
> whether a merge is needed.  If it is needed, then the merge is scheduled.
>
> A commit operation can (and frequently does) write a new segment, but
> that is not the only thing that can write (flush) new segments.  When
> the indexing RAM buffer fills up, a segment will be flushed, even
> without a commit.
>
> When paired with the default NRT Directory implementation, soft commits
> change the dynamics slightly, but not the way things generally operate.
> Soft commits are capable of flushing the latest segment(s) to memory,
> instead of the disk, but only if they are quite small.
>
> I would not expect commits to *directly* affect indexing speed unless
> you are doing commits extremely frequently.  Commits might indirectly
> affect indexing speed if they trigger a large merge.
>
> Merging can cause issues with indexing speed, even if it's happening in
> a different Solr core on the same machine.  This is because the system
> resources (I/O bandwidth, memory, CPU) required for a merge are also
> required to write a new segment.  Also, because flushing a new segment
> is effectively the same operation as the writing part of a merge, if too
> many merges are scheduled at once on a core, indexing on that core can
> stop entirely until the number of scheduled merges drops.
>
> Merging can also cause issues with query speed, if there is not
> sufficient memory available to the OS for effective disk caching.
>
> Thanks,
> Shawn
>
>

Re: Need to understand solr merging and commit relationship

Posted by Shawn Heisey <ap...@elyograg.org>.
On 8/16/2016 11:47 AM, kshitij tyagi wrote:
> I need to understand clearly that is there any relationship between solr
> merging and solr commit?
>
> If there is then what is it?
>
> Also i need to understand how both of these affect indexing speed on the
> core?

Whenever a new segment is written, the merge policy is checked to see
whether a merge is needed.  If it is needed, then the merge is scheduled.
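
To make the "check on every new segment" idea concrete, here is a toy
model in Python. It is only a sketch: the rule used (merge whenever
`merge_factor` segments of the same size exist) is an assumption chosen
for simplicity and is not how TieredMergePolicy actually selects
segments. The names `add_segment` and `merge_factor` are hypothetical.

```python
from typing import List


def add_segment(segments: List[int], new_size: int, merge_factor: int = 10) -> int:
    """Append a newly written segment, then run the merge check.

    Toy rule (an assumption, NOT TieredMergePolicy): whenever
    `merge_factor` segments of exactly the same size exist, they are
    merged into a single larger segment. One merge can make another
    merge eligible, so the check repeats until nothing qualifies.

    Returns the number of merges triggered by this one new segment.
    """
    segments.append(new_size)
    merges = 0
    while True:
        # Look for a size that occurs at least merge_factor times.
        candidate = next(
            (s for s in sorted(set(segments)) if segments.count(s) >= merge_factor),
            None,
        )
        if candidate is None:
            return merges
        # Replace merge_factor equal-sized segments with one merged segment.
        for _ in range(merge_factor):
            segments.remove(candidate)
        segments.append(candidate * merge_factor)
        merges += 1
```

Most calls trigger no merge at all, but occasionally a single new
segment sets off a cascade of merges, which is why merge cost shows up
as bursts rather than as a steady overhead.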

A commit operation can (and frequently does) write a new segment, but
that is not the only thing that can write (flush) new segments.  When
the indexing RAM buffer fills up, a segment will be flushed, even
without a commit.
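
The two flush triggers described above can be sketched with a small toy
writer in Python. This is an illustration under stated assumptions, not
Lucene's IndexWriter: `ToyWriter`, `ram_buffer_bytes`, and the reason
strings are hypothetical names, and real document sizes in the RAM
buffer are not simple byte counts. In Solr the buffer limit corresponds
to the `ramBufferSizeMB` setting.

```python
class ToyWriter:
    """Toy model: a segment is written on commit OR when the buffer fills."""

    def __init__(self, ram_buffer_bytes: int):
        self.ram_buffer_bytes = ram_buffer_bytes
        self.buffered = 0      # bytes currently held in the indexing buffer
        self.segments = []     # (segment_size, reason_it_was_written)

    def _flush(self, reason: str) -> None:
        # Nothing buffered means no new segment is written.
        if self.buffered > 0:
            self.segments.append((self.buffered, reason))
            self.buffered = 0

    def add_document(self, size_bytes: int) -> None:
        self.buffered += size_bytes
        # A full buffer forces a flush even though nobody asked for a commit.
        if self.buffered >= self.ram_buffer_bytes:
            self._flush("ram_buffer_full")

    def commit(self) -> None:
        self._flush("commit")


writer = ToyWriter(ram_buffer_bytes=100)
for _ in range(5):
    writer.add_document(30)
writer.commit()
print(writer.segments)
```

In this model, heavy indexing with rare commits still produces a steady
stream of new segments, and each one is an opportunity for the merge
policy to schedule a merge.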

When paired with the default NRT Directory implementation, soft commits
change the dynamics slightly, but not the way things generally operate. 
Soft commits are capable of flushing the latest segment(s) to memory,
instead of the disk, but only if they are quite small.
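
For reference, hard and soft commits can be requested explicitly through
a core's update handler using the `commit=true` and `softCommit=true`
request parameters. The Python sketch below only builds the URLs; the
base URL `http://localhost:8983/solr` and the helper name `update_url`
are assumptions for illustration, and `core1` is the core name used
earlier in this thread.

```python
from urllib.parse import urlencode


def update_url(base_url: str, core: str, soft: bool = False) -> str:
    """Build the update-handler URL that requests a hard or soft commit."""
    params = {"softCommit": "true"} if soft else {"commit": "true"}
    return f"{base_url.rstrip('/')}/{core}/update?{urlencode(params)}"


# Assumed host and port; adjust to the actual installation.
print(update_url("http://localhost:8983/solr", "core1"))
print(update_url("http://localhost:8983/solr", "core1", soft=True))
```

Comparing how often each core receives such requests (or how its
autoCommit and autoSoftCommit settings are configured) is one way to
check whether two cores with "the same configuration" really commit at
the same rate.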

I would not expect commits to *directly* affect indexing speed unless
you are doing commits extremely frequently.  Commits might indirectly
affect indexing speed if they trigger a large merge.

Merging can cause issues with indexing speed, even if it's happening in
a different Solr core on the same machine.  This is because the system
resources (I/O bandwidth, memory, CPU) required for a merge are also
required to write a new segment.  Also, because flushing a new segment
is effectively the same operation as the writing part of a merge, if too
many merges are scheduled at once on a core, indexing on that core can
stop entirely until the number of scheduled merges drops.
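
The stall described above can be illustrated with a very rough Python
simulation. Everything in it is an assumption made for the sake of the
sketch: the function and parameter names are hypothetical, every flush
is treated as scheduling one merge (real merges are far less frequent),
and time advances in abstract "ticks". In Lucene the real limit is
enforced by the merge scheduler, not by code like this.

```python
def simulate(flushes_per_tick: int, merges_done_per_tick: int,
             max_pending: int, ticks: int) -> int:
    """Return how many ticks indexing spent stalled.

    Toy model: indexing is paused whenever the number of pending merges
    exceeds `max_pending`, and resumes once the backlog drains.
    """
    pending = 0
    stalled_ticks = 0
    for _ in range(ticks):
        if pending > max_pending:
            # Too many merges queued: no new segments are written this tick.
            stalled_ticks += 1
        else:
            # Toy assumption: each flushed segment schedules one merge.
            pending += flushes_per_tick
        # The merge threads work through part of the backlog.
        pending = max(0, pending - merges_done_per_tick)
    return stalled_ticks


print(simulate(flushes_per_tick=1, merges_done_per_tick=1, max_pending=3, ticks=100))
print(simulate(flushes_per_tick=3, merges_done_per_tick=1, max_pending=3, ticks=100))
```

When merges keep up with flushes, indexing never pauses. When segments
are produced faster than merges complete, indexing alternates between
running and waiting, which looks like reduced indexing speed from the
outside.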

Merging can also cause issues with query speed, if there is not
sufficient memory available to the OS for effective disk caching.

Thanks,
Shawn