Posted to solr-user@lucene.apache.org by Elaine Cario <et...@gmail.com> on 2018/03/21 22:35:19 UTC

Re: solrcloud Auto-commit doesn't seem reliable

I'm just catching up on reading solr emails, so forgive me for being late
to this dance....

I've just gone through a project to enable CDCR on our Solr, and I also
experienced a small period of time where the commits on the source server
just seemed to stop.  This was during a period of intense experimentation
where I was mucking around with configurations, turning CDCR on/off, etc.
At some point the commits stopped occurring, and it drove me nuts for a
couple of days - tried everything - restarting Solr, reloading, turned
buffering on, turned buffering off, etc.  I finally threw up my hands and
rebooted the server out of desperation (it was a physical Linux box).
Commits worked fine after that.  I don't know what caused the commits to
stop, and why re-booting (and not just restarting Solr) caused them to work
fine.

Wondering if you ever found a solution to your situation?



On Fri, Feb 16, 2018 at 2:44 PM, Webster Homer <we...@sial.com>
wrote:

> I meant to get back to this sooner.
>
> When I say I issued a commit I do issue it as collection/update?commit=true
>
> The soft commit interval is set to 3000, but I don't have a problem with
> soft commits (I think). I was responding
>
> I am concerned that some hard commits don't seem to happen, but I think
> many commits do occur. I'd like suggestions on how to diagnose this, and
> perhaps an idea of where to look. Typically I believe that issues like this
> are from our configuration.
>
> Our indexing job is pretty simple: we send blocks of JSON to
> <collection>/update/json. We either re-index the whole collection or
> just apply updates. Typically we reindex the data once a week and delete
> any records that are older than the last full index. This does lead to a
> fair number of deleted records in the index, especially if commits fail.
> Most of our collections are not large: between 2 and 3 million records.
>
> The collections are hosted in Google Cloud.
>
> On Mon, Feb 12, 2018 at 5:00 PM, Erick Erickson <er...@gmail.com>
> wrote:
>
> > bq: But if 3 seconds is aggressive what would be a  good value for soft
> > commit?
> >
> > The usual answer is "as long as you can stand". All top-level caches are
> > invalidated, autowarming is done etc. on each soft commit. That can be a
> > lot of work and if your users are comfortable with docs not showing up
> > for, say, 10 minutes then use 10 minutes. As always "it depends" here, the
> > point is not to do unnecessary work if possible.
> >
> > bq: If a commit doesn't happen how would there ever be an index merge
> > that would remove the deleted documents.
> >
> > Right, it wouldn't. It's a little more subtle than that though. Segments
> > on various replicas will contain different docs, thus the term/doc
> > statistics can be a bit different between multiple replicas. None of the
> > stats will change until the commit though. You might try turning on
> > distributed doc/term stats though.
> >
> > Your comments about PULL or TLOG replicas are well taken. However, even
> > those won't be absolutely in sync since they'll replicate from the master
> > at slightly different times and _could_ get slightly different segments
> > _if_ there's indexing going on. But let's say you stop indexing. After the
> > next poll interval all the replicas will have identical characteristics
> > and will score the docs the same.
> >
> > I don't have any significant wisdom to offer here, except this is really
> > the first time I've heard of this behavior. About all I can imagine is
> > that _somehow_ the soft commit interval is -1. When you say you "issue a
> > commit" I'm assuming it's via ....collection/update?commit=true or some
> > such which issues a hard commit with openSearcher=true. And it's on a
> > _collection_ basis, right?
> >
> > Sorry I can't be more help
> > Erick
> >
> >
> >
> >
> > On Mon, Feb 12, 2018 at 10:44 AM, Webster Homer <we...@sial.com>
> > wrote:
> > > Erick, I am aware of the CDCR buffering problem causing tlog retention;
> > > we always turn buffering off in our CDCR configurations.
> > >
> > > My post was precipitated by seeing that we had uncommitted data in
> > > collections > 24 hours after it was loaded. The collections I was
> > > looking at are in our development environment, where we do not use
> > > CDCR. However, I'm pretty sure that I've seen situations in production
> > > where commits were also long overdue.
> > >
> > > The "autoSoftcommit" was a typo. The soft commit logic seems to be fine;
> > > I don't see an issue with data visibility. But if 3 seconds is
> > > aggressive, what would be a good value for soft commit? We have a couple
> > > of collections that are updated every minute, although most of them are
> > > updated much less frequently.
> > >
> > > My reason for raising this commit issue is that we see problems with
> > > the relevancy of solrcloud searches and the NRT replica type. Sometimes
> > > the results flip, where the best hit varies by which replica serviced
> > > the search. This is hard to explain to management. Doing an optimize
> > > does address the problem for a while. I try to avoid optimizing for the
> > > reasons you and Sean list. If a commit doesn't happen, how would there
> > > ever be an index merge that would remove the deleted documents?
> > >
> > > The problems with deletes and relevancy don't seem to occur when we use
> > > TLOG replicas, probably because they don't do their own indexing but get
> > > copies from their leader. We are testing them now; eventually we may
> > > abandon the use of NRT replicas for most of our collections.
> > >
> > > I am quite concerned about this commit issue. What kinds of things
> > > would influence whether a commit occurs? One commonality for our
> > > systems is that they are hosted in a Google cloud. We have a number of
> > > collections that share configurations, but others that do not. I think
> > > commits do happen, but I don't trust that autoCommit is reliable. What
> > > can we do to make it reliable?
> > >
> > > Most of our collections are reindexed weekly with partial updates
> > > applied daily; that at least is what happens in production. Our
> > > development clouds are not as regular.
> > >
> > > Our solr startup script sets the following values:
> > > -Dsolr.autoCommit.maxDocs=35000
> > > -Dsolr.autoCommit.maxTime=60000
> > > -Dsolr.autoSoftCommit.maxTime=3000
> > >
> > > I don't think we reference solr.autoCommit.maxDocs in our
> > > solrconfig.xml files.
> > >
> > > Here are our settings for autoCommit and autoSoftCommit.
> > >
> > > We had a lot of issues with missing commits when we didn't set
> > > solr.autoCommit.maxTime
> > >      <autoCommit>
> > >        <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
> > >        <openSearcher>false</openSearcher>
> > >      </autoCommit>
> > >
> > >      <autoSoftCommit>
> > >        <maxTime>${solr.autoSoftCommit.maxTime:5000}</maxTime>
> > >      </autoSoftCommit>
> > >
> > >
> > >
> > > On Fri, Feb 9, 2018 at 3:49 PM, Shawn Heisey <ap...@elyograg.org>
> > wrote:
> > >
> > >> On 2/9/2018 9:29 AM, Webster Homer wrote:
> > >>
> > >>> A little more background. Our production Solrclouds are populated via
> > >>> CDCR. CDCR does not replicate commits; commits to the target clouds
> > >>> happen via autoCommit settings.
> > >>>
> > >>> We see relevancy scores get inconsistent when there are too many
> > >>> deletes, which seems to happen when hard commits don't happen.
> > >>>
> > >>> On Fri, Feb 9, 2018 at 10:25 AM, Webster Homer <webster.homer@sial.com>
> > >>> wrote:
> > >>>
> > >>>> We do have autoSoftcommit set to 3 seconds. It is NOT the visibility
> > >>>> of the records that is my primary concern. What I am concerned about
> > >>>> is the accumulation of uncommitted tlog files and the larger number of
> > >>>> deleted documents.
> > >>>>
> > >>>
> > >> For the deleted documents:  Have you ever done an optimize on the
> > >> collection?  If so, you're going to need to re-do the optimize
> > >> regularly to keep deleted documents from growing out of control.  See
> > >> this issue for a very technical discussion about it:
> > >>
> > >> https://issues.apache.org/jira/browse/LUCENE-7976
> > >>
> > >> Deleted documents probably aren't really related to what we've been
> > >> discussing.  That shouldn't really be strongly affected by commit
> > >> settings.
> > >>
> > >> -----
> > >>
> > >> A 3 second autoSoftCommit is VERY aggressive.   If your soft commits
> > >> are taking longer than 3 seconds to complete, which is often what
> > >> happens, then that will lead to problems.  I wouldn't expect it to
> > >> cause the kinds of problems you describe, though.  It would manifest as
> > >> Solr working too hard, logging warnings or errors, and changes taking
> > >> too long to show up.
> > >>
> > >> Assuming that the config for autoSoftCommit doesn't have the typo that
> > >> Erick mentioned.
> > >>
> > >> ----
> > >>
> > >> I have never used CDCR, so I know very little about it.  But I have
> > >> seen reports on this mailing list saying that transaction logs never
> > >> get deleted when CDCR is configured.
> > >>
> > >> Below is a link to a mailing list discussion related to CDCR not
> > >> deleting transaction logs.  Looks like for it to work right a buffer
> > >> needs to be disabled, and there may also be problems caused by not
> > >> having a complete zkHost string in the CDCR config:
> > >>
> > >> http://lucene.472066.n3.nabble.com/CDCR-how-to-deal-with-the-transaction-log-files-td4345062.html
> > >>
> > >> Erick also mentioned this.
> > >>
> > >> Thanks,
> > >> Shawn
> > >>
> > >
> > > --
> > >
> > >
> > > This message and any attachment are confidential and may be privileged
> or
> > > otherwise protected from disclosure. If you are not the intended
> > recipient,
> > > you must not copy this message or attachment or disclose the contents
> to
> > > any other person. If you have received this transmission in error,
> please
> > > notify the sender immediately and delete the message and any attachment
> > > from your system. Merck KGaA, Darmstadt, Germany and any of its
> > > subsidiaries do not accept liability for any omissions or errors in
> this
> > > message which may arise as a result of E-Mail-transmission or for
> damages
> > > resulting from any unauthorized changes of the content of this message
> > and
> > > any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
> > > subsidiaries do not guarantee that this message is free of viruses and
> > does
> > > not accept liability for any damages caused by any virus transmitted
> > > therewith.
> > >
> > > Click http://www.emdgroup.com/disclaimer to access the German, French,
> > > Spanish and Portuguese versions of this disclaimer.
> >
>
>
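
The quoted settings combine -D startup properties with ${name:default} placeholders in solrconfig.xml. The following is a minimal Python sketch of that substitution rule, not Solr's own implementation: the resolve function and its error handling are hypothetical, and only the property names and values come from the thread. It illustrates why the effective soft commit interval here is 3000 rather than the 5000 default, and why solr.autoCommit.maxDocs has no effect if no placeholder references it.

```python
import re

# Matches placeholders such as ${solr.autoCommit.maxTime:60000}.
PLACEHOLDER = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")


def resolve(text, props):
    """Replace each ${name:default} with props[name], else with the default.

    Hypothetical helper for illustration only; raising KeyError when
    neither a value nor a default exists is an assumption of this sketch.
    """
    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name in props:
            return props[name]
        if default is not None:
            return default
        raise KeyError(f"no value or default for {name}")

    return PLACEHOLDER.sub(substitute, text)


# -D values from the startup script quoted in the thread.
startup_props = {
    "solr.autoCommit.maxDocs": "35000",
    "solr.autoCommit.maxTime": "60000",
    "solr.autoSoftCommit.maxTime": "3000",
}

hard = resolve("${solr.autoCommit.maxTime:60000}", startup_props)
soft = resolve("${solr.autoSoftCommit.maxTime:5000}", startup_props)
# Without the startup property, the default after the colon applies.
soft_unset = resolve("${solr.autoSoftCommit.maxTime:5000}", {})

print(hard, soft, soft_unset)
```

Because the quoted config never mentions solr.autoCommit.maxDocs, the 35000 value is never looked up in this sketch, which matches the poster's suspicion that the property is unused.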

Re: solrcloud Auto-commit doesn't seem reliable

Posted by Webster Homer <we...@sial.com>.
It's been a while since I had time to look further into this. I'll have to
go back through logs, which I need to get retrieved by an admin.

On Fri, Mar 23, 2018 at 8:45 AM, Amrit Sarkar <sa...@gmail.com>
wrote:

> Elaine,
>
> When you say commits are not working, do you mean that the Solr logs are
> not printing "commit" messages, or that documents are not appearing when
> you search?
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> Medium: https://medium.com/@sarkaramrit2
>


Re: solrcloud Auto-commit doesn't seem reliable

Posted by Amrit Sarkar <sa...@gmail.com>.
Elaine,

When you say commits are not working, do you mean that the Solr logs are
not printing "commit" messages, or that documents are not appearing when
you search?

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2
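
One way to separate the two cases is to issue an explicit commit of the kind mentioned earlier in the thread (collection/update?commit=true) and then check whether the documents appear. The Python sketch below only builds that request URL; the commit_url helper, the localhost base URL, and the "products" collection name are hypothetical and stand in for a real deployment.

```python
from urllib.parse import urlencode


def commit_url(base_url, collection, open_searcher=True):
    """Build the URL for an explicit hard commit on one collection.

    Hypothetical helper: base_url and collection are placeholders.
    """
    params = {
        "commit": "true",
        # openSearcher=true makes the committed documents visible to searches.
        "openSearcher": str(open_searcher).lower(),
    }
    return f"{base_url.rstrip('/')}/{collection}/update?{urlencode(params)}"


url = commit_url("http://localhost:8983/solr", "products")
print(url)
```

Sending the request (for example with urllib.request.urlopen(url)) is left out so the sketch runs without a Solr instance.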

On Thu, Mar 22, 2018 at 4:05 AM, Elaine Cario <et...@gmail.com> wrote:

> I'm just catching up on reading solr emails, so forgive me for being late
> to this dance....
>
> I've just gone through a project to enable CDCR on our Solr, and I also
> experienced a small period of time where the commits on the source server
> just seemed to stop.  This was during a period of intense experimentation
> where I was mucking around with configurations, turning CDCR on/off, etc.
> At some point the commits stopped occurring, and it drove me nuts for a
> couple of days - tried everything - restarting Solr, reloading, turned
> buffering on, turned buffering off, etc.  I finally threw up my hands and
> rebooted the server out of desperation (it was a physical Linux box).
> Commits worked fine after that.  I don't know what caused the commits to
> stop, and why re-booting (and not just restarting Solr) caused them to work
> fine.
>
> Wondering if you ever found a solution to your situation?
>
>
>
> On Fri, Feb 16, 2018 at 2:44 PM, Webster Homer <we...@sial.com>
> wrote:
>
> > I meant to get back to this sooner.
> >
> > When I say I issued a commit I do issue it as
> collection/update?commit=true
> >
> > The soft commit interval is set to 3000, but I don't have a problem with
> > soft commits ( I think). I was responding
> >
> > I am concerned that some hard commits don't seem to happen, but I think
> > many commits do occur. I'd like suggestions on how to diagnose this, and
> > perhaps an idea of where to look. Typically I believe that issues like
> this
> > are from our configuration.
> >
> > Our indexing job is pretty simple, we send blocks of JSON to
> > <collection>/update/json. We have either re-index the whole collection,
> or
> > just apply updates. Typically we reindex the data once a week and delete
> > any records that are older than the last full index. This does lead to a
> > fair number of deleted records in the index especially if commits fail.
> > Most of our collections are not large between 2 and 3 million records.
> >
> > The collections are hosted in google cloud
> >
> > On Mon, Feb 12, 2018 at 5:00 PM, Erick Erickson <erickerickson@gmail.com
> >
> > wrote:
> >
> > > bq: But if 3 seconds is aggressive what would be a  good value for soft
> > > commit?
> > >
> > > The usual answer is "as long as you can stand". All top-level caches
> are
> > > invalidated, autowarming is done etc. on each soft commit. That can be
> a
> > > lot of
> > > work and if your users are comfortable with docs not showing up for,
> > > say, 10 minutes
> > > then use 10 minutes. As always "it depends" here, the point is not to
> > > do unnecessary
> > > work if possible.
> > >
> > > bq: If a commit doesn't happen how would there ever be an index merge
> > > that would remove the deleted documents.
> > >
> > > Right, it wouldn't. It's a little more subtle than that though.
> > > Segments on various
> > > replicas will contain different docs, thus the term/doc statistics can
> be
> > > a bit
> > > different between multiple replicas. None of the stats will change
> > > until the commit
> > > though. You might try turning no distributed doc/term stats though.
> > >
> > > Your comments about PULL or TLOG replicas are well taken. However, even
> > > those
> > > won't be absolutely in sync since they'll replicate from the master at
> > > slightly
> > > different times and _could_ get slightly different segments _if_
> > > there's indexing
> > > going on. But let's say you stop indexing. After the next poll
> > > interval all the replicas
> > > will have identical characteristics and will score the docs the same.
> > >
> > > I don't have any signifiant wisdom to offer here, except this is really
> > the
> > > first time I've heard of this behavior. About all I can imagine is
> > > that _somehow_
> > > the soft commit interval is -1. When you say you "issue a commit" I'm
> > > assuming
> > > it's via ....collection/update?commit=true or some such which issues a
> > > hard
> > > commit with openSearcher=true. And it's on a _collection_ basis, right?
> > >
> > > Sorry I can't be more help
> > > Erick
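
For reference, the explicit commit request described above can be sketched
as a small URL builder. This is only an illustration: the base URL
http://localhost:8983/solr and the collection name "mycollection" are
placeholders that do not come from this thread, and commit_url is a
hypothetical helper name. The commit, softCommit and openSearcher
parameters are the ones the Solr update handler accepts.

```python
from urllib.parse import urlencode


def commit_url(base_url, collection, soft=False, open_searcher=True):
    """Build the update URL for an explicit commit on one collection.

    base_url and collection are placeholders supplied by the caller.
    """
    if soft:
        # Soft commit: makes documents visible without flushing to disk.
        params = {"softCommit": "true"}
    else:
        # Hard commit: flushes segments; openSearcher controls visibility.
        params = {
            "commit": "true",
            "openSearcher": "true" if open_searcher else "false",
        }
    return f"{base_url.rstrip('/')}/{collection}/update?{urlencode(params)}"


# Hypothetical host and collection, for illustration only.
url = commit_url("http://localhost:8983/solr", "mycollection")
print(url)
```

The URL can then be requested with any HTTP client, for example
urllib.request.urlopen(url). Addressing the collection rather than a
single core is what makes the commit apply on a collection basis.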
> > >
> > >
> > >
> > >
> > > > On Mon, Feb 12, 2018 at 10:44 AM, Webster Homer <webster.homer@sial.com>
> > > > wrote:
> > > > > Erick, I am aware of the CDCR buffering problem causing tlog
> > > > > retention; we always turn buffering off in our CDCR configurations.
> > > >
> > > > > My post was precipitated by seeing that we had uncommitted data in
> > > > > collections more than 24 hours after it was loaded. The collections
> > > > > I was looking at are in our development environment, where we do
> > > > > not use CDCR. However, I'm pretty sure that I've seen situations in
> > > > > production where commits were also long overdue.
> > > >
> > > > > The "autoSoftcommit" was a typo. The soft commit logic seems to be
> > > > > fine; I don't see an issue with data visibility. But if 3 seconds is
> > > > > aggressive what would be a good value for soft commit? We have a
> > > > > couple of collections that are updated every minute, although most
> > > > > of them are updated much less frequently.
> > > >
> > > > > My reason for raising this commit issue is that we see problems
> > > > > with the relevancy of SolrCloud searches and the NRT replica type.
> > > > > Sometimes the results flip, where the best hit varies by which
> > > > > replica serviced the search. This is hard to explain to management.
> > > > > Doing an optimize does address the problem for a while. I try to
> > > > > avoid optimizing for the reasons you and Sean list. If a commit
> > > > > doesn't happen, how would there ever be an index merge that would
> > > > > remove the deleted documents?
> > > >
> > > > > The problems with deletes and relevancy don't seem to occur when we
> > > > > use TLOG replicas, probably because they don't do their own
> > > > > indexing but get copies from their leader. We are testing them now;
> > > > > eventually we may abandon the use of NRT replicas for most of our
> > > > > collections.
> > > >
> > > > > I am quite concerned about this commit issue. What kinds of things
> > > > > would influence whether a commit occurs? One commonality for our
> > > > > systems is that they are hosted in a Google cloud. We have a number
> > > > > of collections that share configurations, but others that do not. I
> > > > > think commits do happen, but I don't trust that autoCommit is
> > > > > reliable. What can we do to make it reliable?
> > > >
> > > > > Most of our collections are reindexed weekly with partial updates
> > > > > applied daily. That at least is what happens in production; our
> > > > > development clouds are not as regular.
> > > >
> > > > > Our Solr startup script sets the following values:
> > > > > -Dsolr.autoCommit.maxDocs=35000
> > > > > -Dsolr.autoCommit.maxTime=60000
> > > > > -Dsolr.autoSoftCommit.maxTime=3000
> > > > >
> > > > > I don't think we reference solr.autoCommit.maxDocs in our
> > > > > solrconfig.xml files.
> > > >
> > > > > Here are our settings for autoCommit and autoSoftCommit. We had a
> > > > > lot of issues with missing commits when we didn't set
> > > > > solr.autoCommit.maxTime:
> > > > >
> > > > >      <autoCommit>
> > > > >        <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
> > > > >        <openSearcher>false</openSearcher>
> > > > >      </autoCommit>
> > > > >
> > > > >      <autoSoftCommit>
> > > > >        <maxTime>${solr.autoSoftCommit.maxTime:5000}</maxTime>
> > > > >      </autoSoftCommit>
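
To make the interplay between the -D startup values and the
${name:default} placeholders above concrete, here is a minimal sketch of
how such a placeholder resolves: a system property wins when it is set,
otherwise the default after the colon is used. The resolve function and
the regular expression are illustrative stand-ins, not Solr's own
implementation.

```python
import re

# Matches Solr-style property placeholders: ${name} or ${name:default}
_PLACEHOLDER = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")


def resolve(text, props):
    """Replace each ${name:default} with props[name], else the default."""
    def _sub(match):
        name, default = match.group(1), match.group(2)
        if name in props:
            return props[name]
        if default is not None:
            return default
        raise KeyError(f"no value or default for {name}")
    return _PLACEHOLDER.sub(_sub, text)


# The -D values from the startup script quoted above.
startup_props = {
    "solr.autoCommit.maxDocs": "35000",
    "solr.autoCommit.maxTime": "60000",
    "solr.autoSoftCommit.maxTime": "3000",
}

hard = resolve("${solr.autoCommit.maxTime:60000}", startup_props)
soft = resolve("${solr.autoSoftCommit.maxTime:5000}", startup_props)
# With no property set, the default in the config file applies.
soft_default = resolve("${solr.autoSoftCommit.maxTime:5000}", {})
print(hard, soft, soft_default)
```

Under these assumptions the effective soft commit interval is the 3000 ms
from the startup script rather than the 5000 ms default in the file, and
solr.autoCommit.maxDocs has no effect unless some placeholder in
solrconfig.xml actually references it.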
> > > >
> > > >
> > > >
> > > > > On Fri, Feb 9, 2018 at 3:49 PM, Shawn Heisey <ap...@elyograg.org>
> > > > > wrote:
> > > >
> > > >> On 2/9/2018 9:29 AM, Webster Homer wrote:
> > > >>
> > > > >>> A little more background. Our production Solrclouds are populated
> > > > >>> via CDCR. CDCR does not replicate commits; commits to the target
> > > > >>> clouds happen via autoCommit settings.
> > > > >>>
> > > > >>> We see relevancy scores get inconsistent when there are too many
> > > > >>> deletes, which seems to happen when hard commits don't happen.
> > > >>>
> > > > >>> On Fri, Feb 9, 2018 at 10:25 AM, Webster Homer
> > > > >>> <webster.homer@sial.com> wrote:
> > > >>>
> > > > >>>> We do have autoSoftcommit set to 3 seconds. It is NOT the
> > > > >>>> visibility of the records that is my primary concern. What I am
> > > > >>>> concerned about is the accumulation of uncommitted tlog files and
> > > > >>>> the larger number of deleted documents.
> > > >>>>
> > > >>>
> > > > >> For the deleted documents:  Have you ever done an optimize on the
> > > > >> collection?  If so, you're going to need to re-do the optimize
> > > > >> regularly to keep deleted documents from growing out of control.
> > > > >> See this issue for a very technical discussion about it:
> > > > >>
> > > > >> https://issues.apache.org/jira/browse/LUCENE-7976
> > > > >>
> > > > >> Deleted documents probably aren't really related to what we've been
> > > > >> discussing.  That shouldn't really be strongly affected by commit
> > > > >> settings.
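
One way to put a number on "deleted documents growing out of control" is
the share of an index taken up by deleted documents. The sketch below
assumes you have read numDocs and maxDoc for a replica (both are shown
per core in the Solr admin UI); maxDoc still counts deleted documents that
have not been merged away, so the difference is the deleted count. The
function name and the sample figures are made up for illustration.

```python
def deleted_ratio(num_docs, max_doc):
    """Fraction of the index occupied by deleted (not yet merged) docs."""
    if num_docs < 0 or max_doc < num_docs:
        raise ValueError("expected 0 <= num_docs <= max_doc")
    if max_doc == 0:
        return 0.0  # empty index: nothing deleted
    return (max_doc - num_docs) / max_doc


# Made-up figures for two replicas of the same shard.
replica_a = deleted_ratio(num_docs=750, max_doc=1000)
replica_b = deleted_ratio(num_docs=750, max_doc=800)
print(replica_a, replica_b)
```

Comparing the ratio across replicas of one shard gives a rough view of how
far their segments, and therefore their term statistics, have drifted
apart.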
> > > >>
> > > >> -----
> > > >>
> > > > >> A 3 second autoSoftCommit is VERY aggressive.  If your soft commits
> > > > >> are taking longer than 3 seconds to complete, which is often what
> > > > >> happens, then that will lead to problems.  I wouldn't expect it to
> > > > >> cause the kinds of problems you describe, though.  It would manifest
> > > > >> as Solr working too hard, logging warnings or errors, and changes
> > > > >> taking too long to show up.
> > > > >>
> > > > >> Assuming that the config for autoSoftCommit doesn't have the typo
> > > > >> that Erick mentioned.
> > > >>
> > > >> ----
> > > >>
> > > > >> I have never used CDCR, so I know very little about it.  But I have
> > > > >> seen reports on this mailing list saying that transaction logs never
> > > > >> get deleted when CDCR is configured.
> > > > >>
> > > > >> Below is a link to a mailing list discussion related to CDCR not
> > > > >> deleting transaction logs.  Looks like for it to work right a buffer
> > > > >> needs to be disabled, and there may also be problems caused by not
> > > > >> having a complete zkHost string in the CDCR config:
> > > > >>
> > > > >> http://lucene.472066.n3.nabble.com/CDCR-how-to-deal-with-the-transaction-log-files-td4345062.html
> > > >>
> > > >> Erick also mentioned this.
> > > >>
> > > >> Thanks,
> > > >> Shawn
> > > >>
> > > >
> > > > --
> > > >
> > > >
> > > > This message and any attachment are confidential and may be
> privileged
> > or
> > > > otherwise protected from disclosure. If you are not the intended
> > > recipient,
> > > > you must not copy this message or attachment or disclose the contents
> > to
> > > > any other person. If you have received this transmission in error,
> > please
> > > > notify the sender immediately and delete the message and any
> attachment
> > > > from your system. Merck KGaA, Darmstadt, Germany and any of its
> > > > subsidiaries do not accept liability for any omissions or errors in
> > this
> > > > message which may arise as a result of E-Mail-transmission or for
> > damages
> > > > resulting from any unauthorized changes of the content of this
> message
> > > and
> > > > any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
> > > > subsidiaries do not guarantee that this message is free of viruses
> and
> > > does
> > > > not accept liability for any damages caused by any virus transmitted
> > > > therewith.
> > > >
> > > > Click http://www.emdgroup.com/disclaimer to access the German,
> French,
> > > > Spanish and Portuguese versions of this disclaimer.
> > >
> >
> >
>