Posted to solr-user@lucene.apache.org by Chris Troullis <cp...@gmail.com> on 2017/08/01 20:50:19 UTC

Inconsistency in results between replicas using CloudSolrClient

Hi,

I think I know the answer to this question, but just wanted to verify/see
what other people do to address this concern.

I have a Solr Cloud setup (6.6.0) with 2 nodes, 1 collection with 1 shard
and 2 replicas (1 replica per node). The nature of my use case requires
frequent updates to Solr, and documents are being added constantly
throughout the day. I am using CloudSolrClient via SolrJ to query my
collection and load balance across my 2 replicas.

Here's my question:

As I understand it, because of the nature of Solr Cloud (eventual
consistency), and the fact that the soft commit timings on the 2 replicas
will not necessarily be in sync, would it not be possible to run into a
scenario where, say, a document gets indexed on replica 1 right before a
soft commit, but indexed on replica 2 right after a soft commit? In this
scenario, using the load balanced CloudSolrClient, wouldn't it be possible
for a user to do a search, see the newly added document because they got
sent to replica 1, and then search again, and the newly added document
would disappear from their results since they got sent to replica 2 and the
soft commit hasn't happened yet?

If so, how do people typically handle this scenario in NRT search cases? It
seems like a poor user experience if things keep disappearing and
reappearing from their search results randomly. Currently the only thought
I have to prevent this is to write (or extend) my own Solr client to stick
a user's session to a specific replica (unless it goes down), but still
load balance users between the replicas. But of course then I have to
manage all of the things CloudSolrClient manages manually re: cluster
state, etc.
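To illustrate, the stickiness I have in mind is roughly the following (a purely hypothetical sketch; the replica URLs and the surrounding failover/cluster-state handling are placeholders, not anything CloudSolrClient provides):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of sticky routing: hash a session id onto the list
// of live replicas so a given user keeps hitting the same replica while
// it stays up. URLs and any failover logic are placeholders.
public class StickyRouter {

    static String pickReplica(String sessionId, List<String> liveReplicas) {
        // floorMod keeps the index non-negative even for negative hash codes
        int idx = Math.floorMod(sessionId.hashCode(), liveReplicas.size());
        return liveReplicas.get(idx);
    }

    public static void main(String[] args) {
        List<String> replicas = Arrays.asList(
                "http://node1:8983/solr/collection1",
                "http://node2:8983/solr/collection1");
        // The same session id always maps to the same replica
        System.out.println(pickReplica("session-42", replicas));
    }
}
```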

Can anyone confirm/deny my understanding of how this works/offer any
suggestions to eliminate the scenario in question from occurring?

Thanks,

Chris

Re: Inconsistency in results between replicas using CloudSolrClient

Posted by Erick Erickson <er...@gmail.com>.
re: automated tests. In the Solr JUnit tests you'll see this pattern

- create collection
- add a bunch of docs
- do a commit
- now test consistency

Which, of course, doesn't particularly help if you're indexing after the commit.

For things like soft commit tests special care is taken to allow the
autocommit interval to expire.

bq: but not sure I like the performance implications

Totally agree. Hmmm, maybe commit every X amount of time from the
client? Committing from the client is, BTW, something I loathe,
but... I suppose one could write a cron job that issues a hard commit
or a soft commit to the collection rather than relying on autocommit
settings, then set those very high in solrconfig.xml for anything that
opens a searcher. If you do anything like that I'd still do a hard
commit for HA/DR reasons from solrconfig.xml, but with
openSearcher=false.
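For instance, the solrconfig.xml side of that scheme might look like the following (the intervals are illustrative; visibility would then be driven by the explicit commits the cron job or client issues):

```xml
<!-- Hard commit frequently for durability (HA/DR), but don't open a
     searcher, so this has no effect on what queries can see -->
<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Set the searcher-opening soft commit very high; new documents only
     become visible when the cron job/client issues an explicit commit -->
<autoSoftCommit>
  <maxTime>3600000</maxTime>
</autoSoftCommit>
```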

Or, just wait until anyone really notices. All I can say is that it
hasn't been a big enough problem to garner the effort to push the JIRA
I mentioned over the finish line.

Best,
Erick


Re: Inconsistency in results between replicas using CloudSolrClient

Posted by Chris Troullis <cp...@gmail.com>.
Thanks for the reply Erick, I feared that would be the case. Interesting
idea with using the fq, but I'm not sure I like the performance implications. I
will see how big of a deal it will be in practice, I was just thinking
about this as a hypothetical scenario today, and as you said, we have a lot
of automated tests so I anticipate this likely causing issues. I'll give it
some more thought and see if I can come up with any other workarounds.

-Chris


Re: Inconsistency in results between replicas using CloudSolrClient

Posted by Erick Erickson <er...@gmail.com>.
Your understanding is correct.

As for how people cope? Mostly they ignore it. The actual number of
times people notice this is usually quite small, mostly it surfaces
when automated test suites are run.

If you must lock this up, and you can stand the latency, you could add
a timestamp for each document and auto-add an FQ clause like:
fq=timestamp:[* TO NOW-soft_commit_interval_plus_some_windage]

Note, though, that this is not an fq clause that can be re-used, see:
https://lucidworks.com/2012/02/23/date-math-now-and-filter-queries/ so
either it'd be something like:
fq=timestamp:[* TO NOW/MINUTE-soft_commit_interval_plus_some_windage]
or
fq={!cache=false}timestamp:[* TO NOW-soft_commit_interval_plus_some_windage]

and it would inevitably lengthen the latency between when something is
indexed and when it becomes available for search.
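A small sketch of how that rounded, cache-friendly fq string might be assembled client-side (the field name "timestamp" and the 15s/5s interval values are illustrative assumptions, not from this thread):

```java
// Sketch: build a "settled documents only" filter query whose string is
// stable for up to a minute, so Solr's filter cache can re-use it.
// The field name "timestamp" and the intervals are illustrative.
public class SettledFilter {

    // softCommitMs: the collection's autoSoftCommit maxTime;
    // windageMs: extra slack so all replicas have opened a new searcher.
    static String settledFq(long softCommitMs, long windageMs) {
        long lagSeconds = (softCommitMs + windageMs) / 1000;
        // NOW/MINUTE rounds down to the minute, so the same fq string is
        // generated for up to a minute, per the linked article.
        return "timestamp:[* TO NOW/MINUTE-" + lagSeconds + "SECONDS]";
    }

    public static void main(String[] args) {
        // e.g. a 15s soft commit interval plus 5s of windage
        System.out.println(settledFq(15_000, 5_000));
        // prints: timestamp:[* TO NOW/MINUTE-20SECONDS]
    }
}
```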

You can also reduce your soft commit interval to something short, but
that has other problems.

see: SOLR-6606, but it looks like other priorities have gotten in the
way of it being committed.

Best,
Erick
