Posted to solr-user@lucene.apache.org by Satya Marivada <sa...@gmail.com> on 2018/05/03 18:55:33 UTC

inconsistent results

Hi there,

We have a Solr (6.3.0) index that is re-indexed every night; the indexing
takes about 6-7 hours to complete. During re-indexing, the index becomes
flaky and serves an inconsistent document count, 70,000 at times and
80,000 at other times. After the indexing completes, it serves the
consistent and correct number of documents that it has indexed from the
database. Any suggestions on this?

Also, Solr writes to the same location as the current index during
re-indexing. Could this be the cause?

Thanks,
Satya

Re: inconsistent results

Posted by Satya Marivada <sa...@gmail.com>.
Yes, we are doing a clean and full import. Is it not supposed to serve the
old (existing) index until the new index is built, and only then clean up
and replace the old index?

Would a full import without clean avoid this problem?

Thanks Erick, this would be useful.

On Thu, May 3, 2018, 4:28 PM Erick Erickson <er...@gmail.com> wrote:

> The short form is that different replicas in a shard have different
> commit points if you go by wall-clock time. So during heavy indexing,
> you can happen to catch the different counts. That really shouldn't
> happen, though, unless you're clearing the index first on the
> assumption that you're replacing the same docs each time....
>
> One solution people use is to index to a "dark" collection, then use
> collection aliasing to atomically switch when the job is done.
>
> Best,
> Erick

Re: inconsistent results

Posted by Erick Erickson <er...@gmail.com>.
The short form is that different replicas in a shard have different
commit points if you go by wall-clock time. So during heavy indexing,
you can happen to catch the different counts. That really shouldn't
happen, though, unless you're clearing the index first on the
assumption that you're replacing the same docs each time....

One solution people use is to index to a "dark" collection, then use
collection aliasing to atomically switch when the job is done.
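
As a rough illustration of the alias switch described above, here is a minimal
Python sketch that only builds the Collections API CREATEALIAS URL; it sends
nothing. The base URL, the alias name "products", and the two collection
names "products_a" and "products_b" are hypothetical, not taken from this
thread.

```python
from urllib.parse import urlencode


def next_dark_collection(live, candidates=("products_a", "products_b")):
    """Pick whichever of two hypothetical collections is not currently live."""
    first, second = candidates
    return second if live == first else first


def create_alias_url(solr_base, alias, collection):
    """Build a Collections API CREATEALIAS URL (no request is sent here)."""
    params = {"action": "CREATEALIAS", "name": alias, "collections": collection}
    return f"{solr_base.rstrip('/')}/admin/collections?{urlencode(params)}"


# Re-index into the dark collection, then point the alias at it when done.
dark = next_dark_collection("products_a")
print(create_alias_url("http://localhost:8983/solr", "products", dark))
```

Queries would go to the alias the whole time, so they keep hitting the old
collection until the alias is re-pointed after the nightly job finishes.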

Best,
Erick



Re: inconsistent results

Posted by Shawn Heisey <ap...@elyograg.org>.
On 5/3/2018 12:55 PM, Satya Marivada wrote:
> We have a Solr (6.3.0) index that is re-indexed every night; the indexing
> takes about 6-7 hours to complete. During re-indexing, the index becomes
> flaky and serves an inconsistent document count, 70,000 at times and
> 80,000 at other times. After the indexing completes, it serves the
> consistent and correct number of documents that it has indexed from the
> database. Any suggestions on this?

My initial guess is that commits are being fired before the whole
indexing process is complete.

If you're running in cloud mode, there could be other things going on.

> Also, Solr writes to the same location as the current index during
> re-indexing. Could this be the cause?

When you use an existing index as the write location for a re-index, you
must be very careful to ensure that you do not ever send any commit
requests before the entire indexing process is complete.  The autoCommit
config in solrconfig.xml must have openSearcher set to false, and
autoSoftCommit must not be active.  That way, all queries sent before
the process completes will be handled by the index that existed before
the indexing process started.  A commit when the process is done will
send new queries to the new state of the index.
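
To make the commit settings above concrete, here is a small Python sketch
that inspects a solrconfig.xml fragment and reports anything that would make
partial results visible during a re-index. The sample values (60000, -1) are
illustrative only, and the sketch assumes literal values rather than
${property:default} placeholders. It also assumes that a missing openSearcher
element means a new searcher is opened.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only; the values are assumptions, not from this thread.
SAMPLE_SOLRCONFIG = """
<config>
  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxTime>60000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>-1</maxTime>
    </autoSoftCommit>
  </updateHandler>
</config>
"""


def _positive(parent, tag):
    """True if <tag> under parent holds an integer greater than zero."""
    if parent is None:
        return False
    text = parent.findtext(tag)
    if text is None:
        return False
    try:
        return int(text.strip()) > 0
    except ValueError:
        # Not a plain integer (e.g. a placeholder): flag it to be cautious.
        return True


def visibility_problems(solrconfig_xml):
    """List settings that could expose a half-built index to queries."""
    root = ET.fromstring(solrconfig_xml)
    handler = root.find("updateHandler")
    auto = handler.find("autoCommit") if handler is not None else None
    soft = handler.find("autoSoftCommit") if handler is not None else None
    problems = []
    if auto is not None:
        # Assumption: an absent openSearcher is treated as "true".
        open_searcher = (auto.findtext("openSearcher") or "true").strip().lower()
        if open_searcher != "false":
            problems.append("autoCommit opens a new searcher")
    if _positive(soft, "maxTime") or _positive(soft, "maxDocs"):
        problems.append("autoSoftCommit is active")
    return problems


print(visibility_problems(SAMPLE_SOLRCONFIG))
```

With the sample fragment the list comes back empty. Even so, the indexing
client must still avoid sending explicit commits until the very end.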

An alternate idea would be to index the replacement index into a
different core/collection, and then swap the indexes.  In SolrCloud
mode, the swap would be accomplished using the Collection Alias feature.
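
For the non-cloud case, the swap could be sketched in Python as a CoreAdmin
SWAP request. This only builds the URL and sends nothing; the base URL and
the core names "live" and "rebuild" are hypothetical.

```python
from urllib.parse import urlencode


def swap_cores_url(solr_base, live_core, rebuilt_core):
    """Build a CoreAdmin SWAP URL that exchanges two cores (no request sent)."""
    params = {"action": "SWAP", "core": live_core, "other": rebuilt_core}
    return f"{solr_base.rstrip('/')}/admin/cores?{urlencode(params)}"


# Index into the "rebuild" core overnight, then swap it with "live".
print(swap_cores_url("http://localhost:8983/solr", "live", "rebuild"))
```

After the swap, clients that query the "live" core name would be served by
the freshly built index, and the old index sits under "rebuild" ready for the
next night's run.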

Thanks,
Shawn