Posted to users@solr.apache.org by Nick Vladiceanu <vl...@gmail.com> on 2023/06/09 07:43:13 UTC

Index and tlog duplicated, searcher uses old index

Hi everyone,

we have experienced a very strange situation with our Solr 9.1.1 that is running on the official docker image on Kubernetes. 

Context:
Total: 3 collections, 8 nodes;
All replicas types: TLOG; directory factory: MMapDirectoryFactory;
2 collections have only one shard, they are not actively updated, very tiny (<20MB), legacy (collection_1 and collection_2), we can skip focusing on them;
1 collection has two shards, each shard has 4 replicas; collection_0;
each node hosts only one replica (core) of collection_0 and one replica of each of the other two collections; in total, a node hosts at most 3 cores;
total size for collection_0 is ~6-7GB; since it’s split into two shards, each shard would have ~3GB index size.

We noticed that we get inconsistent results for the same query when it is run multiple times. Out of 4 requests, one returned an empty response when we were running “/select?q=id:12345&distrib=true”. 
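
For reference, the kind of per-replica check this leads to can be sketched like this (the helper itself is hypothetical; node and core names are from our setup):

```python
# Sketch: query each replica directly with distrib=false so per-core
# results can be compared; a stale core then stands out immediately.
from urllib.parse import urlencode

def core_query_url(base_url: str, core: str, doc_id: str) -> str:
    """Build a non-distributed query against a single core."""
    params = urlencode({"q": f"id:{doc_id}", "distrib": "false"})
    return f"{base_url}/solr/{core}/select?{params}"

# Fetch each URL (e.g. with urllib.request.urlopen) and compare the
# numFound values across the replicas of a shard.
url = core_query_url("http://node-7:8983",
                     "collection_0_shard2_replica_t15", "12345")
print(url)
```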

Started checking each core and noticed that the core on node-7 had “Last Modified: 9 days ago” (Solr UI -> selected the core -> Overview). On the right side, “Instance details” showed that we are using “Index: /var/solr/data/collection_0_shard2_replica_t15/data/index.20230530170400660”. Something is wrong.

We ran “kubectl exec” and started checking the “/var/solr/data” folder. All the commands executed and their results are posted here: https://justpaste.it/47jp4. 

What we saw there is that there were two indices and two transaction logs. The old index “index.20230530170400660” had its last update on May 30, 17:04; the new index “index.20230530113934434” was constantly updated, but its size only kept growing, reaching ~50GB. “index.properties” was pointing to the old index “index.20230530170400660”, whereas “replication.properties” pointed to the new index “index.20230530113934434”.

solr@node-7:/var/solr/data/collection_0_shard2_replica_t15/data$ du -h index.20230530113934434  
56G index.20230530113934434     

solr@node-7:/var/solr/data/collection_0_shard2_replica_t15/data$ du -h index.20230530170400660  
3.1G index.20230530170400660

Every time a new searcher was opened, it used the old, outdated index, which is why we were seeing the inconsistent results. We couldn’t find any errors in the logs of the node, only some warnings about segment checksums not matching and retries to pull the segments.

Could anyone please help with ideas on how to debug further to find the reason, and how to better monitor/prevent this from happening in the future? What could cause a new index to be created, and why wasn’t “index.properties” switched to point at the new index? Are there any metrics that could tell us the time since the last replication from the leader for the active index? 
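
One idea we are considering for that last question: each core exposes a replication handler at /replication?command=details, and, to our understanding, the JSON response carries the index generation and version. A rough sketch comparing parsed leader/follower responses (the exact JSON shape is an assumption to verify against the replication handler docs):

```python
# Sketch: compare index generation between a leader core and a follower
# core using /replication?command=details responses (JSON shape assumed;
# check your Solr version's replication handler documentation).

def replication_lag(leader_details: dict, follower_details: dict) -> int:
    """Generations the follower is behind the leader; a value that stays
    above zero across polls suggests the follower is not catching up."""
    leader_gen = leader_details["details"]["generation"]
    follower_gen = follower_details["details"]["generation"]
    return leader_gen - follower_gen

# Example with hand-written responses of the assumed shape:
leader = {"details": {"generation": 120, "indexVersion": 1686300000000}}
follower = {"details": {"generation": 117, "indexVersion": 1686200000000}}
print(replication_lag(leader, follower))  # 3
```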

Thanks in advance.

Best regards,
---
Nick Vladiceanu
vladiceanu.n@gmail.com 





Re: Index and tlog duplicated, searcher uses old index

Posted by Shawn Heisey <ap...@elyograg.org>.
On 6/9/23 08:33, Shawn Heisey wrote:
> We're back to the possibility of a general performance issue

For this, can you share the Solr GC logs?  A lot of information about 
performance can be gathered by analyzing those logs.

Email attachments generally do not work on the mailing list.  Better to 
put data on a file sharing website and provide a URL to access it.

There should not be any sensitive information in GC logs.

Thanks,
Shawn

Re: Index and tlog duplicated, searcher uses old index

Posted by Jan Høydahl <ja...@cominvent.com>.
There is also a newer "maxSize" setting for autoCommit that could perhaps be used in place of maxDocs to prevent the transaction log from growing too big, see https://solr.apache.org/guide/solr/latest/configuration-guide/commits-transaction-logs.html#automatic-commits
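
For reference, a sketch of how that might look in solrconfig.xml (the 512m value is illustrative; see the linked page for the exact semantics):

```xml
<!-- Sketch: cap transaction log growth by size rather than doc count;
     512m is an illustrative value, not a recommendation -->
<autoCommit>
   <maxTime>${solr.autoCommit.maxTime:300000}</maxTime>
   <maxSize>512m</maxSize>
   <openSearcher>true</openSearcher>
</autoCommit>
```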

I have seen the index.NNNNNNNN directories a few times before with classic replication, piling up and causing disk space issues. But never in SolrCloud, and never getting stale like this, so no idea what is happening here. It is IndexFetcher that creates these .NNN dirs as temp directories while pulling a new index without overwriting the existing one. So my guess is that something went wrong during index fetching, so that the temp .NNN folder was never swapped in as the new "index" folder but was left on disk. Watch out for exceptions or error logs. Worst case, some exception is swallowed somewhere?

Jan

> 12. jun. 2023 kl. 10:47 skrev Nick Vladiceanu <vl...@gmail.com>:
> 
> the reason autoCommit.maxTime is set to 5 mins is openSearcher=true, required by the TLOG + PULL replica types. Opening a new searcher too often is a costly operation, especially during high traffic. When we were using NRT, autoCommit.maxTime was set to something like 60s and softCommit to 5 mins, hence the migration to TLOG + PULL. The only way to open a new searcher and to “instruct” the pull replicas to pull segments periodically is via autoCommit.maxTime (if that hasn’t changed, the pull replicas will poll every autoCommit.maxTime / 2). 
> 
> We see some warnings regarding the “checksum didn’t match” on some nodes:
> (example)
> File _lbu.cfe did not match. expected checksum is 1075403290 and actual is checksum 387703527. expected length is 542 and actual length is 542
> 
> 
> The GC activity for the past 7 days was uploaded from another node (from the same cluster): https://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMjMvMDYvMTIvc29scl9nYy50YXIuZ3otLTgtMzgtNDk=&channel=WEB 
> 
> The reload timeout is indeed weird; it never succeeds on some cores unless they get re-created, and keeps timing out at 180s no matter what. This is another issue since the Solr 9 upgrade, reported at https://lists.apache.org/thread/vxklzw5z1qm9wdo4536mc7tcto7ov33x. Since we upgraded from 9.0 to 9.1.1, the timeout happens much less often and the cluster doesn’t get destabilized. We lowered the inter-node communication timeouts, which helped keep the cluster stable even when one node fails to reload.
> 
> We are planning to start upgrading to 9.2.1, hopefully that will solve the issue.
> 
> ---
> Nick Vladiceanu
> vladiceanu.n@gmail.com 
> 
> 
> 
> 
>> On 9. Jun 2023, at 16:33, Shawn Heisey <el...@elyograg.org> wrote:
>> 
>> On 6/9/23 03:05, Nick Vladiceanu wrote:
>>> autoCommit is enabled and it also has openSearcher set to true because in TLOG / PULL replicas there is no softCommit and therefore we need to open a new searcher during autoCommit.
>>> <autoCommit>
>>>    <maxTime>${solr.autoCommit.maxTime:300000}</maxTime>
>>>    <maxDocs>${solr.autoCommit.maxDocs:1000}</maxDocs>
>>>    <openSearcher>true</openSearcher>
>>> </autoCommit>
>>> When we tried to reload the collection, the node in question (node-7) timed out without any errors (general timeout, 180s).
>> 
>> You should probably lower the maxTime on your autoCommit.  Unless you have commits that regularly take longer than 30 seconds to complete ... and if that's the case, you might have a general performance issue and it might not be easy to solve.  I would suggest a value of 60000 for maxTime.
>> 
>> You should also remove maxDocs.  Time based autoCommit is a lot more predictable.  With a super low value of 1000 docs, the autoCommit may be firing VERY frequently, and that can be problematic.
>> 
>> With the relatively small size of your indexes, it seems very weird that a reload would take longer than 3 minutes.  I would expect it to take at most a few seconds.  I managed a sharded index (no SolrCloud) with much larger cores and reloading all the cores would only take a few seconds total.  We're back to the possibility of a general performance issue.
>> 
>> I am hoping that all the fixes in 9.2.1 will help.  I did a quick glance through CHANGES.txt and nothing jumped out at me as a definite candidate for your troubles, but sometimes multiple fixes work together to fix a larger issue.
>> 
>> I wish I had something more definite to tell you.  Are you seeing any warnings or errors in solr.log?
>> 
>> Thanks,
>> Shawn
> 


Re: Index and tlog duplicated, searcher uses old index

Posted by Nick Vladiceanu <vl...@gmail.com>.
the reason autoCommit.maxTime is set to 5 mins is openSearcher=true, required by the TLOG + PULL replica types. Opening a new searcher too often is a costly operation, especially during high traffic. When we were using NRT, autoCommit.maxTime was set to something like 60s and softCommit to 5 mins, hence the migration to TLOG + PULL. The only way to open a new searcher and to “instruct” the pull replicas to pull segments periodically is via autoCommit.maxTime (if that hasn’t changed, the pull replicas will poll every autoCommit.maxTime / 2). 

We see some warnings regarding the “checksum didn’t match” on some nodes:
(example)
File _lbu.cfe did not match. expected checksum is 1075403290 and actual is checksum 387703527. expected length is 542 and actual length is 542


The GC activity for the past 7 days was uploaded from another node (from the same cluster): https://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMjMvMDYvMTIvc29scl9nYy50YXIuZ3otLTgtMzgtNDk=&channel=WEB 

The reload timeout is indeed weird; it never succeeds on some cores unless they get re-created, and keeps timing out at 180s no matter what. This is another issue since the Solr 9 upgrade, reported at https://lists.apache.org/thread/vxklzw5z1qm9wdo4536mc7tcto7ov33x. Since we upgraded from 9.0 to 9.1.1, the timeout happens much less often and the cluster doesn’t get destabilized. We lowered the inter-node communication timeouts, which helped keep the cluster stable even when one node fails to reload.

We are planning to start upgrading to 9.2.1, hopefully that will solve the issue.

---
Nick Vladiceanu
vladiceanu.n@gmail.com 




> On 9. Jun 2023, at 16:33, Shawn Heisey <el...@elyograg.org> wrote:
> 
> On 6/9/23 03:05, Nick Vladiceanu wrote:
>> autoCommit is enabled and it also has openSearcher set to true because in TLOG / PULL replicas there is no softCommit and therefore we need to open a new searcher during autoCommit.
>>  <autoCommit>
>>     <maxTime>${solr.autoCommit.maxTime:300000}</maxTime>
>>     <maxDocs>${solr.autoCommit.maxDocs:1000}</maxDocs>
>>     <openSearcher>true</openSearcher>
>> </autoCommit>
>> When we tried to reload the collection, the node in question (node-7) timed out without any errors (general timeout, 180s).
> 
> You should probably lower the maxTime on your autoCommit.  Unless you have commits that regularly take longer than 30 seconds to complete ... and if that's the case, you might have a general performance issue and it might not be easy to solve.  I would suggest a value of 60000 for maxTime.
> 
> You should also remove maxDocs.  Time based autoCommit is a lot more predictable.  With a super low value of 1000 docs, the autoCommit may be firing VERY frequently, and that can be problematic.
> 
> With the relatively small size of your indexes, it seems very weird that a reload would take longer than 3 minutes.  I would expect it to take at most a few seconds.  I managed a sharded index (no SolrCloud) with much larger cores and reloading all the cores would only take a few seconds total.  We're back to the possibility of a general performance issue.
> 
> I am hoping that all the fixes in 9.2.1 will help.  I did a quick glance through CHANGES.txt and nothing jumped out at me as a definite candidate for your troubles, but sometimes multiple fixes work together to fix a larger issue.
> 
> I wish I had something more definite to tell you.  Are you seeing any warnings or errors in solr.log?
> 
> Thanks,
> Shawn


Re: Index and tlog duplicated, searcher uses old index

Posted by Shawn Heisey <el...@elyograg.org>.
On 6/9/23 03:05, Nick Vladiceanu wrote:
> autoCommit is enabled and it also has openSearcher set to true because in TLOG / PULL replicas there is no softCommit and therefore we need to open a new searcher during autoCommit.
> 
>   <autoCommit>
>      <maxTime>${solr.autoCommit.maxTime:300000}</maxTime>
>      <maxDocs>${solr.autoCommit.maxDocs:1000}</maxDocs>
>      <openSearcher>true</openSearcher>
> </autoCommit>
> 
> When we tried to reload the collection, the node in question (node-7) timed out without any errors (general timeout, 180s).

You should probably lower the maxTime on your autoCommit.  Unless you 
have commits that regularly take longer than 30 seconds to complete ... 
and if that's the case, you might have a general performance issue and 
it might not be easy to solve.  I would suggest a value of 60000 for 
maxTime.

You should also remove maxDocs.  Time based autoCommit is a lot more 
predictable.  With a super low value of 1000 docs, the autoCommit may be 
firing VERY frequently, and that can be problematic.
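
A sketch of what that suggestion would look like for this setup (openSearcher stays true here only because the TLOG + PULL arrangement described earlier relies on it; the value is the one suggested, not a universal default):

```xml
<!-- Sketch of the suggestion above: time-based autoCommit only,
     maxDocs removed, maxTime lowered to 60s -->
<autoCommit>
   <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
   <openSearcher>true</openSearcher>
</autoCommit>
```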

With the relatively small size of your indexes, it seems very weird that 
a reload would take longer than 3 minutes.  I would expect it to take at 
most a few seconds.  I managed a sharded index (no SolrCloud) with much 
larger cores and reloading all the cores would only take a few seconds 
total.  We're back to the possibility of a general performance issue.

I am hoping that all the fixes in 9.2.1 will help.  I did a quick glance 
through CHANGES.txt and nothing jumped out at me as a definite candidate 
for your troubles, but sometimes multiple fixes work together to fix a 
larger issue.

I wish I had something more definite to tell you.  Are you seeing any 
warnings or errors in solr.log?

Thanks,
Shawn

Re: Index and tlog duplicated, searcher uses old index

Posted by Nick Vladiceanu <vl...@gmail.com>.
thanks Shawn.

autoCommit is enabled and it also has openSearcher set to true because in TLOG / PULL replicas there is no softCommit and therefore we need to open a new searcher during autoCommit.

<autoCommit>
   <maxTime>${solr.autoCommit.maxTime:300000}</maxTime>
   <maxDocs>${solr.autoCommit.maxDocs:1000}</maxDocs>
   <openSearcher>true</openSearcher>
</autoCommit>

When we tried to reload the collection, the node in question (node-7) timed out without any errors (general timeout, 180s). 

We have multiple clusters that run similar setup (difference is the # of nodes, docs and size of the nodes), none of them ended up in such a weird state. 

This is a bit worrying as in bigger clusters, without proper monitoring and alerting[*], one might end up serving outdated content.

We are planning to upgrade to 9.2.1 and to actively monitor the state of the nodes.

[*] - we still need to figure out which metrics could tell us that the active index is lagging behind the leader; we have an idea, though: “sum(rate(solr_metrics_core_searcher_documents{namespace!=""}[10m])) by (pod, collection)”, which could give us at least some understanding of the index state on each node // careful: this only works if you have continuous updates to the collections. Perhaps anyone has better ideas on how to monitor the lag of the active index?
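
The rate-based idea above can also be expressed as a plain comparison of two scrapes of the per-pod searcher document counts (pod names and counts are illustrative; this is a sketch of the signal, not a drop-in alert):

```python
# Sketch: flag pods whose searcher document count did not move between
# two scrapes while at least one other pod's did. As noted above, this
# is only meaningful when the collection receives continuous updates.

def stale_replicas(counts_before: dict, counts_after: dict) -> list:
    changed = {pod for pod, n in counts_before.items()
               if counts_after.get(pod) != n}
    if not changed:
        return []  # nothing moved anywhere: signal is inconclusive
    return [pod for pod in counts_before if pod not in changed]

print(stale_replicas({"node-6": 100_000, "node-7": 100_000},
                     {"node-6": 100_450, "node-7": 100_000}))  # ['node-7']
```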

Thanks,
---
Nick Vladiceanu
vladiceanu.n@gmail.com 




> On 9. Jun 2023, at 10:34, Shawn Heisey <ap...@elyograg.org> wrote:
> 
> On 6/9/23 01:43, Nick Vladiceanu wrote:
>> We noticed that we get inconsistent results for the same query if run multiple times. Out of 4 requests, one of them was returning empty response when we were running “/select?q=id:12345&distrib=true”.
>> Started checking each core and we noticed that the core on node-7 had "Last Modified: 9 days ago” (Solr UI -> selected the core -> Overview). On the right side, "Instance details" were showing that we are using “Index: /var/solr/data/collection_0_shard2_replica_t15/data/index.20230530170400660”. Something is wrong.
> 
> I suspect that you may have turned off autoCommit.  If so, that's a bad idea.
> 
> The solrconfig.xml should always have autoCommit configured with a relatively short maxTime.  The configs Solr ships with have maxTime set to 15000 milliseconds.  In most cases, the autoCommit should have openSearcher set to false.  I personally increase the maxTime to 60000 milliseconds, so there is less overall system load, but the 15 second interval in the example configs works very well.  If it didn't, it wouldn't be in the example configs.
> 
> TLOG and PULL followers query the leader for changes on an interval that's half of the autoCommit maxTime setting.
> 
> Note that autoCommit serves a very different purpose than autoSoftCommit.  If you're going to disable one of them, it should be autoSoftCommit.
> 
> https://lucidworks.com/post/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> 
> If you haven't disabled autoCommit then I don't know what might be wrong.
> 
> You should upgrade to 9.2.1.  The list of bugs fixed between 9.1 and 9.2 is very extensive, and I have actually run into a number of them on 9.1.1.
> 
> I have never understood what makes Solr use an "index.NNNNNNNN" directory instead of just "index" or when it switches to a new directory.  I know it has something to do with replication, which is the Solr feature that SolrCloud uses to copy TLOG/PULL replica data.
> 
> If you are finding that you've got extra data in some of your replicas from multiple index directories, just reload the collection.  That should get straightened out.
> 
> Thanks,
> Shawn


Re: Index and tlog duplicated, searcher uses old index

Posted by Shawn Heisey <ap...@elyograg.org>.
On 6/9/23 01:43, Nick Vladiceanu wrote:
> We noticed that we get inconsistent results for the same query if run multiple times. Out of 4 requests, one of them was returning empty response when we were running “/select?q=id:12345&distrib=true”.
> 
> Started checking each core and we noticed that the core on node-7 had "Last Modified: 9 days ago” (Solr UI -> selected the core -> Overview). On the right side, "Instance details" were showing that we are using “Index: /var/solr/data/collection_0_shard2_replica_t15/data/index.20230530170400660”. Something is wrong.

I suspect that you may have turned off autoCommit.  If so, that's a bad 
idea.

The solrconfig.xml should always have autoCommit configured with a 
relatively short maxTime.  The configs Solr ships with have maxTime set 
to 15000 milliseconds.  In most cases, the autoCommit should have 
openSearcher set to false.  I personally increase the maxTime to 60000 
milliseconds, so there is less overall system load, but the 15 second 
interval in the example configs works very well.  If it didn't, it 
wouldn't be in the example configs.

TLOG and PULL followers query the leader for changes on an interval 
that's half of the autoCommit maxTime setting.

Note that autoCommit serves a very different purpose than 
autoSoftCommit.  If you're going to disable one of them, it should be 
autoSoftCommit.

https://lucidworks.com/post/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

If you haven't disabled autoCommit then I don't know what might be wrong.

You should upgrade to 9.2.1.  The list of bugs fixed between 9.1 and 9.2 
is very extensive, and I have actually run into a number of them on 9.1.1.

I have never understood what makes Solr use an "index.NNNNNNNN" 
directory instead of just "index" or when it switches to a new 
directory.  I know it has something to do with replication, which is the 
Solr feature that SolrCloud uses to copy TLOG/PULL replica data.

If you are finding that you've got extra data in some of your replicas 
from multiple index directories, just reload the collection.  That 
should get straightened out.
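
The reload suggested above goes through the Collections API; a sketch of building that request (host and port are hypothetical):

```python
# Sketch: build a collection RELOAD request for the Collections API
# (host/port are hypothetical placeholders).
from urllib.parse import urlencode

def reload_collection_url(base_url: str, collection: str) -> str:
    params = urlencode({"action": "RELOAD", "name": collection})
    return f"{base_url}/solr/admin/collections?{params}"

# Send with urllib.request.urlopen(url) and check the responseHeader
# status in the JSON reply.
url = reload_collection_url("http://localhost:8983", "collection_0")
print(url)  # http://localhost:8983/solr/admin/collections?action=RELOAD&name=collection_0
```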

Thanks,
Shawn