Posted to issues@lucene.apache.org by "Richard Goodman (Jira)" <ji...@apache.org> on 2020/03/19 16:58:00 UTC
[jira] [Comment Edited] (SOLR-14325) Core status could be improved to not require an IndexSearcher
[ https://issues.apache.org/jira/browse/SOLR-14325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17062758#comment-17062758 ]
Richard Goodman edited comment on SOLR-14325 at 3/19/20, 4:57 PM:
------------------------------------------------------------------
Hey David,
I hope to bring you good news today. I tried this on a cluster at the same scale as our live production clusters _(luckily, we have the headroom to do this at the moment)_, and after applying your suggestion of using {{core.getIndexReaderFactory}} there is a significant improvement in the results.
The tests I ran were:
* Restarting an instance to check recovery and the core admin details
* Shutting down an instance, writing 100-200k documents to the collection (increasing numDocs on the remaining replicas for that collection+shard), then starting the instance back up to check recovery
* Restoring a backup _(as this also failed previously with the original patch)_
All 3 tests worked well, and there were 0 errors in the logs _(whereas previously we were getting them)_. As an extra bonus, I also ran the Solr Prometheus exporter collecting Jetty and core-level metrics _(previously this would also fail for us)_, and it is collecting metrics with 0 errors as well.
Here are 3 data dumps showing that {{indexInfo}} is now populated and gets updated as recovery progresses, which I thought was pretty awesome:
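The core admin details checked in these tests come from the CoreAdmin STATUS action, where {{indexInfo=true}} (the default) is what adds the {{index}} section seen in the dumps. Below is a minimal Java sketch for pulling that status while a replica recovers. It assumes Java 11+ and a node at a placeholder base URL. {{CoreStatusClient}}, {{statusUri}} and {{fetchStatus}} are hypothetical names (not SolrJ), and JSON parsing is deliberately left out.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of SolrJ) for polling CoreAdmin STATUS for one core.
final class CoreStatusClient {
    private final String solrBaseUrl; // assumed, e.g. "http://localhost:8983/solr"
    private final HttpClient http = HttpClient.newHttpClient();

    CoreStatusClient(String solrBaseUrl) {
        this.solrBaseUrl = solrBaseUrl;
    }

    // Builds GET <base>/admin/cores?action=STATUS for one core; indexInfo=true asks for the "index" section.
    URI statusUri(String coreName, boolean indexInfo) {
        String core = URLEncoder.encode(coreName, StandardCharsets.UTF_8);
        return URI.create(solrBaseUrl + "/admin/cores?action=STATUS&wt=json"
                + "&core=" + core + "&indexInfo=" + indexInfo);
    }

    // Sends the request and returns the raw JSON body; parse it with whatever JSON library you already use.
    String fetchStatus(String coreName) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(statusUri(coreName, true)).GET().build();
        return http.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

If you call {{fetchStatus}} every few seconds during recovery, you can watch {{lastPublished}} move from {{recovering}} to {{active}} and {{numDocs}} climb, as in the dumps.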
{code:title=core-recovering-out-of-sync-num-docs.json}
"a_collection_shard14_replica_n73":{
"name":"a_collection_shard14_replica_n73",
"instanceDir":"/data/solr/data/a_collection_shard14_replica_n73",
"dataDir":"/data/solr/data/a_collection_shard14_replica_n73/data/",
"config":"solrconfig.xml",
"schema":"schema.xml",
"startTime":"2020-03-19T13:02:11.849Z",
"uptime":82787,
"lastPublished":"recovering",
"configVersion":0,
"cloud":{
"collection":"a_collection",
"shard":"shard14",
"replica":"core_node74"},
"index":{
"numDocs":1634772,
"maxDoc":1645736,
"deletedDocs":10964,
"indexHeapUsageBytes":-1,
"version":6723987,
"segmentCount":26,
"current":true,
"hasDeletions":true,
"directory":"org.apache.lucene.store.NRTCachingDirectory:NRTCachingDirectory(MMapDirectory@/data/solr/data/a_collection_shard14_replica_n73/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@8cf74fe; maxCacheMB=48.0 maxMergeSizeMB=4.0)",
"segmentsFile":"segments_24cf",
"segmentsFileSizeInBytes":1937,
"userData":{
"commitCommandVer":"1661596698074415104",
"commitTimeMSec":"1584622095179"},
"lastModified":"2020-03-19T12:48:15.179Z",
"sizeInBytes":3969833777,
"size":"3.7 GB"}},
{code}
{code:title=core-recovering-getting-in-sync-with-num-docs.json}
"a_collection_shard14_replica_n73":{
"name":"a_collection_shard14_replica_n73",
"instanceDir":"/data/solr/data/a_collection_shard14_replica_n73",
"dataDir":"/data/solr/data/a_collection_shard14_replica_n73/data/",
"config":"solrconfig.xml",
"schema":"schema.xml",
"startTime":"2020-03-19T13:02:11.849Z",
"uptime":84929,
"lastPublished":"recovering",
"configVersion":0,
"cloud":{
"collection":"a_collection",
"shard":"shard14",
"replica":"core_node74"},
"index":{
"numDocs":1714179,
"maxDoc":1725282,
"deletedDocs":11103,
"indexHeapUsageBytes":-1,
"version":6724780,
"segmentCount":25,
"current":true,
"hasDeletions":true,
"directory":"org.apache.lucene.store.NRTCachingDirectory:NRTCachingDirectory(MMapDirectory@/data/solr/data/a_collection_shard14_replica_n73/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@8cf74fe; maxCacheMB=48.0 maxMergeSizeMB=4.0)",
"segmentsFile":"segments_24cg",
"segmentsFileSizeInBytes":1868,
"userData":{
"commitCommandVer":"1661597662835638272",
"commitTimeMSec":"1584623015247"},
"lastModified":"2020-03-19T13:03:35.247Z",
"sizeInBytes":4858114166,
"size":"4.52 GB"}},
{code}
{code:title=core-now-active-with-in-sync-numDocs.json}
"a_collection_shard14_replica_n73":{
"name":"a_collection_shard14_replica_n73",
"instanceDir":"/data/solr/data/a_collection_shard14_replica_n73",
"dataDir":"/data/solr/data/a_collection_shard14_replica_n73/data/",
"config":"solrconfig.xml",
"schema":"schema.xml",
"startTime":"2020-03-19T13:02:11.849Z",
"uptime":93454,
"lastPublished":"active",
"configVersion":0,
"cloud":{
"collection":"a_collection",
"shard":"shard14",
"replica":"core_node74"},
"index":{
"numDocs":1714179,
"maxDoc":1725282,
"deletedDocs":11103,
"indexHeapUsageBytes":-1,
"version":6724780,
"segmentCount":25,
"current":true,
"hasDeletions":true,
"directory":"org.apache.lucene.store.NRTCachingDirectory:NRTCachingDirectory(MMapDirectory@/data/solr/data/a_collection_shard14_replica_n73/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@8cf74fe; maxCacheMB=48.0 maxMergeSizeMB=4.0)",
"segmentsFile":"segments_24cg",
"segmentsFileSizeInBytes":1868,
"userData":{
"commitCommandVer":"1661597662835638272",
"commitTimeMSec":"1584623015247"},
"lastModified":"2020-03-19T13:03:35.247Z",
"sizeInBytes":4858114166,
"size":"4.52 GB"}},
{code}
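The {{index}} numbers in these dumps are internally consistent. In each one, {{deletedDocs}} equals {{maxDoc}} minus {{numDocs}}, and the second dump already has the same {{numDocs}} as the final, active one. A small Java sketch of those two checks follows. {{IndexStats}}, {{isConsistent}} and {{docsBehind}} are hypothetical names. Comparing {{numDocs}} against a reference replica is only a heuristic for "caught up", not how Solr itself decides that a replica is active. The active dump's {{numDocs}} is used as the reference because its title says numDocs is in sync.

```java
// Hypothetical holder for a few fields of the "index" section of a core STATUS response.
final class IndexStats {
    final long numDocs;
    final long maxDoc;
    final long deletedDocs;

    IndexStats(long numDocs, long maxDoc, long deletedDocs) {
        this.numDocs = numDocs;
        this.maxDoc = maxDoc;
        this.deletedDocs = deletedDocs;
    }

    // maxDoc still counts deleted documents that have not been merged away yet,
    // so deletedDocs should equal maxDoc - numDocs.
    boolean isConsistent() {
        return deletedDocs == maxDoc - numDocs;
    }

    // Heuristic only: how many live documents this replica lacks compared with a reference replica.
    long docsBehind(IndexStats reference) {
        return Math.max(0, reference.numDocs - numDocs);
    }

    public static void main(String[] args) {
        // numDocs / maxDoc / deletedDocs taken from the three dumps in this comment.
        IndexStats outOfSync = new IndexStats(1634772, 1645736, 10964);
        IndexStats gettingInSync = new IndexStats(1714179, 1725282, 11103);
        IndexStats active = new IndexStats(1714179, 1725282, 11103);

        System.out.println(outOfSync.isConsistent() && gettingInSync.isConsistent()); // true
        System.out.println(outOfSync.docsBehind(active));     // 79407
        System.out.println(gettingInSync.docsBehind(active)); // 0
    }
}
```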
---
I've updated and uploaded the patch, and I'm open to any further comments on it.
> Core status could be improved to not require an IndexSearcher
> -------------------------------------------------------------
>
> Key: SOLR-14325
> URL: https://issues.apache.org/jira/browse/SOLR-14325
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: David Smiley
> Priority: Major
> Attachments: SOLR-14325.patch, SOLR-14325.patch
>
>
> When the core status is told to request "indexInfo", it currently grabs the SolrIndexSearcher but only to grab the Directory. SolrCore.getIndexSize also only requires the Directory. By insisting on a SolrIndexSearcher, we potentially block for a while if the core is in recovery, since there is no SolrIndexSearcher.
> [https://lists.apache.org/thread.html/r076218c964e9bd6ed0a53133be9170c3cf36cc874c1b4652120db417%40%3Cdev.lucene.apache.org%3E]
> It'd be nice to have a solution that conditionally used the Directory of the SolrIndexSearcher only if it's present so that we don't waste time creating one either.
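The conditional lookup described in the issue works like this: use the open searcher's Directory if one exists, and otherwise read the index without building a searcher. It can be sketched in plain Java. All types below ({{IndexDirectory}}, {{OpenSearcher}}, {{CoreStatusSketch}}) are hypothetical stand-ins rather than Solr classes. According to the comment above, the patch's fallback goes through {{core.getIndexReaderFactory}}, but its real signatures are not modeled here.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical stand-in for the index Directory; only what the status report needs.
interface IndexDirectory {
    long sizeInBytes();
}

// Hypothetical stand-in for an already-open searcher.
interface OpenSearcher {
    IndexDirectory directory();
}

final class CoreStatusSketch {
    // Expected to return the currently registered searcher, if any, without creating one or blocking.
    private final Supplier<Optional<OpenSearcher>> currentSearcher;
    // Opens the on-disk index directly (e.g. via a reader factory) when no searcher is available.
    private final Supplier<IndexDirectory> openDirectly;

    CoreStatusSketch(Supplier<Optional<OpenSearcher>> currentSearcher,
                     Supplier<IndexDirectory> openDirectly) {
        this.currentSearcher = currentSearcher;
        this.openDirectly = openDirectly;
    }

    // Prefer the searcher's Directory; only fall back to opening the index when no searcher exists,
    // so a recovering core can still report index info.
    IndexDirectory directoryForStatus() {
        return currentSearcher.get()
                .map(OpenSearcher::directory)
                .orElseGet(openDirectly);
    }
}
```

The fallback is passed as a supplier, so the cost of opening the index is paid only when no searcher is registered. That is the recovering case the issue is about.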
--
This message was sent by Atlassian Jira
(v8.3.4#803005)