Posted to issues@lucene.apache.org by "Richard Goodman (Jira)" <ji...@apache.org> on 2020/03/19 16:58:00 UTC

[jira] [Comment Edited] (SOLR-14325) Core status could be improved to not require an IndexSearcher

    [ https://issues.apache.org/jira/browse/SOLR-14325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17062758#comment-17062758 ] 

Richard Goodman edited comment on SOLR-14325 at 3/19/20, 4:57 PM:
------------------------------------------------------------------

Hey David, 

I hope to bring you good news today. I tried this on a cluster at the same scale as our live production clusters _(luckily we have the headroom to do this at the moment)_, and after applying your suggestion of using {{core.getIndexReaderFactory}}, the results improved significantly.

The tests I ran were:
* Restarting an instance to check recovery and the core admin details
* Shutting down an instance, writing 100-200k documents to the collection (increasing numDocs on the remaining replicas of that collection+shard), then starting the instance back up to check recovery
* Restoring a backup _(this also failed previously with the original patch)_
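The recovery checks above can be scripted against the core admin STATUS API. This is only a sketch: it builds the request URL and applies a simple heuristic, and the base URL, the class and method names, and the rule that a replica has caught up once it publishes {{active}} with the same {{numDocs}} as its peers are assumptions for illustration, not part of the patch. Matching {{numDocs}} is a rough signal at best, since documents not yet committed on a replica may not be counted.

{code:java}
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for checking a replica after a restart; not part of Solr. */
final class RecoveryCheck {

    /** Core admin STATUS request for one core, asking for the "index" section via indexInfo=true. */
    static String statusUrl(String solrBaseUrl, String coreName) {
        return solrBaseUrl + "/admin/cores?action=STATUS&indexInfo=true&wt=json&core="
                + URLEncoder.encode(coreName, StandardCharsets.UTF_8);
    }

    /**
     * Assumed heuristic: the replica looks caught up once it publishes "active"
     * and reports the same numDocs as the other replicas of its shard.
     */
    static boolean looksCaughtUp(String lastPublished, long numDocs, long peerNumDocs) {
        return "active".equals(lastPublished) && numDocs == peerNumDocs;
    }
}
{code}

Assuming the peers report 1714179 docs (as the title of the final dump below suggests), the first dump ({{recovering}}, 1634772) would not count as caught up, while the final one ({{active}}, 1714179) would.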

All 3 tests worked well, with 0 errors in the logs _(whereas previously we were getting them)_. As an extra bonus, I also ran the Solr Prometheus exporter collecting Jetty and core-level metrics _(previously this would also fail for us)_, and it is collecting metrics with 0 errors as well.

Here are 3 data dumps showing that {{indexInfo}} is now populated and also gets updated, which I thought was pretty awesome:

{code:title=core-recovering-out-of-sync-num-docs.json}
"a_collection_shard14_replica_n73":{
      "name":"a_collection_shard14_replica_n73",
      "instanceDir":"/data/solr/data/a_collection_shard14_replica_n73",
      "dataDir":"/data/solr/data/a_collection_shard14_replica_n73/data/",
      "config":"solrconfig.xml",
      "schema":"schema.xml",
      "startTime":"2020-03-19T13:02:11.849Z",
      "uptime":82787,
      "lastPublished":"recovering",
      "configVersion":0,
      "cloud":{
        "collection":"a_collection_",
        "shard":"shard14",
        "replica":"core_node74"},
      "index":{
        "numDocs":1634772,
        "maxDoc":1645736,
        "deletedDocs":10964,
        "indexHeapUsageBytes":-1,
        "version":6723987,
        "segmentCount":26,
        "current":true,
        "hasDeletions":true,
        "directory":"org.apache.lucene.store.NRTCachingDirectory:NRTCachingDirectory(MMapDirectory@/data/solr/data/a_collection_shard14_replica_n73/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@8cf74fe; maxCacheMB=48.0 maxMergeSizeMB=4.0)",
        "segmentsFile":"segments_24cf",
        "segmentsFileSizeInBytes":1937,
        "userData":{
          "commitCommandVer":"1661596698074415104",
          "commitTimeMSec":"1584622095179"},
        "lastModified":"2020-03-19T12:48:15.179Z",
        "sizeInBytes":3969833777,
        "size":"3.7 GB"}},
{code}

{code:title=core-recovering-getting-in-sync-with-num-docs.json}
"a_collection_shard14_replica_n73":{
      "name":"a_collection_shard14_replica_n73",
      "instanceDir":"/data/solr/data/a_collection_shard14_replica_n73",
      "dataDir":"/data/solr/data/a_collection_shard14_replica_n73/data/",
      "config":"solrconfig.xml",
      "schema":"schema.xml",
      "startTime":"2020-03-19T13:02:11.849Z",
      "uptime":84929,
      "lastPublished":"recovering",
      "configVersion":0,
      "cloud":{
        "collection":"a_collection_",
        "shard":"shard14",
        "replica":"core_node74"},
      "index":{
        "numDocs":1714179,
        "maxDoc":1725282,
        "deletedDocs":11103,
        "indexHeapUsageBytes":-1,
        "version":6724780,
        "segmentCount":25,
        "current":true,
        "hasDeletions":true,
        "directory":"org.apache.lucene.store.NRTCachingDirectory:NRTCachingDirectory(MMapDirectory@/data/solr/data/a_collection_shard14_replica_n73/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@8cf74fe; maxCacheMB=48.0 maxMergeSizeMB=4.0)",
        "segmentsFile":"segments_24cg",
        "segmentsFileSizeInBytes":1868,
        "userData":{
          "commitCommandVer":"1661597662835638272",
          "commitTimeMSec":"1584623015247"},
        "lastModified":"2020-03-19T13:03:35.247Z",
        "sizeInBytes":4858114166,
        "size":"4.52 GB"}},
{code}

{code:title=core-now-active-with-in-sync-numDocs.json}
"a_collection_shard14_replica_n73":{
      "name":"a_collection_shard14_replica_n73",
      "instanceDir":"/data/solr/data/a_collection_shard14_replica_n73",
      "dataDir":"/data/solr/data/a_collection_shard14_replica_n73/data/",
      "config":"solrconfig.xml",
      "schema":"schema.xml",
      "startTime":"2020-03-19T13:02:11.849Z",
      "uptime":93454,
      "lastPublished":"active",
      "configVersion":0,
      "cloud":{
        "collection":"a_collection",
        "shard":"shard14",
        "replica":"core_node74"},
      "index":{
        "numDocs":1714179,
        "maxDoc":1725282,
        "deletedDocs":11103,
        "indexHeapUsageBytes":-1,
        "version":6724780,
        "segmentCount":25,
        "current":true,
        "hasDeletions":true,
        "directory":"org.apache.lucene.store.NRTCachingDirectory:NRTCachingDirectory(MMapDirectory@/data/solr/data/a_collection_shard14_replica_n73/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@8cf74fe; maxCacheMB=48.0 maxMergeSizeMB=4.0)",
        "segmentsFile":"segments_24cg",
        "segmentsFileSizeInBytes":1868,
        "userData":{
          "commitCommandVer":"1661597662835638272",
          "commitTimeMSec":"1584623015247"},
        "lastModified":"2020-03-19T13:03:35.247Z",
        "sizeInBytes":4858114166,
        "size":"4.52 GB"}},
{code}
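The derived fields in these dumps are consistent with the raw ones: {{deletedDocs}} is {{maxDoc - numDocs}} (1645736 - 1634772 = 10964, and 1725282 - 1714179 = 11103), {{size}} is {{sizeInBytes}} in 1024-based units, and {{lastModified}} is {{commitTimeMSec}} rendered as an ISO-8601 UTC instant. The sketch below reproduces those values so a dump can be sanity-checked; the class and method names are hypothetical, and the size formatting only mirrors the output shown here rather than claiming to be Solr's own code.

{code:java}
import java.text.NumberFormat;
import java.time.Instant;
import java.util.Locale;

/** Hypothetical sanity checks for the "index" section of a core STATUS response. */
final class IndexInfoCheck {

    /** In the dumps, deletedDocs equals maxDoc minus numDocs. */
    static long deletedDocs(long maxDoc, long numDocs) {
        return maxDoc - numDocs;
    }

    /** Formats a byte count in 1024-based units with at most two decimals, e.g. "3.7 GB". */
    static String readableSize(long bytes) {
        NumberFormat fmt = NumberFormat.getNumberInstance(Locale.ROOT);
        fmt.setMaximumFractionDigits(2);
        String[] units = {"bytes", "KB", "MB", "GB"};
        double value = bytes;
        int unit = 0;
        // Divide by 1024 until the value fits the unit, stopping at GB.
        while (value >= 1024 && unit < units.length - 1) {
            value /= 1024;
            unit++;
        }
        return fmt.format(value) + " " + units[unit];
    }

    /** commitTimeMSec (epoch milliseconds) rendered as an ISO-8601 UTC instant. */
    static String commitTime(long commitTimeMSec) {
        return Instant.ofEpochMilli(commitTimeMSec).toString();
    }
}
{code}

Note that in the first dump the last commit (12:48:15) predates the core's {{startTime}} (13:02:11), whereas in the later dumps it falls shortly after it (13:03:35).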

---

I've updated and uploaded the patch, and I'm open to any further comments on it.



> Core status could be improved to not require an IndexSearcher
> -------------------------------------------------------------
>
>                 Key: SOLR-14325
>                 URL: https://issues.apache.org/jira/browse/SOLR-14325
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: David Smiley
>            Priority: Major
>         Attachments: SOLR-14325.patch, SOLR-14325.patch
>
>
> When the core status is told to request "indexInfo", it currently grabs the SolrIndexSearcher but only to grab the Directory. SolrCore.getIndexSize also only requires the Directory. By insisting on a SolrIndexSearcher, we potentially block for a while if the core is in recovery since there is no SolrIndexSearcher.
> [https://lists.apache.org/thread.html/r076218c964e9bd6ed0a53133be9170c3cf36cc874c1b4652120db417%40%3Cdev.lucene.apache.org%3E]
> It'd be nice to have a solution that conditionally used the Directory of the SolrIndexSearcher only if it's present so that we don't waste time creating one either.
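The idea in the description, reusing a searcher's view of the index only when one already exists and otherwise reading the index directly rather than waiting for (or creating) a searcher, can be sketched in plain Java. {{IndexSnapshot}} and {{CoreStatusSketch}} are hypothetical stand-ins, not Solr or Lucene types; the comment above reports that the patch uses {{core.getIndexReaderFactory}} for the direct read.

{code:java}
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical stand-in for the values reported under "index" in core status. */
final class IndexSnapshot {
    final String source;
    final long numDocs;

    IndexSnapshot(String source, long numDocs) {
        this.source = source;
        this.numDocs = numDocs;
    }
}

/** Hypothetical sketch of the conditional lookup; not Solr's API. */
final class CoreStatusSketch {
    /**
     * Uses the already-registered searcher's view if there is one and never creates a searcher.
     * Otherwise falls back to reading the index Directory directly, so a core in recovery
     * (which has no searcher yet) does not block. The fallback runs only when needed.
     */
    static IndexSnapshot indexInfo(Optional<IndexSnapshot> registeredSearcherView,
                                   Supplier<IndexSnapshot> readFromDirectory) {
        return registeredSearcherView.orElseGet(readFromDirectory);
    }
}
{code}

Using {{orElseGet}} rather than {{orElse}} keeps the fallback lazy, so the directory is only read when no searcher is available.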



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
