Posted to solr-user@lucene.apache.org by vishal patel <vi...@outlook.com> on 2020/07/06 13:41:25 UTC

Replica goes into recovery mode in Solr 6.1.0

I am using Solr 6.1.0 with Java 8 and G1GC in production. We have 2 shards and each shard has 1 replica. We have 3 collections.
We do not use any caches; they are disabled in solrconfig.xml. Search and update requests come in frequently on our live platform.

*Our commit configuration in solrconfig.xml is below:
<autoCommit>
       <maxTime>600000</maxTime>
       <maxDocs>20000</maxDocs>
       <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
       <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>

*We use Near Real Time searching, so we set the following in solr.in.cmd:
set SOLR_OPTS=%SOLR_OPTS% -Dsolr.autoSoftCommit.maxTime=100

*Our collection details are below:

Collection     Shard1 (Docs / Size GB)   Shard1 Replica (Docs / Size GB)   Shard2 (Docs / Size GB)   Shard2 Replica (Docs / Size GB)
collection1    26913364 / 201            26913379 / 202                    26913380 / 198            26913379 / 198
collection2    13934360 / 310            13934367 / 310                    13934368 / 219            13934367 / 219
collection3    351539689 / 73.5          351540040 / 73.5                  351540136 / 75.2          351539722 / 75.2

*My server configurations are below:

                                          Server1                Server2
CPU                                       Intel(R) Xeon(R) E5-2650 v3 @ 2.30GHz, 2301 MHz, 10 cores, 20 logical processors (both servers)
HardDisk                                  3845 GB (3.84 TB)      3485 GB (3.48 TB)
Total memory (GB)                         320                    320
Shard1 allocated memory (GB)              55                     -
Shard2 Replica allocated memory (GB)      55                     -
Shard2 allocated memory (GB)              -                      55
Shard1 Replica allocated memory (GB)      -                      55
Other applications allocated memory (GB)  60                     22
Number of other applications              11                     7


Sometimes one of the replicas goes into recovery mode. Why does a replica go into recovery? Is it due to heavy searching, heavy updates/inserts, or long GC pauses? If it is one of those, what should we change in our configuration?
Should we add more shards to resolve the recovery issue?

Regards,
Vishal Patel


Re: Replica goes into recovery mode in Solr 6.1.0

Posted by vishal patel <vi...@outlook.com>.
Actually, I shared our collection details as a spreadsheet-style table, but the formatting seems to have been stripped here.

For this you can see https://drive.google.com/file/d/1K9JyvJ50Vele9pPJCiMwm25wV4A6x4eD/view
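(In case it is useful: one way to collect these per-core numbers, instead of copying them from the admin UI, is the Core Admin STATUS call; the host and port below are illustrative:

http://localhost:8983/solr/admin/cores?action=STATUS&wt=json

Each core's "index" section in the response reports numDocs and sizeInBytes.)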

Regards,
Vishal Patel
________________________________
From: Rodrigo Oliveira <ad...@gmail.com>
Sent: Wednesday, July 8, 2020 4:23 PM
To: solr-user@lucene.apache.org <so...@lucene.apache.org>
Subject: Re: Replica goes into recovery mode in Solr 6.1.0

Hi,

How did you generate this? Is there a command that produces this summary?


*Our collections details are below:

Collection     Shard1 (Docs / Size GB)   Shard1 Replica (Docs / Size GB)   Shard2 (Docs / Size GB)   Shard2 Replica (Docs / Size GB)
collection1    26913364 / 201            26913379 / 202                    26913380 / 198            26913379 / 198
collection2    13934360 / 310            13934367 / 310                    13934368 / 219            13934367 / 219
collection3    351539689 / 73.5          351540040 / 73.5                  351540136 / 75.2          351539722 / 75.2



On Mon, 6 Jul 2020 at 10:41, vishal patel <vi...@outlook.com>
wrote:

> I am using Solr version 6.1.0, Java 8 version and G1GC on production. We
> have 2 shards and each shard has 1 replica. We have 3 collection.
> We do not use any cache and also disable in Solr config.xml. Search and
> Update requests are coming frequently in our live platform.
>
> *Our commit configuration in solr.config are below
> <autoCommit>
> <maxTime>600000</maxTime>
>        <maxDocs>20000</maxDocs>
>        <openSearcher>false</openSearcher>
> </autoCommit>
> <autoSoftCommit>
>        <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
> </autoSoftCommit>
>
> *We used Near Real Time Searching So we did below configuration in
> solr.in.cmd
> set SOLR_OPTS=%SOLR_OPTS% -Dsolr.autoSoftCommit.maxTime=100
>
> *Our collections details are below:
>
> Collection      Shard1  Shard1 Replica  Shard2  Shard2 Replica
> Number of Documents     Size(GB)        Number of Documents     Size(GB)
>       Number of Documents     Size(GB)        Number of Documents
>  Size(GB)
> collection1     26913364        201     26913379        202     26913380
>       198     26913379        198
> collection2     13934360        310     13934367        310     13934368
>       219     13934367        219
> collection3     351539689       73.5    351540040       73.5    351540136
>      75.2    351539722       75.2
>
> *My server configurations are below:
>
>         Server1 Server2
> CPU     Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz, 2301 Mhz, 10 Core(s),
> 20 Logical Processor(s)        Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz,
> 2301 Mhz, 10 Core(s), 20 Logical Processor(s)
> HardDisk(GB)    3845 ( 3.84 TB) 3485 GB (3.48 TB)
> Total memory(GB)        320     320
> Shard1 Allocated memory(GB)     55
> Shard2 Replica Allocated memory(GB)     55
> Shard2 Allocated memory(GB)             55
> Shard1 Replica Allocated memory(GB)             55
> Other Applications Allocated Memory(GB) 60      22
> Other Number Of Applications    11      7
>
>
> Sometimes, any one replica goes into recovery mode. Why replica goes into
> recovery? Due to heavy search OR heavy update/insert OR long GC pause time?
> If any one of them then what should we do in configuration?
> Should we increase the shard for recovery issue?
>
> Regards,
> Vishal Patel
>
>

Re: Replica goes into recovery mode in Solr 6.1.0

Posted by Rodrigo Oliveira <ad...@gmail.com>.
Hi,

How did you generate this? Is there a command that produces this summary?


*Our collections details are below:

Collection     Shard1 (Docs / Size GB)   Shard1 Replica (Docs / Size GB)   Shard2 (Docs / Size GB)   Shard2 Replica (Docs / Size GB)
collection1    26913364 / 201            26913379 / 202                    26913380 / 198            26913379 / 198
collection2    13934360 / 310            13934367 / 310                    13934368 / 219            13934367 / 219
collection3    351539689 / 73.5          351540040 / 73.5                  351540136 / 75.2          351539722 / 75.2



On Mon, 6 Jul 2020 at 10:41, vishal patel <vi...@outlook.com>
wrote:

> I am using Solr version 6.1.0, Java 8 version and G1GC on production. We
> have 2 shards and each shard has 1 replica. We have 3 collection.
> We do not use any cache and also disable in Solr config.xml. Search and
> Update requests are coming frequently in our live platform.
>
> *Our commit configuration in solr.config are below
> <autoCommit>
> <maxTime>600000</maxTime>
>        <maxDocs>20000</maxDocs>
>        <openSearcher>false</openSearcher>
> </autoCommit>
> <autoSoftCommit>
>        <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
> </autoSoftCommit>
>
> *We used Near Real Time Searching So we did below configuration in
> solr.in.cmd
> set SOLR_OPTS=%SOLR_OPTS% -Dsolr.autoSoftCommit.maxTime=100
>
> *Our collections details are below:
>
> Collection      Shard1  Shard1 Replica  Shard2  Shard2 Replica
> Number of Documents     Size(GB)        Number of Documents     Size(GB)
>       Number of Documents     Size(GB)        Number of Documents
>  Size(GB)
> collection1     26913364        201     26913379        202     26913380
>       198     26913379        198
> collection2     13934360        310     13934367        310     13934368
>       219     13934367        219
> collection3     351539689       73.5    351540040       73.5    351540136
>      75.2    351539722       75.2
>
> *My server configurations are below:
>
>         Server1 Server2
> CPU     Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz, 2301 Mhz, 10 Core(s),
> 20 Logical Processor(s)        Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz,
> 2301 Mhz, 10 Core(s), 20 Logical Processor(s)
> HardDisk(GB)    3845 ( 3.84 TB) 3485 GB (3.48 TB)
> Total memory(GB)        320     320
> Shard1 Allocated memory(GB)     55
> Shard2 Replica Allocated memory(GB)     55
> Shard2 Allocated memory(GB)             55
> Shard1 Replica Allocated memory(GB)             55
> Other Applications Allocated Memory(GB) 60      22
> Other Number Of Applications    11      7
>
>
> Sometimes, any one replica goes into recovery mode. Why replica goes into
> recovery? Due to heavy search OR heavy update/insert OR long GC pause time?
> If any one of them then what should we do in configuration?
> Should we increase the shard for recovery issue?
>
> Regards,
> Vishal Patel
>
>

Re: Replica goes into recovery mode in Solr 6.1.0

Posted by vishal patel <vi...@outlook.com>.
Thanks for your reply.

When I searched Google for my error ("org.apache.http.NoHttpResponseException:  failed to respond"), I found the Solr JIRA issue https://issues.apache.org/jira/browse/SOLR-7483, and saw a comment there from @Erick Erickson<ma...@gmail.com>.
Is this issue resolved? Could I be hitting that JIRA issue?
[SOLR-7483] Investigate ways to deal with the tlog growing indefinitely while it's being replayed - ASF JIRA<https://issues.apache.org/jira/browse/SOLR-7483>
WARN - 2015-04-28 21:38:43.345; [ ] org.apache.solr.handler.IndexFetcher; File _xv.si did not match. expected checksum is 617655777 and actual is checksum 1090588695. expected length is 419 and actual length is 419 WARN - 2015-04-28 21:38:43.349; [ ] org.apache.solr.handler.IndexFetcher; File _xv.fnm did not match. expected checksum is 1992662616 and actual is checksum 1632122630. expected ...

It is the same error that I got.
My Error Log:
shard: https://drive.google.com/file/d/1F8Bn7jSXspe2HRelh_vJjKy9DsTRl9h0/view
replica: https://drive.google.com/file/d/1y0fC_n5u3MBMQbXrvxtqaD8vBBXDLR6I/view

Regards,
Vishal Patel
________________________________
From: Walter Underwood <wu...@wunderwood.org>
Sent: Friday, July 10, 2020 11:15 PM
To: solr-user@lucene.apache.org <so...@lucene.apache.org>
Subject: Re: Replica goes into recovery mode in Solr 6.1.0

Sorting and faceting take a lot of memory. From your charts, I would try
a 31 GB heap. That would make GC faster. 680 ms is very long for a GC
and can cause problems.

Combine a 680 ms GC with a 100 ms soft commit time and you can have
lots of trouble.

Change your soft commit time to 10000 (ten seconds) or longer.

Look at a 24 hour graph of heap usage. It should look like a sawtooth,
increasing, then dropping after every full GC. The bottom of the sawtooth
is the memory that Solr actually needs. Take the highest number from
the bottom of the sawtooth and add some extra, maybe 2 GB. Try that
heap size.

Upgrade to 6.6.2. That includes all bug fixes for the 6.x release. The 6.x
release had several bad bugs, especially in the middle releases. We were
switching prod to Solr Cloud while those were being released and it was
not fun.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jul 10, 2020, at 4:59 AM, vishal patel <vi...@outlook.com> wrote:
>
> Thanks for quick reply.
>
> I assume caches (are they too large?), perhaps uninverted indexes.
> Docvalues would help with latter ones. Do you use them?
>>> We do not use any cache. we disabled the cache from solrconfig.xml
> here is my solrconfig .xml and schema.xml
> https://drive.google.com/file/d/12SHl3YGP7jT4goikBkeyB2s1NX5_C2gz/view
> https://drive.google.com/file/d/1LwA1d4OiMhQQv806tR0HbZoEjA8IyfdR/view
>
> We used Docvalues on that field which is used for sorting or faceting.
>
> You could also try upgrading to the latest version in 6.x series as a starter.
>>> I will surely try.
>
> So, the node in question isn't responding quickly enough to http requests and gets put into recovery. The log for the recovering node starts too late, so I can't say anything about what happened before 14:42:43.943 that lead to recovery.
>>> There is no error before 14:42:43.943 just search and insert requests are there. I got that node is responding but why it is not responding? Due to lack of memory or any other cause
> why we cannot get idea from log for reason of not responding.
>
> Is there any monitor for Solr from where we can find the root cause?
>
> Regards,
> Vishal Patel
>
>
> ________________________________
> From: Ere Maijala <er...@helsinki.fi>
> Sent: Friday, July 10, 2020 4:27 PM
> To: solr-user@lucene.apache.org <so...@lucene.apache.org>
> Subject: Re: Replica goes into recovery mode in Solr 6.1.0
>
> vishal patel wrote on 10.7.2020 at 12.45:
>> Thanks for your input.
>>
>> Walter already said that setting soft commit max time to 100 ms is a recipe for disaster
>>>> I know that but our application is already developed and run on live environment since last 5 years. Actually, we want to show a data very quickly after the insert.
>>
>> you have huge JVM heaps without an explanation for the reason
>>>> We gave the 55GB ram because our usage is like that large query search and very frequent searching and indexing.
>> Here is my memory snapshot which I have taken from GC.
>
> Yes, I can see that a lot of memory is in use, but the question is why.
> I assume caches (are they too large?), perhaps uninverted indexes.
> Docvalues would help with latter ones. Do you use them?
>
>> I have tried Solr upgrade from 6.1.0 to 8.5.1 but due to some issue we cannot do. I have also asked in here
>> https://lucene.472066.n3.nabble.com/Sorting-in-other-collection-in-Solr-8-5-1-td4459506.html#a4459562
>
> You could also try upgrading to the latest version in 6.x series as a
> starter.
>
>> Why we cannot find the reason of recovery from log? like memory or CPU issue, frequent index or search, large query hit,
>> My log at the time of recovery
>> https://drive.google.com/file/d/1F8Bn7jSXspe2HRelh_vJjKy9DsTRl9h0/view
>> [https://lh5.googleusercontent.com/htOUfpihpAqncFsMlCLnSUZPu1_9DRKGNajaXV1jG44fpFzgx51ecNtUK58m5lk=w1200-h630-p]<https://drive.google.com/file/d/1F8Bn7jSXspe2HRelh_vJjKy9DsTRl9h0/view>
>> recovery_shard.txt<https://drive.google.com/file/d/1F8Bn7jSXspe2HRelh_vJjKy9DsTRl9h0/view>
>> drive.google.com
>
> Isn't it right there on the first lines?
>
> 2020-07-09 14:42:43.943 ERROR
> (updateExecutor-2-thread-21007-processing-http:////11.200.212.305:8983//solr//products
> x:products r:core_node1 n:11.200.212.306:8983_solr s:shard1 c:products)
> [c:products s:shard1 r:core_node1 x:products]
> o.a.s.u.StreamingSolrClients error
> org.apache.http.NoHttpResponseException: 11.200.212.305:8983 failed to
> respond
>
> followed by a couple more error messages about the same problem and then
> initiation of recovery:
>
> 2020-07-09 14:42:44.002 INFO  (qtp1239731077-771611) [c:products
> s:shard1 r:core_node1 x:products] o.a.s.c.ZkController Put replica
> core=products coreNodeName=core_node3 on 11.200.212.305:8983_solr into
> leader-initiated recovery.
>
> So the node in question isn't responding quickly enough to http requests
> and gets put into recovery. The log for the recovering node starts too
> late, so I can't say anything about what happened before 14:42:43.943
> that led to recovery.
>
> --Ere
>
>>
>> ________________________________
>> From: Ere Maijala <er...@helsinki.fi>
>> Sent: Friday, July 10, 2020 2:10 PM
>> To: solr-user@lucene.apache.org <so...@lucene.apache.org>
>> Subject: Re: Replica goes into recovery mode in Solr 6.1.0
>>
>> Walter already said that setting soft commit max time to 100 ms is a
>> recipe for disaster. That alone can be the issue, but if you're not
>> willing to try higher values, there's no way of being sure. And you have
>> huge JVM heaps without an explanation for the reason. If those do not
>> cause problems, you indicated that you also run some other software on
>> the same server. Is it possible that the other processes hog CPU, disk
>> or network and starve Solr?
>>
>> I must add that Solr 6.1.0 is over four years old. You could be hitting
>> a bug that has been fixed for years, but even if you encounter an issue
>> that's still present, you will need to upgrade to get it fixed. If you
>> look at the number of fixes done in subsequent 6.x versions alone in the
>> changelog (https://lucene.apache.org/solr/8_5_1/changes/Changes.html)
>> you'll see that there are a lot of them. You could be hitting something
>> like SOLR-10420, which has been fixed for over three years.
>>
>> Best,
>> Ere
>>
>> vishal patel wrote on 10.7.2020 at 7.52:
>>> I’ve been running Solr for a dozen years and I’ve never needed a heap larger than 8 GB.
>>>>> What is your data size? same like us 1 TB? is your searching or indexing frequently? NRT model?
>>>
>>> My question is why replica is going into recovery? When replica went down, I checked GC log but GC pause was not more than 2 seconds.
>>> Also, I cannot find out any reason for recovery from Solr log file. i want to know the reason why replica goes into recovery.
>>>
>>> Regards,
>>> Vishal Patel
>>> ________________________________
>>> From: Walter Underwood <wu...@wunderwood.org>
>>> Sent: Friday, July 10, 2020 3:03 AM
>>> To: solr-user@lucene.apache.org <so...@lucene.apache.org>
>>> Subject: Re: Replica goes into recovery mode in Solr 6.1.0
>>>
>>> Those are extremely large JVMs. Unless you have proven that you MUST
>>> have 55 GB of heap, use a smaller heap.
>>>
>>> I’ve been running Solr for a dozen years and I’ve never needed a heap
>>> larger than 8 GB.
>>>
>>> Also, there is usually no need to use one JVM per replica.
>>>
>>> Your configuration is using 110 GB (two JVMs) just for Java
>>> where I would configure it with a single 8 GB JVM. That would
>>> free up 100 GB for file caches.
>>>
>>> wunder
>>> Walter Underwood
>>> wunder@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>>
>>>> On Jul 8, 2020, at 10:10 PM, vishal patel <vi...@outlook.com> wrote:
>>>>
>>>> Thanks for reply.
>>>>
>>>> what you mean by "Shard1 Allocated memory”
>>>>>> It means JVM memory of one solr node or instance.
>>>>
>>>> How many Solr JVMs are you running?
>>>>>> In one server 2 solr JVMs in which one is shard and other is replica.
>>>>
>>>> What is the heap size for your JVMs?
>>>>>> 55GB of one Solr JVM.
>>>>
>>>> Regards,
>>>> Vishal Patel
>>>>
>>>> Sent from Outlook<http://aka.ms/weboutlook>
>>>> ________________________________
>>>> From: Walter Underwood <wu...@wunderwood.org>
>>>> Sent: Wednesday, July 8, 2020 8:45 PM
>>>> To: solr-user@lucene.apache.org <so...@lucene.apache.org>
>>>> Subject: Re: Replica goes into recovery mode in Solr 6.1.0
>>>>
>>>> I don’t understand what you mean by "Shard1 Allocated memory”. I don’t know of
>>>> any way to dedicate system RAM to an application object like a replica.
>>>>
>>>> How many Solr JVMs are you running?
>>>>
>>>> What is the heap size for your JVMs?
>>>>
>>>> Setting soft commit max time to 100 ms does not magically make Solr super fast.
>>>> It makes Solr do too much work, makes the work queues fill up, and makes it fail.
>>>>
>>>> wunder
>>>> Walter Underwood
>>>> wunder@wunderwood.org
>>>> http://observer.wunderwood.org/  (my blog)
>>>>
>>>>> On Jul 7, 2020, at 10:55 PM, vishal patel <vi...@outlook.com> wrote:
>>>>>
>>>>> Thanks for your reply.
>>>>>
>>>>> One server has total 320GB ram. In this 2 solr node one is shard1 and second is shard2 replica. Each solr node have 55GB memory allocated. shard1 has 585GB data and shard2 replica has 492GB data. means almost 1TB data in this server. server has also other applications and for that 60GB memory allocated. So total 150GB memory is left.
>>>>>
>>>>> Proper formatting details:
>>>>> https://drive.google.com/file/d/1K9JyvJ50Vele9pPJCiMwm25wV4A6x4eD/view
>>>>>
>>>>> Are you running multiple huge JVMs?
>>>>>>> Not huge but 60GB memory allocated for our 11 application. 150GB memory are still free.
>>>>>
>>>>> The servers will be doing a LOT of disk IO, so look at the read and write iops. I expect that the solr processes are blocked on disk reads almost all the time.
>>>>>>> is it chance to go in recovery mode if more IO read and write or blocked?
>>>>>
>>>>> "-Dsolr.autoSoftCommit.maxTime=100” is way too short (100 ms).
>>>>>>> Our requirement is NRT so we keep the less time
>>>>>
>>>>> Regards,
>>>>> Vishal Patel
>>>>> ________________________________
>>>>> From: Walter Underwood <wu...@wunderwood.org>
>>>>> Sent: Tuesday, July 7, 2020 8:15 PM
>>>>> To: solr-user@lucene.apache.org <so...@lucene.apache.org>
>>>>> Subject: Re: Replica goes into recovery mode in Solr 6.1.0
>>>>>
>>>>> This isn’t a support list, so nobody looks at issues. We do try to help.
>>>>>
>>>>> It looks like you have 1 TB of index on a system with 320 GB of RAM.
>>>>> I don’t know what "Shard1 Allocated memory” is, but maybe half of
>>>>> that RAM is used by JVMs or some other process, I guess. Are you
>>>>> running multiple huge JVMs?
>>>>>
>>>>> The servers will be doing a LOT of disk IO, so look at the read and
>>>>> write iops. I expect that the solr processes are blocked on disk reads
>>>>> almost all the time.
>>>>>
>>>>> "-Dsolr.autoSoftCommit.maxTime=100” is way too short (100 ms).
>>>>> That is probably causing your outages.
>>>>>
>>>>> wunder
>>>>> Walter Underwood
>>>>> wunder@wunderwood.org
>>>>> http://observer.wunderwood.org/  (my blog)
>>>>>
>>>>>> On Jul 7, 2020, at 5:18 AM, vishal patel <vi...@outlook.com> wrote:
>>>>>>
>>>>>> Any one is looking my issue? Please guide me.
>>>>>>
>>>>>> Regards,
>>>>>> Vishal Patel
>>>>>>
>>>>>>
>>>>>> ________________________________
>>>>>> From: vishal patel <vi...@outlook.com>
>>>>>> Sent: Monday, July 6, 2020 7:11 PM
>>>>>> To: solr-user@lucene.apache.org <so...@lucene.apache.org>
>>>>>> Subject: Replica goes into recovery mode in Solr 6.1.0
>>>>>>
>>>>>> I am using Solr version 6.1.0, Java 8 version and G1GC on production. We have 2 shards and each shard has 1 replica. We have 3 collection.
>>>>>> We do not use any cache and also disable in Solr config.xml. Search and Update requests are coming frequently in our live platform.
>>>>>>
>>>>>> *Our commit configuration in solr.config are below
>>>>>> <autoCommit>
>>>>>> <maxTime>600000</maxTime>
>>>>>>    <maxDocs>20000</maxDocs>
>>>>>>    <openSearcher>false</openSearcher>
>>>>>> </autoCommit>
>>>>>> <autoSoftCommit>
>>>>>>    <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
>>>>>> </autoSoftCommit>
>>>>>>
>>>>>> *We used Near Real Time Searching So we did below configuration in solr.in.cmd
>>>>>> set SOLR_OPTS=%SOLR_OPTS% -Dsolr.autoSoftCommit.maxTime=100
>>>>>>
>>>>>> *Our collections details are below:
>>>>>>
>>>>>> Collection      Shard1  Shard1 Replica  Shard2  Shard2 Replica
>>>>>> Number of Documents     Size(GB)        Number of Documents     Size(GB)        Number of Documents     Size(GB)        Number of Documents     Size(GB)
>>>>>> collection1     26913364        201     26913379        202     26913380        198     26913379        198
>>>>>> collection2     13934360        310     13934367        310     13934368        219     13934367        219
>>>>>> collection3     351539689       73.5    351540040       73.5    351540136       75.2    351539722       75.2
>>>>>>
>>>>>> *My server configurations are below:
>>>>>>
>>>>>>     Server1 Server2
>>>>>> CPU     Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz, 2301 Mhz, 10 Core(s), 20 Logical Processor(s)        Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz, 2301 Mhz, 10 Core(s), 20 Logical Processor(s)
>>>>>> HardDisk(GB)    3845 ( 3.84 TB) 3485 GB (3.48 TB)
>>>>>> Total memory(GB)        320     320
>>>>>> Shard1 Allocated memory(GB)     55
>>>>>> Shard2 Replica Allocated memory(GB)     55
>>>>>> Shard2 Allocated memory(GB)             55
>>>>>> Shard1 Replica Allocated memory(GB)             55
>>>>>> Other Applications Allocated Memory(GB) 60      22
>>>>>> Other Number Of Applications    11      7
>>>>>>
>>>>>>
>>>>>> Sometimes, any one replica goes into recovery mode. Why replica goes into recovery? Due to heavy search OR heavy update/insert OR long GC pause time? If any one of them then what should we do in configuration?
>>>>>> Should we increase the shard for recovery issue?
>>>>>>
>>>>>> Regards,
>>>>>> Vishal Patel
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>> --
>> Ere Maijala
>> Kansalliskirjasto / The National Library of Finland
>>
>
> --
> Ere Maijala
> Kansalliskirjasto / The National Library of Finland


Re: Replica goes into recovery mode in Solr 6.1.0

Posted by Walter Underwood <wu...@wunderwood.org>.
Sorting and faceting take a lot of memory. From your charts, I would try
a 31 GB heap. That would make GC faster. 680 ms is very long for a GC
and can cause problems.

Combine a 680 ms GC with a 100 ms soft commit time and you can have
lots of trouble.

Change your soft commit time to 10000 (ten seconds) or longer.
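In solr.in.cmd that would be something like this (a sketch; it is the same property you already set, just a longer interval):

set SOLR_OPTS=%SOLR_OPTS% -Dsolr.autoSoftCommit.maxTime=10000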

Look at a 24 hour graph of heap usage. It should look like a sawtooth,
increasing, then dropping after every full GC. The bottom of the sawtooth
is the memory that Solr actually needs. Take the highest number from
the bottom of the sawtooth and add some extra, maybe 2 GB. Try that
heap size.
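To get that graph, GC logging plus an explicit heap in solr.in.cmd is enough. A sketch with standard Java 8 flags (the log path is illustrative; adjust for your install):

set GC_LOG_OPTS=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -Xloggc:C:\solr\logs\solr_gc.log
set SOLR_JAVA_MEM=-Xms31g -Xmx31g

Using 31g rather than 32g also keeps the JVM below the threshold where it loses compressed object pointers, which is part of why I'd try 31 GB.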

Upgrade to 6.6.2. That includes all bug fixes for the 6.x release. The 6.x 
release had several bad bugs, especially in the middle releases. We were
switching prod to Solr Cloud while those were being released and it was
not fun.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jul 10, 2020, at 4:59 AM, vishal patel <vi...@outlook.com> wrote:
> 
> Thanks for quick reply.
> 
> I assume caches (are they too large?), perhaps uninverted indexes.
> Docvalues would help with latter ones. Do you use them?
>>> We do not use any cache. we disabled the cache from solrconfig.xml
> here is my solrconfig .xml and schema.xml
> https://drive.google.com/file/d/12SHl3YGP7jT4goikBkeyB2s1NX5_C2gz/view
> https://drive.google.com/file/d/1LwA1d4OiMhQQv806tR0HbZoEjA8IyfdR/view
> 
> We used Docvalues on that field which is used for sorting or faceting.
> 
> You could also try upgrading to the latest version in 6.x series as a starter.
>>> I will surely try.
> 
> So, the node in question isn't responding quickly enough to http requests and gets put into recovery. The log for the recovering node starts too late, so I can't say anything about what happened before 14:42:43.943 that lead to recovery.
>>> There is no error before 14:42:43.943 just search and insert requests are there. I got that node is responding but why it is not responding? Due to lack of memory or any other cause
> why we cannot get idea from log for reason of not responding.
> 
> Is there any monitor for Solr from where we can find the root cause?
> 
> Regards,
> Vishal Patel
> 
> 
> ________________________________
> From: Ere Maijala <er...@helsinki.fi>
> Sent: Friday, July 10, 2020 4:27 PM
> To: solr-user@lucene.apache.org <so...@lucene.apache.org>
> Subject: Re: Replica goes into recovery mode in Solr 6.1.0
> 
> vishal patel wrote on 10.7.2020 at 12.45:
>> Thanks for your input.
>> 
>> Walter already said that setting soft commit max time to 100 ms is a recipe for disaster
>>>> I know that but our application is already developed and run on live environment since last 5 years. Actually, we want to show a data very quickly after the insert.
>> 
>> you have huge JVM heaps without an explanation for the reason
>>>> We gave the 55GB ram because our usage is like that large query search and very frequent searching and indexing.
>> Here is my memory snapshot which I have taken from GC.
> 
> Yes, I can see that a lot of memory is in use, but the question is why.
> I assume caches (are they too large?), perhaps uninverted indexes.
> Docvalues would help with latter ones. Do you use them?
> 
>> I have tried Solr upgrade from 6.1.0 to 8.5.1 but due to some issue we cannot do. I have also asked in here
>> https://lucene.472066.n3.nabble.com/Sorting-in-other-collection-in-Solr-8-5-1-td4459506.html#a4459562
> 
> You could also try upgrading to the latest version in 6.x series as a
> starter.
> 
>> Why we cannot find the reason of recovery from log? like memory or CPU issue, frequent index or search, large query hit,
>> My log at the time of recovery
>> https://drive.google.com/file/d/1F8Bn7jSXspe2HRelh_vJjKy9DsTRl9h0/view
>> [https://lh5.googleusercontent.com/htOUfpihpAqncFsMlCLnSUZPu1_9DRKGNajaXV1jG44fpFzgx51ecNtUK58m5lk=w1200-h630-p]<https://drive.google.com/file/d/1F8Bn7jSXspe2HRelh_vJjKy9DsTRl9h0/view>
>> recovery_shard.txt<https://drive.google.com/file/d/1F8Bn7jSXspe2HRelh_vJjKy9DsTRl9h0/view>
>> drive.google.com
> 
> Isn't it right there on the first lines?
> 
> 2020-07-09 14:42:43.943 ERROR
> (updateExecutor-2-thread-21007-processing-http:////11.200.212.305:8983//solr//products
> x:products r:core_node1 n:11.200.212.306:8983_solr s:shard1 c:products)
> [c:products s:shard1 r:core_node1 x:products]
> o.a.s.u.StreamingSolrClients error
> org.apache.http.NoHttpResponseException: 11.200.212.305:8983 failed to
> respond
> 
> followed by a couple more error messages about the same problem and then
> initiation of recovery:
> 
> 2020-07-09 14:42:44.002 INFO  (qtp1239731077-771611) [c:products
> s:shard1 r:core_node1 x:products] o.a.s.c.ZkController Put replica
> core=products coreNodeName=core_node3 on 11.200.212.305:8983_solr into
> leader-initiated recovery.
> 
> So the node in question isn't responding quickly enough to http requests
> and gets put into recovery. The log for the recovering node starts too
> late, so I can't say anything about what happened before 14:42:43.943
> that led to recovery.
> 
> --Ere
> 
>> 
>> ________________________________
>> From: Ere Maijala <er...@helsinki.fi>
>> Sent: Friday, July 10, 2020 2:10 PM
>> To: solr-user@lucene.apache.org <so...@lucene.apache.org>
>> Subject: Re: Replica goes into recovery mode in Solr 6.1.0
>> 
>> Walter already said that setting soft commit max time to 100 ms is a
>> recipe for disaster. That alone can be the issue, but if you're not
>> willing to try higher values, there's no way of being sure. And you have
>> huge JVM heaps without an explanation for the reason. If those do not
>> cause problems, you indicated that you also run some other software on
>> the same server. Is it possible that the other processes hog CPU, disk
>> or network and starve Solr?
>> 
>> I must add that Solr 6.1.0 is over four years old. You could be hitting
>> a bug that has been fixed for years, but even if you encounter an issue
>> that's still present, you will need to upgrade to get it fixed. If you
>> look at the number of fixes done in subsequent 6.x versions alone in the
>> changelog (https://lucene.apache.org/solr/8_5_1/changes/Changes.html)
>> you'll see that there are a lot of them. You could be hitting something
>> like SOLR-10420, which has been fixed for over three years.
>> 
>> Best,
>> Ere
>> 
>> vishal patel wrote on 10.7.2020 at 7.52:
>>> I’ve been running Solr for a dozen years and I’ve never needed a heap larger than 8 GB.
>>>>> What is your data size? same like us 1 TB? is your searching or indexing frequently? NRT model?
>>> 
>>> My question is why replica is going into recovery? When replica went down, I checked GC log but GC pause was not more than 2 seconds.
>>> Also, I cannot find out any reason for recovery from Solr log file. i want to know the reason why replica goes into recovery.
>>> 
>>> Regards,
>>> Vishal Patel
>>> ________________________________
>>> From: Walter Underwood <wu...@wunderwood.org>
>>> Sent: Friday, July 10, 2020 3:03 AM
>>> To: solr-user@lucene.apache.org <so...@lucene.apache.org>
>>> Subject: Re: Replica goes into recovery mode in Solr 6.1.0
>>> 
>>> Those are extremely large JVMs. Unless you have proven that you MUST
>>> have 55 GB of heap, use a smaller heap.
>>> 
>>> I’ve been running Solr for a dozen years and I’ve never needed a heap
>>> larger than 8 GB.
>>> 
>>> Also, there is usually no need to use one JVM per replica.
>>> 
>>> Your configuration is using 110 GB (two JVMs) just for Java
>>> where I would configure it with a single 8 GB JVM. That would
>>> free up 100 GB for file caches.
>>> 
>>> wunder
>>> Walter Underwood
>>> wunder@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
>>>> On Jul 8, 2020, at 10:10 PM, vishal patel <vi...@outlook.com> wrote:
>>>> 
>>>> Thanks for reply.
>>>> 
>>>> what you mean by "Shard1 Allocated memory”
>>>>>> It means JVM memory of one solr node or instance.
>>>> 
>>>> How many Solr JVMs are you running?
>>>>>> In one server 2 solr JVMs in which one is shard and other is replica.
>>>> 
>>>> What is the heap size for your JVMs?
>>>>>> 55GB of one Solr JVM.
>>>> 
>>>> Regards,
>>>> Vishal Patel
>>>> 
>>>> Sent from Outlook<http://aka.ms/weboutlook>
>>>> ________________________________
>>>> From: Walter Underwood <wu...@wunderwood.org>
>>>> Sent: Wednesday, July 8, 2020 8:45 PM
>>>> To: solr-user@lucene.apache.org <so...@lucene.apache.org>
>>>> Subject: Re: Replica goes into recovery mode in Solr 6.1.0
>>>> 
>>>> I don’t understand what you mean by "Shard1 Allocated memory”. I don’t know of
>>>> any way to dedicate system RAM to an application object like a replica.
>>>> 
>>>> How many Solr JVMs are you running?
>>>> 
>>>> What is the heap size for your JVMs?
>>>> 
>>>> Setting soft commit max time to 100 ms does not magically make Solr super fast.
>>>> It makes Solr do too much work, makes the work queues fill up, and makes it fail.
>>>> 
>>>> wunder
>>>> Walter Underwood
>>>> wunder@wunderwood.org
>>>> http://observer.wunderwood.org/  (my blog)
>>>> 
>>>>> On Jul 7, 2020, at 10:55 PM, vishal patel <vi...@outlook.com> wrote:
>>>>> 
>>>>> Thanks for your reply.
>>>>> 
>>>>> One server has total 320GB ram. In this 2 solr node one is shard1 and second is shard2 replica. Each solr node have 55GB memory allocated. shard1 has 585GB data and shard2 replica has 492GB data. means almost 1TB data in this server. server has also other applications and for that 60GB memory allocated. So total 150GB memory is left.
>>>>> 
>>>>> Proper formatting details:
>>>>> https://drive.google.com/file/d/1K9JyvJ50Vele9pPJCiMwm25wV4A6x4eD/view
>>>>> 
>>>>> Are you running multiple huge JVMs?
>>>>>>> Not huge but 60GB memory allocated for our 11 application. 150GB memory are still free.
>>>>> 
>>>>> The servers will be doing a LOT of disk IO, so look at the read and write iops. I expect that the solr processes are blocked on disk reads almost all the time.
>>>>>>> is it chance to go in recovery mode if more IO read and write or blocked?
>>>>> 
>>>>> "-Dsolr.autoSoftCommit.maxTime=100” is way too short (100 ms).
>>>>>>> Our requirement is NRT so we keep the less time
>>>>> 
>>>>> Regards,
>>>>> Vishal Patel
>>>>> ________________________________
>>>>> From: Walter Underwood <wu...@wunderwood.org>
>>>>> Sent: Tuesday, July 7, 2020 8:15 PM
>>>>> To: solr-user@lucene.apache.org <so...@lucene.apache.org>
>>>>> Subject: Re: Replica goes into recovery mode in Solr 6.1.0
>>>>> 
>>>>> This isn’t a support list, so nobody looks at issues. We do try to help.
>>>>> 
>>>>> It looks like you have 1 TB of index on a system with 320 GB of RAM.
>>>>> I don’t know what "Shard1 Allocated memory” is, but maybe half of
>>>>> that RAM is used by JVMs or some other process, I guess. Are you
>>>>> running multiple huge JVMs?
>>>>> 
>>>>> The servers will be doing a LOT of disk IO, so look at the read and
>>>>> write iops. I expect that the solr processes are blocked on disk reads
>>>>> almost all the time.
>>>>> 
>>>>> "-Dsolr.autoSoftCommit.maxTime=100” is way too short (100 ms).
>>>>> That is probably causing your outages.
>>>>> 
>>>>> wunder
>>>>> Walter Underwood
>>>>> wunder@wunderwood.org
>>>>> http://observer.wunderwood.org/  (my blog)
>>>>> 
>>>>>> On Jul 7, 2020, at 5:18 AM, vishal patel <vi...@outlook.com> wrote:
>>>>>> 
>>>>>> Any one is looking my issue? Please guide me.
>>>>>> 
>>>>>> Regards,
>>>>>> Vishal Patel
>>>>>> 
>>>>>> 
>>>>>> ________________________________
>>>>>> From: vishal patel <vi...@outlook.com>
>>>>>> Sent: Monday, July 6, 2020 7:11 PM
>>>>>> To: solr-user@lucene.apache.org <so...@lucene.apache.org>
>>>>>> Subject: Replica goes into recovery mode in Solr 6.1.0
>>>>>> 
>>>>>> I am using Solr version 6.1.0, Java 8 version and G1GC on production. We have 2 shards and each shard has 1 replica. We have 3 collection.
>>>>>> We do not use any cache and also disable in Solr config.xml. Search and Update requests are coming frequently in our live platform.
>>>>>> 
>>>>>> *Our commit configuration in solr.config are below
>>>>>> <autoCommit>
>>>>>> <maxTime>600000</maxTime>
>>>>>>    <maxDocs>20000</maxDocs>
>>>>>>    <openSearcher>false</openSearcher>
>>>>>> </autoCommit>
>>>>>> <autoSoftCommit>
>>>>>>    <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
>>>>>> </autoSoftCommit>
>>>>>> 
>>>>>> *We used Near Real Time Searching So we did below configuration in solr.in.cmd
>>>>>> set SOLR_OPTS=%SOLR_OPTS% -Dsolr.autoSoftCommit.maxTime=100
>>>>>> 
>>>>>> *Our collections details are below:
>>>>>> 
>>>>>> Collection      Shard1  Shard1 Replica  Shard2  Shard2 Replica
>>>>>> Number of Documents     Size(GB)        Number of Documents     Size(GB)        Number of Documents     Size(GB)        Number of Documents     Size(GB)
>>>>>> collection1     26913364        201     26913379        202     26913380        198     26913379        198
>>>>>> collection2     13934360        310     13934367        310     13934368        219     13934367        219
>>>>>> collection3     351539689       73.5    351540040       73.5    351540136       75.2    351539722       75.2
>>>>>> 
>>>>>> *My server configurations are below:
>>>>>> 
>>>>>>     Server1 Server2
>>>>>> CPU     Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz, 2301 Mhz, 10 Core(s), 20 Logical Processor(s)        Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz, 2301 Mhz, 10 Core(s), 20 Logical Processor(s)
>>>>>> HardDisk(GB)    3845 ( 3.84 TB) 3485 GB (3.48 TB)
>>>>>> Total memory(GB)        320     320
>>>>>> Shard1 Allocated memory(GB)     55
>>>>>> Shard2 Replica Allocated memory(GB)     55
>>>>>> Shard2 Allocated memory(GB)             55
>>>>>> Shard1 Replica Allocated memory(GB)             55
>>>>>> Other Applications Allocated Memory(GB) 60      22
>>>>>> Other Number Of Applications    11      7
>>>>>> 
>>>>>> 
>>>>>> Sometimes, any one replica goes into recovery mode. Why replica goes into recovery? Due to heavy search OR heavy update/insert OR long GC pause time? If any one of them then what should we do in configuration?
>>>>>> Should we increase the shard for recovery issue?
>>>>>> 
>>>>>> Regards,
>>>>>> Vishal Patel
>>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>> 
>> --
>> Ere Maijala
>> Kansalliskirjasto / The National Library of Finland
>> 
> 
> --
> Ere Maijala
> Kansalliskirjasto / The National Library of Finland


Re: Replica goes into recovery mode in Solr 6.1.0

Posted by vishal patel <vi...@outlook.com>.
Thanks for the quick reply.

I assume caches (are they too large?), perhaps uninverted indexes.
Docvalues would help with the latter. Do you use them?
>> We do not use any caches; we disabled them in solrconfig.xml.
Here are my solrconfig.xml and schema.xml:
https://drive.google.com/file/d/12SHl3YGP7jT4goikBkeyB2s1NX5_C2gz/view
https://drive.google.com/file/d/1LwA1d4OiMhQQv806tR0HbZoEjA8IyfdR/view
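("Disabled" here means the cache entries in solrconfig.xml are sized to zero; a minimal sketch of the idea, not a copy of our actual file, which is in the link above:

<filterCache class="solr.FastLRUCache" size="0" initialSize="0" autowarmCount="0"/>

queryResultCache and documentCache can be zeroed the same way.)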

We use docValues on the fields that are used for sorting or faceting.
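In schema.xml that looks something like this (the field name here is just an example; the real definitions are in the file linked above):

<field name="created_date" type="tdate" indexed="true" stored="false" docValues="true"/>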

You could also try upgrading to the latest version in 6.x series as a starter.
>> I will surely try.

So, the node in question isn't responding quickly enough to http requests and gets put into recovery. The log for the recovering node starts too late, so I can't say anything about what happened before 14:42:43.943 that led to recovery.
>> There is no error before 14:42:43.943; only search and insert requests are there. I understand the node is not responding, but why is it not responding? Is it a lack of memory or some other cause? And why can we not see the reason it stopped responding in the log?

Is there any monitoring tool for Solr that would let us find the root cause?

Regards,
Vishal Patel


________________________________
From: Ere Maijala <er...@helsinki.fi>
Sent: Friday, July 10, 2020 4:27 PM
To: solr-user@lucene.apache.org <so...@lucene.apache.org>
Subject: Re: Replica goes into recovery mode in Solr 6.1.0

vishal patel wrote on 10.7.2020 at 12.45:
> Thanks for your input.
>
> Walter already said that setting soft commit max time to 100 ms is a recipe for disaster
>>> I know that but our application is already developed and run on live environment since last 5 years. Actually, we want to show a data very quickly after the insert.
>
> you have huge JVM heaps without an explanation for the reason
>>> We gave the 55GB ram because our usage is like that large query search and very frequent searching and indexing.
> Here is my memory snapshot which I have taken from GC.

Yes, I can see that a lot of memory is in use, but the question is why.
I assume caches (are they too large?), perhaps uninverted indexes.
Docvalues would help with the latter. Do you use them?

> I have tried Solr upgrade from 6.1.0 to 8.5.1 but due to some issue we cannot do. I have also asked in here
> https://lucene.472066.n3.nabble.com/Sorting-in-other-collection-in-Solr-8-5-1-td4459506.html#a4459562

You could also try upgrading to the latest version in 6.x series as a
starter.

> Why we cannot find the reason of recovery from log? like memory or CPU issue, frequent index or search, large query hit,
> My log at the time of recovery
> https://drive.google.com/file/d/1F8Bn7jSXspe2HRelh_vJjKy9DsTRl9h0/view
> [https://lh5.googleusercontent.com/htOUfpihpAqncFsMlCLnSUZPu1_9DRKGNajaXV1jG44fpFzgx51ecNtUK58m5lk=w1200-h630-p]<https://drive.google.com/file/d/1F8Bn7jSXspe2HRelh_vJjKy9DsTRl9h0/view>
> recovery_shard.txt<https://drive.google.com/file/d/1F8Bn7jSXspe2HRelh_vJjKy9DsTRl9h0/view>
> drive.google.com

Isn't it right there on the first lines?

2020-07-09 14:42:43.943 ERROR
(updateExecutor-2-thread-21007-processing-http:////11.200.212.305:8983//solr//products
x:products r:core_node1 n:11.200.212.306:8983_solr s:shard1 c:products)
[c:products s:shard1 r:core_node1 x:products]
o.a.s.u.StreamingSolrClients error
org.apache.http.NoHttpResponseException: 11.200.212.305:8983 failed to
respond

followed by a couple more error messages about the same problem and then
initiation of recovery:

2020-07-09 14:42:44.002 INFO  (qtp1239731077-771611) [c:products
s:shard1 r:core_node1 x:products] o.a.s.c.ZkController Put replica
core=products coreNodeName=core_node3 on 11.200.212.305:8983_solr into
leader-initiated recovery.

So the node in question isn't responding quickly enough to http requests
and gets put into recovery. The log for the recovering node starts too
late, so I can't say anything about what happened before 14:42:43.943
that led to recovery.

--Ere

>
> ________________________________
> From: Ere Maijala <er...@helsinki.fi>
> Sent: Friday, July 10, 2020 2:10 PM
> To: solr-user@lucene.apache.org <so...@lucene.apache.org>
> Subject: Re: Replica goes into recovery mode in Solr 6.1.0
>
> Walter already said that setting soft commit max time to 100 ms is a
> recipe for disaster. That alone can be the issue, but if you're not
> willing to try higher values, there's no way of being sure. And you have
> huge JVM heaps without an explanation for the reason. If those do not
> cause problems, you indicated that you also run some other software on
> the same server. Is it possible that the other processes hog CPU, disk
> or network and starve Solr?
>
> I must add that Solr 6.1.0 is over four years old. You could be hitting
> a bug that has been fixed for years, but even if you encounter an issue
> that's still present, you will need to upgrade to get it fixed. If you
> look at the number of fixes done in subsequent 6.x versions alone in the
> changelog (https://lucene.apache.org/solr/8_5_1/changes/Changes.html)
> you'll see that there are a lot of them. You could be hitting something
> like SOLR-10420, which has been fixed for over three years.
>
> Best,
> Ere
>
> vishal patel wrote on 10.7.2020 at 7.52:
>> I’ve been running Solr for a dozen years and I’ve never needed a heap larger than 8 GB.
>>>> What is your data size? same like us 1 TB? is your searching or indexing frequently? NRT model?
>>
>> My question is why replica is going into recovery? When replica went down, I checked GC log but GC pause was not more than 2 seconds.
>> Also, I cannot find out any reason for recovery from Solr log file. i want to know the reason why replica goes into recovery.
>>
>> Regards,
>> Vishal Patel
>> ________________________________
>> From: Walter Underwood <wu...@wunderwood.org>
>> Sent: Friday, July 10, 2020 3:03 AM
>> To: solr-user@lucene.apache.org <so...@lucene.apache.org>
>> Subject: Re: Replica goes into recovery mode in Solr 6.1.0
>>
>> Those are extremely large JVMs. Unless you have proven that you MUST
>> have 55 GB of heap, use a smaller heap.
>>
>> I’ve been running Solr for a dozen years and I’ve never needed a heap
>> larger than 8 GB.
>>
>> Also, there is usually no need to use one JVM per replica.
>>
>> Your configuration is using 110 GB (two JVMs) just for Java
>> where I would configure it with a single 8 GB JVM. That would
>> free up 100 GB for file caches.
>>
>> wunder
>> Walter Underwood
>> wunder@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>>> On Jul 8, 2020, at 10:10 PM, vishal patel <vi...@outlook.com> wrote:
>>>
>>> Thanks for reply.
>>>
>>> what you mean by "Shard1 Allocated memory”
>>>>> It means JVM memory of one solr node or instance.
>>>
>>> How many Solr JVMs are you running?
>>>>> In one server 2 solr JVMs in which one is shard and other is replica.
>>>
>>> What is the heap size for your JVMs?
>>>>> 55GB of one Solr JVM.
>>>
>>> Regards,
>>> Vishal Patel
>>>
>>> Sent from Outlook<http://aka.ms/weboutlook>
>>> ________________________________
>>> From: Walter Underwood <wu...@wunderwood.org>
>>> Sent: Wednesday, July 8, 2020 8:45 PM
>>> To: solr-user@lucene.apache.org <so...@lucene.apache.org>
>>> Subject: Re: Replica goes into recovery mode in Solr 6.1.0
>>>
>>> I don’t understand what you mean by "Shard1 Allocated memory”. I don’t know of
>>> any way to dedicate system RAM to an application object like a replica.
>>>
>>> How many Solr JVMs are you running?
>>>
>>> What is the heap size for your JVMs?
>>>
>>> Setting soft commit max time to 100 ms does not magically make Solr super fast.
>>> It makes Solr do too much work, makes the work queues fill up, and makes it fail.
>>>
>>> wunder
>>> Walter Underwood
>>> wunder@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>>
>>>> On Jul 7, 2020, at 10:55 PM, vishal patel <vi...@outlook.com> wrote:
>>>>
>>>> Thanks for your reply.
>>>>
>>>> One server has total 320GB ram. In this 2 solr node one is shard1 and second is shard2 replica. Each solr node have 55GB memory allocated. shard1 has 585GB data and shard2 replica has 492GB data. means almost 1TB data in this server. server has also other applications and for that 60GB memory allocated. So total 150GB memory is left.
>>>>
>>>> Proper formatting details:
>>>> https://drive.google.com/file/d/1K9JyvJ50Vele9pPJCiMwm25wV4A6x4eD/view
>>>>
>>>> Are you running multiple huge JVMs?
>>>>>> Not huge but 60GB memory allocated for our 11 application. 150GB memory are still free.
>>>>
>>>> The servers will be doing a LOT of disk IO, so look at the read and write iops. I expect that the solr processes are blocked on disk reads almost all the time.
>>>>>> is it chance to go in recovery mode if more IO read and write or blocked?
>>>>
>>>> "-Dsolr.autoSoftCommit.maxTime=100” is way too short (100 ms).
>>>>>> Our requirement is NRT so we keep the less time
>>>>
>>>> Regards,
>>>> Vishal Patel
>>>> ________________________________
>>>> From: Walter Underwood <wu...@wunderwood.org>
>>>> Sent: Tuesday, July 7, 2020 8:15 PM
>>>> To: solr-user@lucene.apache.org <so...@lucene.apache.org>
>>>> Subject: Re: Replica goes into recovery mode in Solr 6.1.0
>>>>
>>>> This isn’t a support list, so nobody looks at issues. We do try to help.
>>>>
>>>> It looks like you have 1 TB of index on a system with 320 GB of RAM.
>>>> I don’t know what "Shard1 Allocated memory” is, but maybe half of
>>>> that RAM is used by JVMs or some other process, I guess. Are you
>>>> running multiple huge JVMs?
>>>>
>>>> The servers will be doing a LOT of disk IO, so look at the read and
>>>> write iops. I expect that the solr processes are blocked on disk reads
>>>> almost all the time.
>>>>
>>>> "-Dsolr.autoSoftCommit.maxTime=100” is way too short (100 ms).
>>>> That is probably causing your outages.
>>>>
>>>> wunder
>>>> Walter Underwood
>>>> wunder@wunderwood.org
>>>> http://observer.wunderwood.org/  (my blog)
>>>>
>>>>> On Jul 7, 2020, at 5:18 AM, vishal patel <vi...@outlook.com> wrote:
>>>>>
>>>>> Any one is looking my issue? Please guide me.
>>>>>
>>>>> Regards,
>>>>> Vishal Patel
>>>>>
>>>>>
>>>>> ________________________________
>>>>> From: vishal patel <vi...@outlook.com>
>>>>> Sent: Monday, July 6, 2020 7:11 PM
>>>>> To: solr-user@lucene.apache.org <so...@lucene.apache.org>
>>>>> Subject: Replica goes into recovery mode in Solr 6.1.0
>>>>>
>>>>> I am using Solr version 6.1.0, Java 8 version and G1GC on production. We have 2 shards and each shard has 1 replica. We have 3 collection.
>>>>> We do not use any cache and also disable in Solr config.xml. Search and Update requests are coming frequently in our live platform.
>>>>>
>>>>> *Our commit configuration in solr.config are below
>>>>> <autoCommit>
>>>>> <maxTime>600000</maxTime>
>>>>>     <maxDocs>20000</maxDocs>
>>>>>     <openSearcher>false</openSearcher>
>>>>> </autoCommit>
>>>>> <autoSoftCommit>
>>>>>     <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
>>>>> </autoSoftCommit>
>>>>>
>>>>> *We used Near Real Time Searching So we did below configuration in solr.in.cmd
>>>>> set SOLR_OPTS=%SOLR_OPTS% -Dsolr.autoSoftCommit.maxTime=100
>>>>>
>>>>> *Our collections details are below:
>>>>>
>>>>> Collection      Shard1  Shard1 Replica  Shard2  Shard2 Replica
>>>>> Number of Documents     Size(GB)        Number of Documents     Size(GB)        Number of Documents     Size(GB)        Number of Documents     Size(GB)
>>>>> collection1     26913364        201     26913379        202     26913380        198     26913379        198
>>>>> collection2     13934360        310     13934367        310     13934368        219     13934367        219
>>>>> collection3     351539689       73.5    351540040       73.5    351540136       75.2    351539722       75.2
>>>>>
>>>>> *My server configurations are below:
>>>>>
>>>>>      Server1 Server2
>>>>> CPU     Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz, 2301 Mhz, 10 Core(s), 20 Logical Processor(s)        Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz, 2301 Mhz, 10 Core(s), 20 Logical Processor(s)
>>>>> HardDisk(GB)    3845 ( 3.84 TB) 3485 GB (3.48 TB)
>>>>> Total memory(GB)        320     320
>>>>> Shard1 Allocated memory(GB)     55
>>>>> Shard2 Replica Allocated memory(GB)     55
>>>>> Shard2 Allocated memory(GB)             55
>>>>> Shard1 Replica Allocated memory(GB)             55
>>>>> Other Applications Allocated Memory(GB) 60      22
>>>>> Other Number Of Applications    11      7
>>>>>
>>>>>
>>>>> Sometimes, any one replica goes into recovery mode. Why replica goes into recovery? Due to heavy search OR heavy update/insert OR long GC pause time? If any one of them then what should we do in configuration?
>>>>> Should we increase the shard for recovery issue?
>>>>>
>>>>> Regards,
>>>>> Vishal Patel
>>>>>
>>>>
>>>
>>
>>
>
> --
> Ere Maijala
> Kansalliskirjasto / The National Library of Finland
>

--
Ere Maijala
Kansalliskirjasto / The National Library of Finland

Re: Replica goes into recovery mode in Solr 6.1.0

Posted by Ere Maijala <er...@helsinki.fi>.
vishal patel wrote on 10.7.2020 at 12.45:
> Thanks for your input.
> 
> Walter already said that setting soft commit max time to 100 ms is a recipe for disaster
>>> I know that but our application is already developed and run on live environment since last 5 years. Actually, we want to show a data very quickly after the insert.
> 
> you have huge JVM heaps without an explanation for the reason
>>> We gave the 55GB ram because our usage is like that large query search and very frequent searching and indexing.
> Here is my memory snapshot which I have taken from GC.

Yes, I can see that a lot of memory is in use, but the question is why.
I assume caches (are they too large?), perhaps uninverted indexes.
Docvalues would help with the latter. Do you use them?

> I have tried Solr upgrade from 6.1.0 to 8.5.1 but due to some issue we cannot do. I have also asked in here
> https://lucene.472066.n3.nabble.com/Sorting-in-other-collection-in-Solr-8-5-1-td4459506.html#a4459562

You could also try upgrading to the latest version in 6.x series as a
starter.

> Why we cannot find the reason of recovery from log? like memory or CPU issue, frequent index or search, large query hit,
> My log at the time of recovery
> https://drive.google.com/file/d/1F8Bn7jSXspe2HRelh_vJjKy9DsTRl9h0/view
> [https://lh5.googleusercontent.com/htOUfpihpAqncFsMlCLnSUZPu1_9DRKGNajaXV1jG44fpFzgx51ecNtUK58m5lk=w1200-h630-p]<https://drive.google.com/file/d/1F8Bn7jSXspe2HRelh_vJjKy9DsTRl9h0/view>
> recovery_shard.txt<https://drive.google.com/file/d/1F8Bn7jSXspe2HRelh_vJjKy9DsTRl9h0/view>
> drive.google.com

Isn't it right there on the first lines?

2020-07-09 14:42:43.943 ERROR
(updateExecutor-2-thread-21007-processing-http:////11.200.212.305:8983//solr//products
x:products r:core_node1 n:11.200.212.306:8983_solr s:shard1 c:products)
[c:products s:shard1 r:core_node1 x:products]
o.a.s.u.StreamingSolrClients error
org.apache.http.NoHttpResponseException: 11.200.212.305:8983 failed to
respond

followed by a couple more error messages about the same problem and then
initiation of recovery:

2020-07-09 14:42:44.002 INFO  (qtp1239731077-771611) [c:products
s:shard1 r:core_node1 x:products] o.a.s.c.ZkController Put replica
core=products coreNodeName=core_node3 on 11.200.212.305:8983_solr into
leader-initiated recovery.

So the node in question isn't responding quickly enough to http requests
and gets put into recovery. The log for the recovering node starts too
late, so I can't say anything about what happened before 14:42:43.943
that led to recovery.
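If you want to watch the state change, the Collections API reports each replica's state (active/recovering/down); for example (host taken from your log):

http://11.200.212.306:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json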

--Ere

> 
> ________________________________
> From: Ere Maijala <er...@helsinki.fi>
> Sent: Friday, July 10, 2020 2:10 PM
> To: solr-user@lucene.apache.org <so...@lucene.apache.org>
> Subject: Re: Replica goes into recovery mode in Solr 6.1.0
> 
> Walter already said that setting soft commit max time to 100 ms is a
> recipe for disaster. That alone can be the issue, but if you're not
> willing to try higher values, there's no way of being sure. And you have
> huge JVM heaps without an explanation for the reason. If those do not
> cause problems, you indicated that you also run some other software on
> the same server. Is it possible that the other processes hog CPU, disk
> or network and starve Solr?
> 
> I must add that Solr 6.1.0 is over four years old. You could be hitting
> a bug that has been fixed for years, but even if you encounter an issue
> that's still present, you will need to upgrade to get it fixed. If you
> look at the number of fixes done in subsequent 6.x versions alone in the
> changelog (https://lucene.apache.org/solr/8_5_1/changes/Changes.html)
> you'll see that there are a lot of them. You could be hitting something
> like SOLR-10420, which has been fixed for over three years.
> 
> Best,
> Ere
> 
> vishal patel wrote on 10.7.2020 at 7.52:
>> I’ve been running Solr for a dozen years and I’ve never needed a heap larger than 8 GB.
>>>> What is your data size? same like us 1 TB? is your searching or indexing frequently? NRT model?
>>
>> My question is why replica is going into recovery? When replica went down, I checked GC log but GC pause was not more than 2 seconds.
>> Also, I cannot find out any reason for recovery from Solr log file. i want to know the reason why replica goes into recovery.
>>
>> Regards,
>> Vishal Patel
>> ________________________________
>> From: Walter Underwood <wu...@wunderwood.org>
>> Sent: Friday, July 10, 2020 3:03 AM
>> To: solr-user@lucene.apache.org <so...@lucene.apache.org>
>> Subject: Re: Replica goes into recovery mode in Solr 6.1.0
>>
>> Those are extremely large JVMs. Unless you have proven that you MUST
>> have 55 GB of heap, use a smaller heap.
>>
>> I’ve been running Solr for a dozen years and I’ve never needed a heap
>> larger than 8 GB.
>>
>> Also, there is usually no need to use one JVM per replica.
>>
>> Your configuration is using 110 GB (two JVMs) just for Java
>> where I would configure it with a single 8 GB JVM. That would
>> free up 100 GB for file caches.
>>
>> wunder
>> Walter Underwood
>> wunder@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>>> On Jul 8, 2020, at 10:10 PM, vishal patel <vi...@outlook.com> wrote:
>>>
>>> Thanks for reply.
>>>
>>> what you mean by "Shard1 Allocated memory”
>>>>> It means JVM memory of one solr node or instance.
>>>
>>> How many Solr JVMs are you running?
>>>>> In one server 2 solr JVMs in which one is shard and other is replica.
>>>
>>> What is the heap size for your JVMs?
>>>>> 55GB of one Solr JVM.
>>>
>>> Regards,
>>> Vishal Patel
>>>
>>> Sent from Outlook<http://aka.ms/weboutlook>
>>> ________________________________
>>> From: Walter Underwood <wu...@wunderwood.org>
>>> Sent: Wednesday, July 8, 2020 8:45 PM
>>> To: solr-user@lucene.apache.org <so...@lucene.apache.org>
>>> Subject: Re: Replica goes into recovery mode in Solr 6.1.0
>>>
>>> I don’t understand what you mean by "Shard1 Allocated memory”. I don’t know of
>>> any way to dedicate system RAM to an application object like a replica.
>>>
>>> How many Solr JVMs are you running?
>>>
>>> What is the heap size for your JVMs?
>>>
>>> Setting soft commit max time to 100 ms does not magically make Solr super fast.
>>> It makes Solr do too much work, makes the work queues fill up, and makes it fail.
>>>
>>> wunder
>>> Walter Underwood
>>> wunder@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>>
>>>> On Jul 7, 2020, at 10:55 PM, vishal patel <vi...@outlook.com> wrote:
>>>>
>>>> Thanks for your reply.
>>>>
>>>> One server has total 320GB ram. In this 2 solr node one is shard1 and second is shard2 replica. Each solr node have 55GB memory allocated. shard1 has 585GB data and shard2 replica has 492GB data. means almost 1TB data in this server. server has also other applications and for that 60GB memory allocated. So total 150GB memory is left.
>>>>
>>>> Proper formatting details:
>>>> https://drive.google.com/file/d/1K9JyvJ50Vele9pPJCiMwm25wV4A6x4eD/view
>>>>
>>>> Are you running multiple huge JVMs?
>>>>>> Not huge but 60GB memory allocated for our 11 application. 150GB memory are still free.
>>>>
>>>> The servers will be doing a LOT of disk IO, so look at the read and write iops. I expect that the solr processes are blocked on disk reads almost all the time.
>>>>>> is it chance to go in recovery mode if more IO read and write or blocked?
>>>>
>>>> "-Dsolr.autoSoftCommit.maxTime=100” is way too short (100 ms).
>>>>>> Our requirement is NRT so we keep the less time
>>>>
>>>> Regards,
>>>> Vishal Patel
>>>> ________________________________
>>>> From: Walter Underwood <wu...@wunderwood.org>
>>>> Sent: Tuesday, July 7, 2020 8:15 PM
>>>> To: solr-user@lucene.apache.org <so...@lucene.apache.org>
>>>> Subject: Re: Replica goes into recovery mode in Solr 6.1.0
>>>>
>>>> This isn’t a support list, so nobody looks at issues. We do try to help.
>>>>
>>>> It looks like you have 1 TB of index on a system with 320 GB of RAM.
>>>> I don’t know what "Shard1 Allocated memory” is, but maybe half of
>>>> that RAM is used by JVMs or some other process, I guess. Are you
>>>> running multiple huge JVMs?
>>>>
>>>> The servers will be doing a LOT of disk IO, so look at the read and
>>>> write iops. I expect that the solr processes are blocked on disk reads
>>>> almost all the time.
>>>>
>>>> "-Dsolr.autoSoftCommit.maxTime=100” is way too short (100 ms).
>>>> That is probably causing your outages.
>>>>
>>>> wunder
>>>> Walter Underwood
>>>> wunder@wunderwood.org
>>>> http://observer.wunderwood.org/  (my blog)
>>>>
>>>>> On Jul 7, 2020, at 5:18 AM, vishal patel <vi...@outlook.com> wrote:
>>>>>
>>>>> Any one is looking my issue? Please guide me.
>>>>>
>>>>> Regards,
>>>>> Vishal Patel
>>>>>
>>>>>
>>>>> ________________________________
>>>>> From: vishal patel <vi...@outlook.com>
>>>>> Sent: Monday, July 6, 2020 7:11 PM
>>>>> To: solr-user@lucene.apache.org <so...@lucene.apache.org>
>>>>> Subject: Replica goes into recovery mode in Solr 6.1.0
>>>>>
>>>>> I am using Solr version 6.1.0, Java 8 version and G1GC on production. We have 2 shards and each shard has 1 replica. We have 3 collection.
>>>>> We do not use any cache and also disable in Solr config.xml. Search and Update requests are coming frequently in our live platform.
>>>>>
>>>>> *Our commit configuration in solr.config are below
>>>>> <autoCommit>
>>>>> <maxTime>600000</maxTime>
>>>>>     <maxDocs>20000</maxDocs>
>>>>>     <openSearcher>false</openSearcher>
>>>>> </autoCommit>
>>>>> <autoSoftCommit>
>>>>>     <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
>>>>> </autoSoftCommit>
>>>>>
>>>>> *We used Near Real Time Searching So we did below configuration in solr.in.cmd
>>>>> set SOLR_OPTS=%SOLR_OPTS% -Dsolr.autoSoftCommit.maxTime=100
>>>>>
>>>>> *Our collections details are below:
>>>>>
>>>>> Collection      Shard1  Shard1 Replica  Shard2  Shard2 Replica
>>>>> Number of Documents     Size(GB)        Number of Documents     Size(GB)        Number of Documents     Size(GB)        Number of Documents     Size(GB)
>>>>> collection1     26913364        201     26913379        202     26913380        198     26913379        198
>>>>> collection2     13934360        310     13934367        310     13934368        219     13934367        219
>>>>> collection3     351539689       73.5    351540040       73.5    351540136       75.2    351539722       75.2
>>>>>
>>>>> *My server configurations are below:
>>>>>
>>>>>      Server1 Server2
>>>>> CPU     Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz, 2301 Mhz, 10 Core(s), 20 Logical Processor(s)        Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz, 2301 Mhz, 10 Core(s), 20 Logical Processor(s)
>>>>> HardDisk(GB)    3845 ( 3.84 TB) 3485 GB (3.48 TB)
>>>>> Total memory(GB)        320     320
>>>>> Shard1 Allocated memory(GB)     55
>>>>> Shard2 Replica Allocated memory(GB)     55
>>>>> Shard2 Allocated memory(GB)             55
>>>>> Shard1 Replica Allocated memory(GB)             55
>>>>> Other Applications Allocated Memory(GB) 60      22
>>>>> Other Number Of Applications    11      7
>>>>>
>>>>>
>>>>> Sometimes, any one replica goes into recovery mode. Why replica goes into recovery? Due to heavy search OR heavy update/insert OR long GC pause time? If any one of them then what should we do in configuration?
>>>>> Should we increase the shard for recovery issue?
>>>>>
>>>>> Regards,
>>>>> Vishal Patel
>>>>>
>>>>
>>>
>>
>>
> 
> --
> Ere Maijala
> Kansalliskirjasto / The National Library of Finland
> 

-- 
Ere Maijala
Kansalliskirjasto / The National Library of Finland

Re: Replica goes into recovery mode in Solr 6.1.0

Posted by vishal patel <vi...@outlook.com>.
Thanks for your input.

Walter already said that setting soft commit max time to 100 ms is a recipe for disaster
>> I know, but our application was developed years ago and has been running in our live environment for the last 5 years. We want to show data very quickly after an insert.

you have huge JVM heaps without an explanation for the reason
>> We allocated 55GB of RAM because our usage involves large queries and very frequent searching and indexing.
Here are the memory snapshots I took from the GC monitoring:

https://drive.google.com/file/d/1WPYqg-wPFGnnMu8FopXs4EAGAgSq8ZEG/view
heapusage_before_gc.PNG


https://drive.google.com/file/d/1LYEdcY9Om_0u8ltIHikU7hsuuKYQPh_m/view
JVM_memory.PNG
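
As a hedged aside, the flags below are standard Java 8 HotSpot options, but the log path and the use of the SOLR_OPTS hook are assumptions to adapt: making the GC log record every stop-the-world interval, not only collections, helps back up a "pauses under 2 seconds" reading, because safepoint stalls can pause the JVM without appearing as a GC event.

set SOLR_OPTS=%SOLR_OPTS% -Xloggc:C:\solr\server\logs\gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime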



you indicated that you also run some other software on the same server. Is it possible that the other processes hog CPU, disk or network and starve Solr?
>> I will check that

I tried upgrading Solr from 6.1.0 to 8.5.1, but due to some issues we could not complete it. I have also asked about them here:
https://lucene.472066.n3.nabble.com/Sorting-in-other-collection-in-Solr-8-5-1-td4459506.html#a4459562

https://lucene.472066.n3.nabble.com/Query-takes-more-time-in-Solr-8-5-1-compare-to-6-1-0-version-td4458153.html


Why can we not find the reason for the recovery in the log, such as a memory or CPU issue, frequent indexing or searching, or a large query hit?
My logs at the time of recovery:
https://drive.google.com/file/d/1F8Bn7jSXspe2HRelh_vJjKy9DsTRl9h0/view
recovery_shard.txt


https://drive.google.com/file/d/1y0fC_n5u3MBMQbXrvxtqaD8vBBXDLR6I/view
recovery_replica.txt

Regards,
Vishal Patel



Re: Replica goes into recovery mode in Solr 6.1.0

Posted by Ere Maijala <er...@helsinki.fi>.
Walter already said that setting soft commit max time to 100 ms is a
recipe for disaster. That alone can be the issue, but if you're not
willing to try higher values, there's no way of being sure. And you have
huge JVM heaps without an explanation for the reason. If those do not
cause problems, you indicated that you also run some other software on
the same server. Is it possible that the other processes hog CPU, disk
or network and starve Solr?

I must add that Solr 6.1.0 is over four years old. You could be hitting
a bug that has been fixed for years, but even if you encounter an issue
that's still present, you will need to uprgade to get it fixed. If you
look at the number of fixes done in subsequent 6.x versions alone in the
changelog (https://lucene.apache.org/solr/8_5_1/changes/Changes.html)
you'll see that there are a lot of them. You could be hitting something
like SOLR-10420, which has been fixed for over three years.

Best,
Ere


-- 
Ere Maijala
Kansalliskirjasto / The National Library of Finland

Re: Replica goes into recovery mode in Solr 6.1.0

Posted by vishal patel <vi...@outlook.com>.
I’ve been running Solr for a dozen years and I’ve never needed a heap larger than 8 GB.
>> What is your data size? Is it around 1 TB like ours? Do you search and index as frequently, with an NRT model?

My question is why the replica goes into recovery. When the replica went down, I checked the GC log, but the GC pause was never more than 2 seconds.
Also, I cannot find any reason for the recovery in the Solr log file. I want to know why the replica goes into recovery.

Regards,
Vishal Patel


Re: Replica goes into recovery mode in Solr 6.1.0

Posted by Walter Underwood <wu...@wunderwood.org>.
Those are extremely large JVMs. Unless you have proven that you MUST
have 55 GB of heap, use a smaller heap.

I’ve been running Solr for a dozen years and I’ve never needed a heap
larger than 8 GB.

Also, there is usually no need to use one JVM per replica.

Your configuration is using 110 GB (two JVMs) just for Java
where I would configure it with a single 8 GB JVM. That would
free up 100 GB for file caches.
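
A minimal sketch of that change for solr.in.cmd, assuming the stock SOLR_JAVA_MEM hook in the 6.x start scripts; the right size should be proven with GC telemetry under real load rather than copied from here:

set SOLR_JAVA_MEM=-Xms8g -Xmx8g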

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Re: Replica goes into recovery mode in Solr 6.1.0

Posted by vishal patel <vi...@outlook.com>.
Thanks for the reply.

what you mean by "Shard1 Allocated memory”
>> It means the JVM memory of one Solr node (instance).

How many Solr JVMs are you running?
>> Two Solr JVMs per server: one hosts a shard and the other a replica.

What is the heap size for your JVMs?
>> 55GB for each Solr JVM.

Regards,
Vishal Patel

Sent from Outlook<http://aka.ms/weboutlook>


Re: Replica goes into recovery mode in Solr 6.1.0

Posted by Walter Underwood <wu...@wunderwood.org>.
I don’t understand what you mean by "Shard1 Allocated memory”. I don’t know of
any way to dedicate system RAM to an application object like a replica.

How many Solr JVMs are you running?

What is the heap size for your JVMs?

Setting soft commit max time to 100 ms does not magically make Solr super fast.
It makes Solr do too much work, makes the work queues fill up, and makes it fail.
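
For contrast, a hedged sketch using the same property this thread already sets. Values in the 1 to 10 second range are a common near-real-time compromise; the 5000 ms below is only a placeholder to tune, not a verified recommendation:

set SOLR_OPTS=%SOLR_OPTS% -Dsolr.autoSoftCommit.maxTime=5000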

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Re: Replica goes into recovery mode in Solr 6.1.0

Posted by vishal patel <vi...@outlook.com>.
Thanks for your reply.

One server has 320GB of RAM in total. It runs 2 Solr nodes: one is shard1 and the other is the shard2 replica. Each Solr node has 55GB of memory allocated. shard1 holds 585GB of data and the shard2 replica holds 492GB, so there is almost 1TB of data on this server. The server also hosts other applications, which have 60GB of memory allocated, so about 150GB of memory is left.

Proper formatting details:
https://drive.google.com/file/d/1K9JyvJ50Vele9pPJCiMwm25wV4A6x4eD/view

Are you running multiple huge JVMs?
>> Not huge, but 60GB of memory is allocated to our 11 applications; 150GB of memory is still free.

The servers will be doing a LOT of disk IO, so look at the read and write iops. I expect that the solr processes are blocked on disk reads almost all the time.
>> Is there a chance of going into recovery mode when IO reads and writes are heavy or blocked?

"-Dsolr.autoSoftCommit.maxTime=100” is way too short (100 ms).
>> Our requirement is NRT, so we keep the time short.

Regards,
Vishal Patel


Re: Replica goes into recovery mode in Solr 6.1.0

Posted by Walter Underwood <wu...@wunderwood.org>.
This isn’t a support list, so nobody looks at issues. We do try to help.

It looks like you have 1 TB of index on a system with 320 GB of RAM.
I don’t know what "Shard1 Allocated memory” is, but maybe half of
that RAM is used by JVMs or some other process, I guess. Are you
running multiple huge JVMs?

The servers will be doing a LOT of disk IO, so look at the read and
write iops. I expect that the solr processes are blocked on disk reads
almost all the time. 
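
On Windows, which the solr.in.cmd configuration in this thread implies, one way to watch those numbers is the built-in typeperf sampler with the standard physical-disk counters, sampling every 5 seconds (a sketch; counter paths may vary by locale):

typeperf "\PhysicalDisk(_Total)\Disk Reads/sec" "\PhysicalDisk(_Total)\Disk Writes/sec" -si 5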

"-Dsolr.autoSoftCommit.maxTime=100” is way too short (100 ms). 
That is probably causing your outages.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jul 7, 2020, at 5:18 AM, vishal patel <vi...@outlook.com> wrote:
> 
> Any one is looking my issue? Please guide me.
> 
> Regards,
> Vishal Patel
> 
> 
> ________________________________
> From: vishal patel <vi...@outlook.com>
> Sent: Monday, July 6, 2020 7:11 PM
> To: solr-user@lucene.apache.org <so...@lucene.apache.org>
> Subject: Replica goes into recovery mode in Solr 6.1.0
> 
> I am using Solr version 6.1.0, Java 8 version and G1GC on production. We have 2 shards and each shard has 1 replica. We have 3 collection.
> We do not use any cache and also disable in Solr config.xml. Search and Update requests are coming frequently in our live platform.
> 
> *Our commit configuration in solr.config are below
> <autoCommit>
> <maxTime>600000</maxTime>
>       <maxDocs>20000</maxDocs>
>       <openSearcher>false</openSearcher>
> </autoCommit>
> <autoSoftCommit>
>       <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
> </autoSoftCommit>
> 
> *We used Near Real Time Searching So we did below configuration in solr.in.cmd
> set SOLR_OPTS=%SOLR_OPTS% -Dsolr.autoSoftCommit.maxTime=100
> 
> *Our collections details are below:
> 
> Collection      Shard1  Shard1 Replica  Shard2  Shard2 Replica
> Number of Documents     Size(GB)        Number of Documents     Size(GB)        Number of Documents     Size(GB)        Number of Documents     Size(GB)
> collection1     26913364        201     26913379        202     26913380        198     26913379        198
> collection2     13934360        310     13934367        310     13934368        219     13934367        219
> collection3     351539689       73.5    351540040       73.5    351540136       75.2    351539722       75.2
> 
> *My server configurations are below:
> 
>        Server1 Server2
> CPU     Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz, 2301 Mhz, 10 Core(s), 20 Logical Processor(s)        Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz, 2301 Mhz, 10 Core(s), 20 Logical Processor(s)
> HardDisk(GB)    3845 ( 3.84 TB) 3485 GB (3.48 TB)
> Total memory(GB)        320     320
> Shard1 Allocated memory(GB)     55
> Shard2 Replica Allocated memory(GB)     55
> Shard2 Allocated memory(GB)             55
> Shard1 Replica Allocated memory(GB)             55
> Other Applications Allocated Memory(GB) 60      22
> Other Number Of Applications    11      7
> 
> 
> Sometimes, any one replica goes into recovery mode. Why replica goes into recovery? Due to heavy search OR heavy update/insert OR long GC pause time? If any one of them then what should we do in configuration?
> Should we increase the shard for recovery issue?
> 
> Regards,
> Vishal Patel
> 


Re: Replica goes into recovery mode in Solr 6.1.0

Posted by vishal patel <vi...@outlook.com>.
Is anyone looking at my issue? Please guide me.

Regards,
Vishal Patel


________________________________
From: vishal patel <vi...@outlook.com>
Sent: Monday, July 6, 2020 7:11 PM
To: solr-user@lucene.apache.org <so...@lucene.apache.org>
Subject: Replica goes into recovery mode in Solr 6.1.0

I am using Solr 6.1.0 with Java 8 and G1GC in production. We have 2 shards and each shard has 1 replica. We have 3 collections.
We do not use any caches; they are disabled in solrconfig.xml. Search and update requests come in frequently on our live platform.

*Our commit configuration in solrconfig.xml is below:
<autoCommit>
       <maxTime>600000</maxTime>
       <maxDocs>20000</maxDocs>
       <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
       <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>

*We use Near Real Time searching, so we set the following in solr.in.cmd:
set SOLR_OPTS=%SOLR_OPTS% -Dsolr.autoSoftCommit.maxTime=100

*Our collection details are below:

Collection   | Shard1                | Shard1 Replica        | Shard2                | Shard2 Replica
             | Docs        Size(GB)  | Docs        Size(GB)  | Docs        Size(GB)  | Docs        Size(GB)
collection1  | 26913364    201       | 26913379    202       | 26913380    198       | 26913379    198
collection2  | 13934360    310       | 13934367    310       | 13934368    219       | 13934367    219
collection3  | 351539689   73.5      | 351540040   73.5      | 351540136   75.2      | 351539722   75.2

*My server configurations are below:

                                       Server1              Server2
CPU                                    Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz, 2301 MHz, 10 Core(s), 20 Logical Processor(s) (identical on both servers)
Hard disk                              3845 GB (3.84 TB)    3485 GB (3.48 TB)
Total memory (GB)                      320                  320
Shard1 allocated memory (GB)           55                   -
Shard2 Replica allocated memory (GB)   55                   -
Shard2 allocated memory (GB)           -                    55
Shard1 Replica allocated memory (GB)   -                    55
Other applications' memory (GB)        60                   22
Number of other applications           11                   7


Sometimes one of the replicas goes into recovery mode. Why does a replica go into recovery? Is it due to heavy search load, heavy update/insert load, or long GC pauses? If it is one of these, what should we change in the configuration?
Should we add shards to avoid the recovery issue?

Regards,
Vishal Patel