Posted to solr-user@lucene.apache.org by sujatha sankaran <su...@gmail.com> on 2018/06/15 21:14:42 UTC

Delete By Query issue followed by Delete By Id Issues

We were initially having an issue with DBQ (delete-by-query) and heavy batch
updates, which resulted in many missing updates.



After reading many mails on the mailing list mentioning that DBQ and batch
updates do not work well together, we switched to DBI (delete-by-id). But we
are now seeing the issue described in this JIRA issue:
https://issues.apache.org/jira/browse/SOLR-7384



Specifically, we are seeing the following pattern:

- There are several ERRORs and WARNs about a missing _version_ field.

- The ERROR message typically appears only once.

- There are several WARNs after that, and after a couple of WARNs there is a
message that leader-initiated recovery has been kicked off.



A few scenarios:

   - Batch updates with DBI, where deletes are followed by updates for some
   documents in the collection, combined with batch updates with DBQ for some
   other docs => results in missing docs across both types
   - Batch deletes with DBI with the route parameter: we see that about 20%
   of the deletes do not happen. At this point there could be parallel batch
   updates with DBQ/DBI
   - Pure DBI-based updates where deletes are followed by updates, no DBQ
   here, but we are seeing the missing _version_ error and leader-initiated
   recovery. Deletes and updates seem fine for individual doc updates; we
   have yet to test a batch under a heavy-load scenario

*Setup info*:

- Solr Cloud 6.6.2
- 5 nodes, 5 shards, 3 replicas
- ~35 million docs in the collection
- Nodes have 90 GB RAM, 32 GB to the JVM
- Soft commit interval 2 seconds, hard commit (openSearcher=false) every 15
seconds



Are there any solutions to the missing _version_ error on DBI, followed by
leader-initiated recovery (LIR), during heavy batch indexing when using
custom routing?
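
For reference, this is roughly the shape of our DBI batch delete (a
simplified sketch, not our exact code; the ZooKeeper address, ids, and route
value below are placeholders):

    import java.util.Arrays;
    import java.util.List;

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.UpdateRequest;

    public class BatchDeleteByIdSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder ZooKeeper ensemble address.
            try (CloudSolrClient client = new CloudSolrClient.Builder()
                    .withZkHost("zk1:2181,zk2:2181,zk3:2181").build()) {
                client.connect();

                // Placeholder ids in the <type>_<part1>_<part2> form we use.
                List<String> idsToDelete = Arrays.asList(
                        "note_20151333_1", "note_20151333_2");

                UpdateRequest req = new UpdateRequest();
                for (String id : idsToDelete) {
                    // With custom/implicit routing, each delete carries the same
                    // _route_ value that was used when the document was indexed.
                    req.deleteById(id, "20151333");   // "20151333" = assumed route value
                }
                req.process(client, "crm_v2_01");     // our collection name
                client.commit("crm_v2_01");
            }
        }
    }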


Thanks,
Sujatha

Re: Delete By Query issue followed by Delete By Id Issues

Posted by sujatha sankaran <su...@gmail.com>.
Hi Emir,

We are deleting a larger subset of docs that share a particular value, which
we know from the id, and only updating a few of the deleted ones. Our
document ids are of the form <type>_<part1>_<part2>; we need to delete all
docs that have the same <part1> and are no longer in the DB, and then update
only the few that have been updated in the DB.
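
Roughly, the flow looks like the sketch below (illustrative only, not our
production code; the query pattern, the DB lookup, and the route value are
placeholders):

    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.UpdateRequest;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrInputDocument;

    public class SyncOnePart1Sketch {
        public static void main(String[] args) throws Exception {
            String collection = "crm_v2_01";
            String part1 = "20151333";                         // placeholder <part1> value
            Set<String> idsStillInDb = new HashSet<>();        // would be loaded from the DB
            List<SolrInputDocument> updatedDocs = new ArrayList<>(); // docs changed in the DB

            try (CloudSolrClient client = new CloudSolrClient.Builder()
                    .withZkHost("zk1:2181").build()) {          // placeholder ZK address
                client.connect();

                // 1. Find every id in Solr that shares this <part1> (paging omitted).
                SolrQuery q = new SolrQuery("id:note_" + part1 + "_*");  // placeholder pattern
                q.setFields("id");
                q.setRows(10000);
                List<String> staleIds = new ArrayList<>();
                for (SolrDocument d : client.query(collection, q).getResults()) {
                    String id = (String) d.getFieldValue("id");
                    if (!idsStillInDb.contains(id)) {
                        staleIds.add(id);                       // gone from the DB -> delete
                    }
                }

                // 2. Delete the stale ids (passing the route, since we use custom
                //    routing), and re-add only the documents that changed in the DB.
                UpdateRequest req = new UpdateRequest();
                for (String id : staleIds) {
                    req.deleteById(id, part1);                  // route value is an assumption
                }
                for (SolrInputDocument doc : updatedDocs) {
                    req.add(doc);
                }
                req.process(client, collection);
                client.commit(collection);
            }
        }
    }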

Thanks,
Sujatha



On Sun, Jun 24, 2018 at 8:59 AM, Emir Arnautović <
emir.arnautovic@sematext.com> wrote:

> Hi Sujatha,
> Did I get it right that you are deleting the same documents that will be
> updated afterward? If that’s the case, then you can simply skip deleting,
> and just send updated version of document. Solr (Lucene) does not have
> delete - it’ll just flag document as deleted. Updating document (assuming
> id is the same) will result in the same thing - old document will not be
> retrievable and will be removed from index when segments holding it is
> merged.
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 21 Jun 2018, at 19:59, sujatha sankaran <su...@gmail.com>
> wrote:
> >
> > Thanks,Shawn.
> >
> > Our use case is something like this in a batch load of  several 1000's of
> > documents,we do a delete first followed by update.Example delete all 1000
> > docs and send an update request for 1000.
> >
> > What we see is that there are many missing docs due to DBQ re-ordering of
> > the order of  deletes followed by updates.We also saw issue with nodes
> > going down
> > similar tot issue described here:
> > http://lucene.472066.n3.nabble.com/SolrCloud-Nodes-
> going-to-recovery-state-during-indexing-td4369396.html
> >
> > we see at the end of this batch process, many (several thousand ) missing
> > docs.
> >
> > Due to this and after reading above thread , we decided to move to DBI
> and
> > now are facing issues due to custom routing or implicit routing which we
> > have in place.So I don't think DBQ was working for us, but we did have
> > several such process ( DBQ followed by updates) for different activities
> in
> > the collection happening at the same time.
> >
> >
> > Sujatha
> >
> > On Thu, Jun 21, 2018 at 1:21 PM, Shawn Heisey <ap...@elyograg.org>
> wrote:
> >
> >> On 6/21/2018 9:59 AM, sujatha sankaran wrote:
> >>> Currently from our business perspective we find that we are left with
> no
> >>> options for deleting docs in a batch load as :
> >>>
> >>> DBQ+ batch does not work well together
> >>> DBI+ custom routing (batch load / normal)    would not work as well.
> >>
> >> I would expect DBQ to work, just with the caveat that if you are trying
> >> to do other indexing operations at the same time, you may run into
> >> significant delays, and if there are timeouts configured anywhere that
> >> are shorter than those delays, requests may return failure responses or
> >> log failures.
> >>
> >> If you are using DBQ, you just need to be sure that there are no other
> >> operations happening at the same time, or that your error handling is
> >> bulletproof.  Making sure that no other operations are happening at the
> >> same time as the DBQ is in my opinion a better option.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
>
>

Re: Delete By Query issue followed by Delete By Id Issues

Posted by Emir Arnautović <em...@sematext.com>.
Hi Sujatha,
Did I get it right that you are deleting the same documents that will be updated afterward? If that’s the case, then you can simply skip deleting and just send the updated version of the document. Solr (Lucene) does not physically delete documents right away - it’ll just flag the document as deleted. Updating a document (assuming the id is the same) will result in the same thing - the old document will not be retrievable and will be removed from the index when the segment holding it is merged.
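
A minimal SolrJ sketch of what I mean (placeholder ZooKeeper address and
field names, not tested against your setup):

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class OverwriteInsteadOfDelete {
        public static void main(String[] args) throws Exception {
            try (CloudSolrClient client = new CloudSolrClient.Builder()
                    .withZkHost("zk1:2181").build()) {        // placeholder ZK address
                String collection = "crm_v2_01";

                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "note_20151333_1");        // same uniqueKey as the old doc
                doc.addField("status_s", "updated");          // placeholder field

                // No explicit delete: adding a document with an existing uniqueKey
                // flags the old version as deleted and only the new one stays searchable.
                client.add(collection, doc);
                client.commit(collection);
            }
        }
    }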

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 21 Jun 2018, at 19:59, sujatha sankaran <su...@gmail.com> wrote:
> 
> Thanks,Shawn.
> 
> Our use case is something like this in a batch load of  several 1000's of
> documents,we do a delete first followed by update.Example delete all 1000
> docs and send an update request for 1000.
> 
> What we see is that there are many missing docs due to DBQ re-ordering of
> the order of  deletes followed by updates.We also saw issue with nodes
> going down
> similar tot issue described here:
> http://lucene.472066.n3.nabble.com/SolrCloud-Nodes-going-to-recovery-state-during-indexing-td4369396.html
> 
> we see at the end of this batch process, many (several thousand ) missing
> docs.
> 
> Due to this and after reading above thread , we decided to move to DBI and
> now are facing issues due to custom routing or implicit routing which we
> have in place.So I don't think DBQ was working for us, but we did have
> several such process ( DBQ followed by updates) for different activities in
> the collection happening at the same time.
> 
> 
> Sujatha
> 
> On Thu, Jun 21, 2018 at 1:21 PM, Shawn Heisey <ap...@elyograg.org> wrote:
> 
>> On 6/21/2018 9:59 AM, sujatha sankaran wrote:
>>> Currently from our business perspective we find that we are left with no
>>> options for deleting docs in a batch load as :
>>> 
>>> DBQ+ batch does not work well together
>>> DBI+ custom routing (batch load / normal)    would not work as well.
>> 
>> I would expect DBQ to work, just with the caveat that if you are trying
>> to do other indexing operations at the same time, you may run into
>> significant delays, and if there are timeouts configured anywhere that
>> are shorter than those delays, requests may return failure responses or
>> log failures.
>> 
>> If you are using DBQ, you just need to be sure that there are no other
>> operations happening at the same time, or that your error handling is
>> bulletproof.  Making sure that no other operations are happening at the
>> same time as the DBQ is in my opinion a better option.
>> 
>> Thanks,
>> Shawn
>> 
>> 


Re: Delete By Query issue followed by Delete By Id Issues

Posted by sujatha sankaran <su...@gmail.com>.
Thanks, Shawn.

Our use case is something like this: in a batch load of several thousand
documents, we do a delete first, followed by an update. For example, delete
all 1000 docs and then send an update request for the same 1000.

What we see is that there are many missing docs due to DBQ re-ordering the
deletes and the updates that follow them. We also saw an issue with nodes
going down, similar to the issue described here:
http://lucene.472066.n3.nabble.com/SolrCloud-Nodes-going-to-recovery-state-during-indexing-td4369396.html

At the end of this batch process, we see many (several thousand) missing
docs.

Due to this, and after reading the above thread, we decided to move to DBI,
and we are now facing issues due to the custom (implicit) routing we have in
place. So I don't think DBQ was working for us, but we did have several such
processes (DBQ followed by updates) for different activities in the
collection happening at the same time.


Sujatha

On Thu, Jun 21, 2018 at 1:21 PM, Shawn Heisey <ap...@elyograg.org> wrote:

> On 6/21/2018 9:59 AM, sujatha sankaran wrote:
> > Currently from our business perspective we find that we are left with no
> > options for deleting docs in a batch load as :
> >
> > DBQ+ batch does not work well together
> > DBI+ custom routing (batch load / normal)    would not work as well.
>
> I would expect DBQ to work, just with the caveat that if you are trying
> to do other indexing operations at the same time, you may run into
> significant delays, and if there are timeouts configured anywhere that
> are shorter than those delays, requests may return failure responses or
> log failures.
>
> If you are using DBQ, you just need to be sure that there are no other
> operations happening at the same time, or that your error handling is
> bulletproof.  Making sure that no other operations are happening at the
> same time as the DBQ is in my opinion a better option.
>
> Thanks,
> Shawn
>
>

Re: Delete By Query issue followed by Delete By Id Issues

Posted by Shawn Heisey <ap...@elyograg.org>.
On 6/21/2018 9:59 AM, sujatha sankaran wrote:
> Currently from our business perspective we find that we are left with no
> options for deleting docs in a batch load as :
>
> DBQ+ batch does not work well together
> DBI+ custom routing (batch load / normal)    would not work as well.

I would expect DBQ to work, just with the caveat that if you are trying
to do other indexing operations at the same time, you may run into
significant delays, and if there are timeouts configured anywhere that
are shorter than those delays, requests may return failure responses or
log failures.

If you are using DBQ, you just need to be sure that there are no other
operations happening at the same time, or that your error handling is
bulletproof.  Making sure that no other operations are happening at the
same time as the DBQ is in my opinion a better option.
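
One way to enforce that on the client side (just a sketch, assuming all adds
and deletes for the collection go through this one class) is a read/write
lock, so a delete-by-query waits for in-flight adds and blocks new ones
until it finishes:

    import java.util.List;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class SerializedDeletes {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private final SolrClient client;
        private final String collection;

        public SerializedDeletes(SolrClient client, String collection) {
            this.client = client;
            this.collection = collection;
        }

        /** Normal batch adds may run concurrently with each other. */
        public void addBatch(List<SolrInputDocument> docs) throws Exception {
            lock.readLock().lock();
            try {
                client.add(collection, docs);
            } finally {
                lock.readLock().unlock();
            }
        }

        /** Delete-by-query runs only while no adds are in flight. */
        public void deleteByQuery(String query) throws Exception {
            lock.writeLock().lock();
            try {
                client.deleteByQuery(collection, query);
                client.commit(collection);
            } finally {
                lock.writeLock().unlock();
            }
        }
    }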

Thanks,
Shawn


Re: Delete By Query issue followed by Delete By Id Issues

Posted by sujatha sankaran <su...@gmail.com>.
Thanks, Shawn.

Currently, from our business perspective, we find that we are left with no
options for deleting docs in a batch load, since:

DBQ + batch updates do not work well together
DBI + custom routing (batch load / normal) does not work either.

We are not sure how we can proceed unless we don't have to delete at all.

Thanks,
Sujatha



On Wed, Jun 20, 2018 at 8:31 PM, Shawn Heisey <ap...@elyograg.org> wrote:

> On 6/20/2018 3:46 PM, sujatha sankaran wrote:
> > Thanks,Shawn.   Very useful information.
> >
> > Please find below the log details:-
>
> Is your collection using the implicit router?  You didn't say.  If it
> is, then I think you may not be able to use deleteById.  This is indeed
> a bug, one that has been reported at least once already, but hasn't been
> fixed yet.   I do not know why it hasn't been fixed yet.  Maybe the fix
> is very difficult, or maybe the reason for the problem is not yet fully
> understood.
>
> The log you shared shows an error trying to do an update -- the delete
> that failed.  This kind of error is indeed likely to cause SolrCloud to
> attempt index recovery, all in accordance with SolrCloud design goals.
>
> Thanks,
> Shawn
>
>

Re: Delete By Query issue followed by Delete By Id Issues

Posted by Shawn Heisey <ap...@elyograg.org>.
On 6/20/2018 3:46 PM, sujatha sankaran wrote:
> Thanks,Shawn.   Very useful information.
>
> Please find below the log details:-

Is your collection using the implicit router?  You didn't say.  If it
is, then I think you may not be able to use deleteById.  This is indeed
a bug, one that has been reported at least once already, but hasn't been
fixed yet.   I do not know why it hasn't been fixed yet.  Maybe the fix
is very difficult, or maybe the reason for the problem is not yet fully
understood.
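
If you are not sure, something like this SolrJ sketch (collection name taken
from your logs; the ZooKeeper address is a placeholder) will tell you which
router the collection uses:

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.cloud.DocCollection;
    import org.apache.solr.common.cloud.ImplicitDocRouter;

    public class CheckRouter {
        public static void main(String[] args) throws Exception {
            try (CloudSolrClient client = new CloudSolrClient.Builder()
                    .withZkHost("zk1:2181").build()) {      // placeholder ZK address
                client.connect();
                DocCollection coll = client.getZkStateReader()
                        .getClusterState().getCollection("crm_v2_01");
                // ImplicitDocRouter means deleteById needs route information;
                // otherwise the collection uses the default compositeId router.
                System.out.println("implicit router? "
                        + (coll.getRouter() instanceof ImplicitDocRouter));
            }
        }
    }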

The log you shared shows an error trying to do an update -- the delete
that failed.  This kind of error is indeed likely to cause SolrCloud to
attempt index recovery, all in accordance with SolrCloud design goals.

Thanks,
Shawn


Re: Delete By Query issue followed by Delete By Id Issues

Posted by sujatha sankaran <su...@gmail.com>.
Thanks, Shawn. Very useful information.

Please find the log details below:



2018-06-20 17:19:06.661 ERROR
(updateExecutor-2-thread-8226-processing-crm_v2_01_shard3_replica1
x:crm_v2_01_shard3_replica2 r:core_node4 n:masked:8983_solr s:shard3
c:crm_v2_01) [c:crm_v2_01 s:shard3 r:core_node4
x:crm_v2_01_shard3_replica2] o.a.s.u.StreamingSolrClients error

org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at crm_v2_01_shard3_replica1: Bad Request

request:
crm_v2_01_shard3_replica1/update?update.chain=add-unknown-fields-to-the-schema&update.distrib=FROMLEADER&distrib.from=http%3A%2F%2Fmasked%3A8983%2Fsolr%2Fcrm_v2_01_shard3_replica2%2F&wt=javabin&version=2

Remote error message: missing _version_ on update from leader

        at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:345)
        at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:184)
        at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.dt_access$292(ExecutorUtil.java)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

2018-06-20 17:19:06.662 WARN  (qtp1002191352-169102) [c:crm_v2_01 s:shard3
r:core_node4 x:crm_v2_01_shard3_replica2]
o.a.s.u.p.DistributedUpdateProcessor Error sending update to
http://masked:8983/solr

org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://masked:8983/solr/crm_v2_01_shard3_replica3: Bad
Request

request:
http://masked:8983/solr/crm_v2_01_shard3_replica3/update?update.chain=add-unknown-fields-to-the-schema&update.distrib=FROMLEADER&distrib.from=http%3A%2F%2Fmasked%3A8983%2Fsolr%2Fcrm_v2_01_shard3_replica2%2F&wt=javabin&version=2

Remote error message: missing _version_ on update from leader

        at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:345)
        at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:184)
        at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.dt_access$292(ExecutorUtil.java)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

2018-06-20 17:19:06.662 ERROR (qtp1002191352-169102) [c:crm_v2_01 s:shard3
r:core_node4 x:crm_v2_01_shard3_replica2]
o.a.s.u.p.DistributedUpdateProcessor Setting up to try to start recovery on
replica http://masked:8983/solr/crm_v2_01_shard3_replica3/

org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://masked:8983/solr/crm_v2_01_shard3_replica3: Bad
Request

request:
http://masked:8983/solr/crm_v2_01_shard3_replica3/update?update.chain=add-unknown-fields-to-the-schema&update.distrib=FROMLEADER&distrib.from=http%3A%2F%2Fmasked%3A8983%2Fsolr%2Fcrm_v2_01_shard3_replica2%2F&wt=javabin&version=2

Remote error message: missing _version_ on update from leader

        at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:345)
        at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:184)
        at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.dt_access$292(ExecutorUtil.java)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

2018-06-20 17:19:06.662 INFO  (qtp1002191352-169102) [c:crm_v2_01 s:shard3
r:core_node4 x:crm_v2_01_shard3_replica2] o.a.s.c.ZkController Put replica
core=crm_v2_01_shard3_replica3 coreNodeName=core_node12 on masked:8983_solr
into leader-initiated recovery.

2018-06-20 17:19:06.662 WARN  (qtp1002191352-169102) [c:crm_v2_01 s:shard3
r:core_node4 x:crm_v2_01_shard3_replica2]
o.a.s.u.p.DistributedUpdateProcessor Error sending update to
http://masked:8983/solr

org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at crm_v2_01_shard3_replica1: Bad Request

request:
crm_v2_01_shard3_replica1/update?update.chain=add-unknown-fields-to-the-schema&update.distrib=FROMLEADER&distrib.from=http%3A%2F%2Fmasked%3A8983%2Fsolr%2Fcrm_v2_01_shard3_replica2%2F&wt=javabin&version=2

Remote error message: missing _version_ on update from leader

        at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:345)
        at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:184)
        at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.dt_access$292(ExecutorUtil.java)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

2018-06-20 17:19:06.663 ERROR (qtp1002191352-169102) [c:crm_v2_01 s:shard3
r:core_node4 x:crm_v2_01_shard3_replica2]
o.a.s.u.p.DistributedUpdateProcessor Setting up to try to start recovery on
replica crm_v2_01_shard3_replica1/

org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at crm_v2_01_shard3_replica1: Bad Request

request:
crm_v2_01_shard3_replica1/update?update.chain=add-unknown-fields-to-the-schema&update.distrib=FROMLEADER&distrib.from=http%3A%2F%2Fmasked%3A8983%2Fsolr%2Fcrm_v2_01_shard3_replica2%2F&wt=javabin&version=2

Remote error message: missing _version_ on update from leader

        at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:345)
        at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:184)
        at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.dt_access$292(ExecutorUtil.java)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

2018-06-20 17:19:06.663 INFO  (qtp1002191352-169102) [c:crm_v2_01 s:shard3
r:core_node4 x:crm_v2_01_shard3_replica2] o.a.s.c.ZkController Put replica
core=crm_v2_01_shard3_replica1 coreNodeName=core_node13 on masked:8983_solr
into leader-initiated recovery.

2018-06-20 17:19:06.663 INFO  (qtp1002191352-169102) [c:crm_v2_01 s:shard3
r:core_node4 x:crm_v2_01_shard3_replica2]
o.a.s.u.p.LogUpdateProcessorFactory [crm_v2_01_shard3_replica2]  webapp=/solr
path=/update
params={update.distrib=TOLEADER&update.chain=add-unknown-fields-to-the-schema&distrib.from=http://masked:8983/solr/crm_v2_01_shard3_replica3/&wt=javabin&version=2}{delete=[note-20151333-8M821761N
(-1603827973916459008)]} 0 4

2018-06-20 17:19:06.668 INFO
(updateExecutor-2-thread-8226-processing-x:crm_v2_01_shard3_replica2
r:core_node4 crm_v2_01_shard3_replica3// n:masked:8983_solr s:shard3
c:crm_v2_01) [c:crm_v2_01 s:shard3 r:core_node4
x:crm_v2_01_shard3_replica2] o.a.s.c.LeaderInitiatedRecoveryThread Put
replica core=crm_v2_01_shard3_replica3 coreNodeName=core_node12 on
masked:8983_solr
into leader-initiated recovery.

2018-06-20 17:19:06.668 WARN
(updateExecutor-2-thread-8226-processing-x:crm_v2_01_shard3_replica2
r:core_node4 crm_v2_01_shard3_replica3// n:masked:8983_solr s:shard3
c:crm_v2_01) [c:crm_v2_01 s:shard3 r:core_node4
x:crm_v2_01_shard3_replica2] o.a.s.c.LeaderInitiatedRecoveryThread Leader
is publishing core=crm_v2_01_shard3_replica3 coreNodeName =core_node12
state=down on behalf of un-reachable replica
http://masked:8983/solr/crm_v2_01_shard3_replica3/


Thanks,

Sujatha





On Wed, Jun 20, 2018 at 11:18 AM, Shawn Heisey <ap...@elyograg.org> wrote:

> On 6/15/2018 3:14 PM, sujatha sankaran wrote:
>
>> We were initially having an issue with DBQ and heavy batch updates  which
>> used to result in many missing updates.
>>
>> After reading many mails in mailing list which mentions that DBQ and batch
>> update do not work well together, we switched to DBI. But  we are seeing
>> issue as mentioned in this jira issue:
>> https://issues.apache.org/jira/browse/SOLR-7384
>>
>
> If you're using the implicit router on your multi-shard collection,
> deleting by ID may not work for you.  There are a number of issues in Jira
> discussing various aspects of the problem.  On a collection using the
> compositeId router, I would expect those deletes to work well.
>
> Specifically we are seeing a pattern as :-
>>
>> ·        There are several  ERRORs and WARNs about “missing _*version*_”
>> type of thing.
>>
>> ·        ERROR message is typically single.
>>
>> ·        There are several WARNs after that and after couple of WARNs
>> there
>> is message that Leader initiated recovery has been kicked off .
>>
>
> Can you share these log entries?  The message on some of them is probably
> a dozen or more lines long, and may have multiple "Caused by" clauses that
> will also need to be included.  Seeing the whole log could be useful.
>
> *Setup info*:
>>
>> - Solr Cloud 6.6.2
>> --5 Node, 5 Shard, 3 replica setup
>> -~35million docs in the collection
>> -  Nodes have 90GB RAM 32 to JVM
>> -Soft commit interval 2 seconds, Hard commit (open searcher false) 15
>> seconds
>>
>
> Side notes:
>
> Solr would actually have more heap memory available if you set the heap to
> 31GB instead of 32GB.
>
> https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-
> java-jvm-memory-oddities/
>
> A 2 second soft commit interval is extremely aggressive.  If your soft
> commits are happening really quickly (far less that 1 second) then this
> might not be a problem, but with an index as large as yours, it is very
> likely that soft commits are taking much longer than 2 seconds.
>
> Thanks,
> Shawn
>
>

Re: Delete By Query issue followed by Delete By Id Issues

Posted by Shawn Heisey <ap...@elyograg.org>.
On 6/15/2018 3:14 PM, sujatha sankaran wrote:
> We were initially having an issue with DBQ and heavy batch updates  which
> used to result in many missing updates.
>
> After reading many mails in mailing list which mentions that DBQ and batch
> update do not work well together, we switched to DBI. But  we are seeing
> issue as mentioned in this jira issue:
> https://issues.apache.org/jira/browse/SOLR-7384

If you're using the implicit router on your multi-shard collection, 
deleting by ID may not work for you.  There are a number of issues in 
Jira discussing various aspects of the problem.  On a collection using 
the compositeId router, I would expect those deletes to work well.
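
For example (a sketch with a made-up id; the route prefix before the "!" is
what compositeId hashes on), the same id carries its own routing for both the
add and the delete:

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class CompositeIdExample {
        public static void main(String[] args) throws Exception {
            try (CloudSolrClient client = new CloudSolrClient.Builder()
                    .withZkHost("zk1:2181").build()) {        // placeholder ZK address
                String collection = "crm_v2_01";

                // With the compositeId router, "20151333!" routes the doc by that prefix.
                String id = "20151333!note_1";                // made-up id
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", id);
                client.add(collection, doc);
                client.commit(collection);

                // The delete carries the same prefix, so it reaches the right shard.
                client.deleteById(collection, id);
                client.commit(collection);
            }
        }
    }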

> Specifically we are seeing a pattern as :-
>
> ·        There are several  ERRORs and WARNs about “missing _*version*_”
> type of thing.
>
> ·        ERROR message is typically single.
>
> ·        There are several WARNs after that and after couple of WARNs there
> is message that Leader initiated recovery has been kicked off .

Can you share these log entries?  The message on some of them is 
probably a dozen or more lines long, and may have multiple "Caused by" 
clauses that will also need to be included.  Seeing the whole log could 
be useful.

> *Setup info*:
>
> - Solr Cloud 6.6.2
> --5 Node, 5 Shard, 3 replica setup
> -~35million docs in the collection
> -  Nodes have 90GB RAM 32 to JVM
> -Soft commit interval 2 seconds, Hard commit (open searcher false) 15
> seconds

Side notes:

Solr would actually have more heap memory available if you set the heap 
to 31GB instead of 32GB.

https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/

A 2 second soft commit interval is extremely aggressive.  If your soft 
commits are happening really quickly (far less than 1 second) then this 
might not be a problem, but with an index as large as yours, it is very 
likely that soft commits are taking much longer than 2 seconds.
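
For example, something along these lines inside <updateHandler> in
solrconfig.xml would be a less aggressive starting point (the 60 second soft
commit is only an illustrative value to tune from):

    <!-- solrconfig.xml sketch: example values only, to be tuned -->
    <autoCommit>
      <maxTime>15000</maxTime>          <!-- hard commit every 15s, as you have now -->
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>60000</maxTime>          <!-- soft commit every 60s instead of every 2s -->
    </autoSoftCommit>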

Thanks,
Shawn