Posted to issues@solr.apache.org by "Dinesh Kumar Naik (Jira)" <ji...@apache.org> on 2021/11/11 07:49:00 UTC

[jira] [Commented] (SOLR-14298) LBSolrClient.checkAZombieServer should be less stupid

    [ https://issues.apache.org/jira/browse/SOLR-14298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442131#comment-17442131 ] 

Dinesh Kumar Naik commented on SOLR-14298:
------------------------------------------

[~hossman] I completely agree with you. A match-all query against a shard with billions of documents can be very costly, even with rows=0 and distrib=false.

Here are some of these calls, with their QTime values, from one of my setups:

 
{code:java}
2021-11-09 21:15:53.853 WARN  (qtp435914790-25301965) x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: [Item_collection_shard15_replica_n115]  webapp=/solr path=/select params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} hits=1226420962 status=0 QTime=15516
2021-11-10 00:45:16.816 WARN  (qtp435914790-25341761) x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: [Item_collection_shard15_replica_n115]  webapp=/solr path=/select params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} hits=1226984671 status=0 QTime=15169
2021-11-10 00:45:30.772 WARN  (qtp435914790-25339675) x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: [Item_collection_shard15_replica_n115]  webapp=/solr path=/select params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} hits=1226985078 status=0 QTime=15494
2021-11-10 00:45:34.244 WARN  (qtp435914790-25334052) x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: [Item_collection_shard15_replica_n115]  webapp=/solr path=/select params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} hits=1226985527 status=0 QTime=15462
2021-11-10 00:46:19.480 WARN  (qtp435914790-25340369) x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: [Item_collection_shard15_replica_n115]  webapp=/solr path=/select params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} hits=1226987732 status=0 QTime=14553
2021-11-10 18:03:49.885 WARN  (qtp435914790-25486769) x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: [Item_collection_shard15_replica_n115]  webapp=/solr path=/select params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} hits=1228021741 status=0 QTime=16130
2021-11-10 18:04:14.511 WARN  (qtp435914790-25523411) x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: [Item_collection_shard15_replica_n115]  webapp=/solr path=/select params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} hits=1228021262 status=0 QTime=16626
2021-11-10 18:04:23.904 WARN  (qtp435914790-25454090) x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: [Item_collection_shard15_replica_n115]  webapp=/solr path=/select params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} hits=1228020775 status=0 QTime=16556
2021-11-10 18:04:43.355 WARN  (qtp435914790-25505322) x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: [Item_collection_shard15_replica_n115]  webapp=/solr path=/select params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} hits=1228020627 status=0 QTime=17029
2021-11-10 18:04:49.181 WARN  (qtp435914790-25509646) x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: [Item_collection_shard15_replica_n115]  webapp=/solr path=/select params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} hits=1228020577 status=0 QTime=17242
2021-11-10 18:04:53.577 WARN  (qtp435914790-25484919) x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: [Item_collection_shard15_replica_n115]  webapp=/solr path=/select params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} hits=1228020577 status=0 QTime=19169
2021-11-10 18:05:06.366 WARN  (qtp435914790-25523409) x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: [Item_collection_shard15_replica_n115]  webapp=/solr path=/select params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} hits=1228019443 status=0 QTime=17352
2021-11-10 18:05:07.594 WARN  (qtp435914790-25527485) x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: [Item_collection_shard15_replica_n115]  webapp=/solr path=/select params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} hits=1228018445 status=0 QTime=17309
2021-11-10 18:05:07.908 WARN  (qtp435914790-25496685) x:Item_collection_shard15_replica_n115 o.a.s.c.S.SlowRequest slow: [Item_collection_shard15_replica_n115]  webapp=/solr path=/select params={q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2} hits=1228018445 status=0 QTime=17445
{code}
 

 

QTime ranges from roughly 14.5 to 19 seconds across these checks, a cost that can be avoided as you suggested.

Here are my observations on the three approaches you suggested:
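For anyone who wants to check the range on their own logs, here is a minimal sketch that pulls the QTime field out of request log lines and reports the minimum and maximum. The class name QTimeSummary is made up for illustration, and the three sample lines are abbreviated from the excerpt above (only the tail of each line is kept).

```java
import java.util.List;
import java.util.LongSummaryStatistics;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr: summarizes QTime values found in log lines.
public class QTimeSummary {
    // Matches the "QTime=<millis>" field that ends a Solr request log line.
    private static final Pattern QTIME = Pattern.compile("QTime=(\\d+)");

    /** Collects count/min/max of QTime (ms) over the lines that carry a QTime field. */
    public static LongSummaryStatistics summarize(List<String> logLines) {
        LongSummaryStatistics stats = new LongSummaryStatistics();
        for (String line : logLines) {
            Matcher m = QTIME.matcher(line);
            if (m.find()) {
                stats.accept(Long.parseLong(m.group(1)));
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        // Tails of three lines from the excerpt above; the leading fields are omitted.
        List<String> lines = List.of(
            "hits=1226987732 status=0 QTime=14553",
            "hits=1228021741 status=0 QTime=16130",
            "hits=1228020577 status=0 QTime=19169");
        LongSummaryStatistics stats = summarize(lines);
        System.out.println("min=" + stats.getMin() + " ms, max=" + stats.getMax() + " ms");
    }
}
```

Lines without a QTime field are skipped, so the same method can be pointed at a whole log file read line by line.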

1. The *segmentTerminateEarly* option might not help us, because the default merge policy is *TieredMergePolicyFactory*, not the SortingMergePolicyFactory that the option depends on.

As per [https://solr.apache.org/guide/8_6/common-query-parameters.html#segmentterminateearly-parameter]:

If *segmentTerminateEarly* is set to true, and if [the mergePolicyFactory|https://solr.apache.org/guide/8_6/indexconfig-in-solrconfig.html#mergepolicyfactory] for this collection is a [SortingMergePolicyFactory|https://lucene.apache.org/solr/8_6_0/solr-core/org/apache/solr/index/SortingMergePolicyFactory.html] which uses a sort option compatible with [the sort parameter|https://solr.apache.org/guide/8_6/common-query-parameters.html#sort-parameter] specified for this query, then Solr will be able to skip documents on a per-segment basis that are definitively not candidates for the current page of results.

2. Use of timeAllowed: a small timeAllowed value reduces QTime drastically, and the request then returns partial results.

!image-2021-11-11-13-11-30-930.png|width=524,height=288!

3. Negated match-all query, i.e. q=-*:*

This seems to be the fastest option, and it would literally be a one-character patch.

!image-2021-11-11-13-13-20-791.png|width=524,height=267!
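To make the difference between the options concrete, here is a rough sketch of the request parameters for the current check, the timeAllowed variant, and the negated match-all variant. It uses plain Java maps rather than SolrJ, so it is not the actual LBSolrClient code. The class name ZombieCheckParams and the 100 ms timeAllowed value are assumptions chosen for illustration, and wt/version are left out for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration only; not the real LBSolrClient implementation.
public class ZombieCheckParams {
    /** Parameters of the current check, as they appear in the slow-request log. */
    public static Map<String, String> current() {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "*:*");
        p.put("distrib", "false");
        p.put("sort", "_docid_ asc");
        p.put("rows", "0");
        return p;
    }

    /** Approach 2: the same query, bounded by timeAllowed (milliseconds). */
    public static Map<String, String> withTimeAllowed(long millis) {
        Map<String, String> p = current();
        p.put("timeAllowed", Long.toString(millis));
        return p;
    }

    /** Approach 3: negated match-all; only the value of q changes. */
    public static Map<String, String> negatedMatchAll() {
        Map<String, String> p = current();
        p.put("q", "-*:*");
        return p;
    }

    /** Joins the parameters for display; no URL encoding is applied. */
    public static String render(Map<String, String> params) {
        return params.entrySet().stream()
            .map(e -> e.getKey() + "=" + e.getValue())
            .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        System.out.println(render(current()));
        System.out.println(render(withTimeAllowed(100))); // 100 ms is an assumed value
        System.out.println(render(negatedMatchAll()));
    }
}
```

The third variant differs from the current request by a single character in q, which is why it would be such a small patch.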

Kindly let me know your thoughts and then we can plan for a patch accordingly!

> LBSolrClient.checkAZombieServer should be less stupid
> -----------------------------------------------------
>
>                 Key: SOLR-14298
>                 URL: https://issues.apache.org/jira/browse/SOLR-14298
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Chris M. Hostetter
>            Priority: Major
>         Attachments: image-2021-11-11-13-11-30-930.png, image-2021-11-11-13-13-20-791.png
>
>
> LBSolrClient.checkAZombieServer() currently does /select query for {{\*:\*}} with distrib=false, rows=0, sort=\_docid\_ ... but this can still chew up a lot of time if the shard is big, and it's not self evident wtf is going on in the server logs.
> At a minimum, these requests should include some sort of tracing param to identify the point of the query (ie: {{_zombieservercheck=true}}) and should probably be changed to hit something like the /ping handler, or the node status handler, or if it's important to folks that it do a "search" that actually uses the index searcher, then it should use options like timeAllowed / segmentTerminateEarly, and/or {{q=-\*:\*}} instead .. or maybe a cursorMark ... something to make it not have the overhead of counting all the hits.


