Posted to dev@lucene.apache.org by "mosh (JIRA)" <ji...@apache.org> on 2019/08/13 11:37:00 UTC

[jira] [Commented] (SOLR-13125) Optimize Queries when sorting by router.field

    [ https://issues.apache.org/jira/browse/SOLR-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906103#comment-16906103 ] 

mosh commented on SOLR-13125:
-----------------------------

{quote}What you would need to do is somehow influence the futures that solr is waiting on to return early and empty once your request has been filled up from the most recent collections.{quote}

After further investigation, it seems that HttpShardHandler#take:288 is currently designed to wait (block) for all shard requests to return.
{code:java}
  private ShardResponse take(boolean bailOnError) {
    while (pending.size() > 0) {
      try {
        Future<ShardResponse> future = completionService.take();
        pending.remove(future);
        ShardResponse rsp = future.get();
        if (bailOnError && rsp.getException() != null) return rsp; // if exception, return immediately
        // add response to the response list... we do this after the take() and
        // not after the completion of "call" so we know when the last response
        // for a request was received.  Otherwise we might return the same
        // request more than once.
        rsp.getShardRequest().responses.add(rsp);
        if (rsp.getShardRequest().responses.size() == rsp.getShardRequest().actualShards.length) {
          return rsp;
        }
      } catch (InterruptedException e) {
        throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, e);
      } catch (ExecutionException e) {
        // should be impossible... the problem with catching the exception
        // at this level is we don't know what ShardRequest it applied to
        throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "Impossible Exception",e);
      }
    }
    return null;
  }{code}
This seems counterintuitive to our goal of "short-circuiting" shard requests once sufficient documents have been returned.
HttpShardHandler also seems out of reach for SearchComponents, at least under the current design, in which SearchComponents are coupled to a SearchHandler.
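For illustration only, here is a minimal sketch of what an early-bailing variant of that loop might look like. All names here are hypothetical (this is not the actual HttpShardHandler API), and ShardResponse is simplified to a per-shard row count so the example is self-contained:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntPredicate;

public class EarlyBailTake {

    // Hypothetical variant of the take() loop above: drain the
    // CompletionService as responses arrive, but stop and cancel the
    // still-pending shard futures once `enough` says the rows gathered
    // so far satisfy the request.
    static List<Integer> takeUntilSatisfied(CompletionService<Integer> cs,
                                            Set<Future<Integer>> pending,
                                            IntPredicate enough) throws Exception {
        List<Integer> collected = new ArrayList<>();
        int total = 0;
        while (!pending.isEmpty()) {
            Future<Integer> future = cs.take(); // blocks for the next finished shard
            pending.remove(future);
            int rows = future.get();
            collected.add(rows);
            total += rows;
            if (enough.test(total)) {           // limit reached: short-circuit
                for (Future<Integer> p : pending) {
                    p.cancel(true);             // abandon the slower shards
                }
                pending.clear();
                break;
            }
        }
        return collected;
    }

    // Demo: a fast "recent" shard returning 100 rows and a slow older
    // shard; with a limit of 100 the slow shard is cancelled, not awaited.
    static List<Integer> demo() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CompletionService<Integer> cs = new ExecutorCompletionService<>(pool);
            Set<Future<Integer>> pending = new HashSet<>();
            pending.add(cs.submit(() -> 100));
            pending.add(cs.submit(() -> { Thread.sleep(2000); return 50; }));
            return takeUntilSatisfied(cs, pending, total -> total >= 100);
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The sketch sidesteps the real code's bookkeeping (responses lists, bailOnError), but it shows the shape of the change: the bail decision has to live inside the take loop itself, which is exactly what makes it hard to reach from a SearchComponent.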

[~gus_heck], WDYT?
Do you have any ideas on how this could be designed without significant code changes?
Bear in mind that this discussion only concerns optimizing TRA queries; it could be generalized in due time (resembling the way RoutedAliases were implemented).
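To make the newest-first idea from the issue description concrete, a rough sketch of the control flow (queryNewestFirst and queryCollection are hypothetical; real code would issue shard requests via the shard handler rather than call a function per collection):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class TraSequentialQuery {

    // Hypothetical sketch of the proposed TRA optimization: query the
    // alias's collections newest-first and stop issuing further requests
    // once `limit` rows have been gathered.
    static List<String> queryNewestFirst(List<String> collectionsNewestFirst,
                                         Function<String, List<String>> queryCollection,
                                         int limit) {
        List<String> rows = new ArrayList<>();
        for (String collection : collectionsNewestFirst) {
            rows.addAll(queryCollection.apply(collection));
            if (rows.size() >= limit) {   // limit satisfied: skip older collections
                return rows.subList(0, limit);
            }
        }
        return rows;                      // fewer matches than limit: all queried
    }

    public static void main(String[] args) {
        // Fake alias contents, newest collection first (names are made up).
        Map<String, List<String>> data = new LinkedHashMap<>();
        data.put("tra_2019-08", List.of("d1", "d2"));
        data.put("tra_2019-07", List.of("d3"));
        data.put("tra_2019-06", List.of("d4")); // never queried when limit=3
        List<String> hits = queryNewestFirst(
                new ArrayList<>(data.keySet()), data::get, 3);
        System.out.println(hits);
    }
}
```

Since the data is routed by router.field, the newest collection is guaranteed to hold the top of a router.field-desc sort, which is what makes stopping early safe for this query shape.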

> Optimize Queries when sorting by router.field
> ---------------------------------------------
>
>                 Key: SOLR-13125
>                 URL: https://issues.apache.org/jira/browse/SOLR-13125
>             Project: Solr
>          Issue Type: Sub-task
>            Reporter: mosh
>            Priority: Minor
>         Attachments: SOLR-13125-no-commit.patch, SOLR-13125.patch, SOLR-13125.patch, SOLR-13125.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We are currently testing TRA using Solr 7.7, with >300 shards in the alias and much growth expected in the coming months.
> The "hot" data (in our case, the more recent data) will be stored on stronger nodes (SSD, more RAM, etc.).
> A proposal has emerged to optimize queries sorted by router.field (the field the TRA uses to route documents to the correct collection).
> Perhaps, for queries sorted by router.field, Solr could be smart enough to wait for the more recent collections first and, once the limit is reached, cancel the remaining queries (or simply not block waiting for their results)?
> For example:
> When querying a TRA with a filter on a field other than router.field, but sorting by router.field desc, limit=100.
> Since this is a TRA, Solr will issue queries to all the collections in the alias.
> But to optimize this particular type of query, Solr could wait for the most recent collection in the TRA and check whether the result set meets or exceeds the limit. If so, the response could be returned to the user without waiting for the rest of the shards. If not, the issuing node would block until the second query returns, and so forth, until the limit of the request is reached.
> This might also be useful for deep paging: query each collection and only move on to the next once there are no more results in the current collection.
> Thoughts or inputs are always welcome.
> This is just my two cents, and I'm always happy to brainstorm.
> Thanks in advance.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org