Posted to dev@lucene.apache.org by "Dawid Weiss (JIRA)" <ji...@apache.org> on 2017/05/15 11:22:04 UTC

[jira] [Comment Edited] (SOLR-10678) Clustering can be executed multiple times in distributed mode

    [ https://issues.apache.org/jira/browse/SOLR-10678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16010338#comment-16010338 ] 

Dawid Weiss edited comment on SOLR-10678 at 5/15/17 11:21 AM:
--------------------------------------------------------------

I looked into it; here's a summary of my findings.

1. Clustering isn't currently run twice in distributed mode, so functionally this is a non-issue. A distributed request goes through {{modifyRequest}}, and when clustering of search results is requested, {{ClusteringComponent}} strips itself out of the subsequent shard requests by deleting the {{clustering}} parameter:
{code}
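  public void modifyRequest(ResponseBuilder rb, SearchComponent who, ShardRequest sreq) {
    SolrParams params = rb.req.getParams();
    // Only act when clustering is enabled and runs on search results.
    if (!params.getBool(COMPONENT_NAME, false) || !params.getBool(ClusteringParams.USE_SEARCH_RESULTS, false)) {
      return;
    }
    // Remove the clustering parameter from the shard request so shards skip clustering.
    sreq.params.remove(COMPONENT_NAME);
    // ... (rest of the method elided)
  }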
{code}

This is then checked in {{process}}:
{code}
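  public void process(ResponseBuilder rb) throws IOException {
    SolrParams params = rb.req.getParams();
    // Shard requests stripped in modifyRequest arrive without the parameter and return here.
    if (!params.getBool(COMPONENT_NAME, false)) {
      return;
    }
    // ... (rest of the method elided)
  }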
{code}
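To make the mechanism concrete, here is a minimal, hypothetical Java model of the two methods above (this is not Solr code). Plain {{Map<String, String>}} objects stand in for {{SolrParams}} and {{ShardRequest.params}}; the class name {{DistributedClusteringSketch}}, the helper {{shardsThatCluster}} and the {{clustering.results}} parameter name (assumed to be what {{ClusteringParams.USE_SEARCH_RESULTS}} resolves to) are all assumptions for illustration. It only shows that shard requests fall through the early return in {{process}}; where the coordinator does its single clustering pass is not modeled.

{code:java}
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of the two ClusteringComponent methods quoted above.
 * Plain string maps stand in for Solr's SolrParams / ShardRequest.params.
 */
class DistributedClusteringSketch {
    // "clustering" is the parameter named in the comment; "clustering.results"
    // is an ASSUMED name for ClusteringParams.USE_SEARCH_RESULTS.
    static final String COMPONENT_NAME = "clustering";
    static final String USE_SEARCH_RESULTS = "clustering.results";

    // Simplified stand-in for SolrParams.getBool(name, false): absent means false.
    static boolean getBool(Map<String, String> params, String name) {
        return Boolean.parseBoolean(params.get(name));
    }

    // Coordinator side (modifyRequest): derive one shard request's parameters.
    static Map<String, String> modifyRequest(Map<String, String> coordinator) {
        Map<String, String> shard = new HashMap<>(coordinator);
        if (!getBool(coordinator, COMPONENT_NAME) || !getBool(coordinator, USE_SEARCH_RESULTS)) {
            return shard; // component not active: shard request left as is
        }
        shard.remove(COMPONENT_NAME); // shards will hit the early return in process()
        return shard;
    }

    // Shard side (process): true if the call gets past the early return.
    static boolean process(Map<String, String> params) {
        return getBool(params, COMPONENT_NAME);
    }

    // Counts how many shards would get past process()'s early return.
    static int shardsThatCluster(Map<String, String> request, int numShards) {
        int count = 0;
        for (int i = 0; i < numShards; i++) {
            if (process(modifyRequest(request))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Map<String, String> req = new HashMap<>();
        req.put(COMPONENT_NAME, "true");
        req.put(USE_SEARCH_RESULTS, "true");
        System.out.println(shardsThatCluster(req, 3)); // 0
        System.out.println(process(req));              // true: the original request keeps the flag
    }
}
{code}

In this model the coordinator's own parameters keep the flag; only the per-shard copies lose it, which is why no shard ever gets past the early return.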

2. What confused me a *lot* was why {{process}} is invoked during the distributed test (and why the test was executed so darn many times). It turned out to be the default {{ShardsRepeatRule}} plus this gem inside {{BaseDistributedSearchTestCase}}:
{code}
  protected QueryResponse query(boolean setDistribParams, SolrParams p) throws Exception {
    final ModifiableSolrParams params = new ModifiableSolrParams(p);
    // TODO: look into why passing true causes fails
    params.set("distrib", "false");
    final QueryResponse controlRsp = controlClient.query(params);
    validateControlData(controlRsp);
    params.remove("distrib");
    // ... (the distributed request follows; rest of the method elided)
  }
{code}

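So each {{query}} call in the distributed test first sends a forced non-distributed request to the control client and only then issues the distributed request. The control request keeps the {{clustering}} parameter, so {{process}} does run for it; that, repeated by {{ShardsRepeatRule}}, explains the confusing logs.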
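The same effect can be reproduced with a small, hypothetical sketch of the quoted {{query}} pattern (again plain maps instead of {{SolrParams}}). The class name {{ControlQuerySketch}}, its methods and the {{clustering.results}} parameter name are assumptions; neither {{ShardsRepeatRule}} nor the coordinator's own handling of the distributed request is modeled. It counts how many {{process}} calls get past the early return during one {{query}} call.

{code:java}
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of the query() pattern quoted from
 * BaseDistributedSearchTestCase: a control request forced to distrib=false,
 * then the distributed request. String maps replace SolrParams.
 */
class ControlQuerySketch {
    static final String COMPONENT_NAME = "clustering";
    // ASSUMED parameter name behind ClusteringParams.USE_SEARCH_RESULTS.
    static final String USE_SEARCH_RESULTS = "clustering.results";

    // Simplified stand-in for SolrParams.getBool(name, false).
    static boolean getBool(Map<String, String> params, String name) {
        return Boolean.parseBoolean(params.get(name));
    }

    // Same gating as the quoted modifyRequest().
    static Map<String, String> shardRequest(Map<String, String> params) {
        Map<String, String> shard = new HashMap<>(params);
        if (getBool(params, COMPONENT_NAME) && getBool(params, USE_SEARCH_RESULTS)) {
            shard.remove(COMPONENT_NAME);
        }
        return shard;
    }

    // Same early return as the quoted process().
    static boolean process(Map<String, String> params) {
        return getBool(params, COMPONENT_NAME);
    }

    // Returns how many process() calls get past the early return.
    static int query(Map<String, String> p, int numShards) {
        int clustered = 0;
        Map<String, String> params = new HashMap<>(p);
        // Control request: forced non-distributed, so the flag reaches process().
        params.put("distrib", "false");
        if (process(params)) {
            clustered++;
        }
        params.remove("distrib");
        // Distributed request: every shard receives the stripped parameters.
        for (int i = 0; i < numShards; i++) {
            if (process(shardRequest(params))) {
                clustered++;
            }
        }
        return clustered;
    }

    public static void main(String[] args) {
        Map<String, String> req = new HashMap<>();
        req.put(COMPONENT_NAME, "true");
        req.put(USE_SEARCH_RESULTS, "true");
        System.out.println(query(req, 3)); // 1: only the control request clusters
    }
}
{code}

In this model only the control request reaches the clustering code, so every extra execution of the test (such as those triggered by {{ShardsRepeatRule}}) just adds one more of these runs to the logs, while the shard requests never cluster.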


> Clustering can be executed multiple times in distributed mode
> -------------------------------------------------------------
>
>                 Key: SOLR-10678
>                 URL: https://issues.apache.org/jira/browse/SOLR-10678
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>
> As reported on Stack Overflow: http://stackoverflow.com/questions/43877284/how-does-solr-clustering-component-work/43937064#43937064



