Posted to dev@lucene.apache.org by "Vitaliy Zhovtyuk (JIRA)" <ji...@apache.org> on 2014/02/14 20:38:22 UTC

[jira] [Updated] (SOLR-2908) To push the terms.limit parameter from the master core to all the shard cores.

     [ https://issues.apache.org/jira/browse/SOLR-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vitaliy Zhovtyuk updated SOLR-2908:
-----------------------------------

    Attachment: SOLR-2908.patch

If you limit the number of terms, you also need to pass at least the sort order to the shards, so that each shard returns its most relevant terms.
Added a shards.terms.params.override parameter: when it is set to true, the terms parameters (terms.limit, terms.sort, terms.maxcount, terms.mincount) are passed through to the shards.
Using this parameter with terms.sort=index (index order, no count sorting) is fine, but using shards.terms.params.override with terms.sort=count can produce results that differ from those of a single core.
See org.apache.solr.handler.component.DistributedTermsComponentParametersTest. 
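A minimal sketch of how the shard request parameters might be derived with and without the override, modeled on the createShardQuery method quoted in the issue description. It is not the patch's code: parameters are modeled as a plain Map<String, String> instead of Solr's ModifiableSolrParams, and the class and method names (ShardTermsParams, shardParams) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the shard-parameter logic; not taken from the patch.
public class ShardTermsParams {
    static final String OVERRIDE = "shards.terms.params.override";
    static final String LIMIT = "terms.limit";
    static final String SORT = "terms.sort";
    static final String MINCOUNT = "terms.mincount";
    static final String MAXCOUNT = "terms.maxcount";

    /** Builds the parameters sent to each shard from a copy of the original request. */
    static Map<String, String> shardParams(Map<String, String> original) {
        Map<String, String> p = new HashMap<>(original);
        boolean override = Boolean.parseBoolean(original.getOrDefault(OVERRIDE, "false"));
        if (!override) {
            // Default behaviour (as in createShardQuery): shards return every term
            // in index order, so the coordinating core can compute exact totals.
            p.remove(MAXCOUNT);
            p.remove(MINCOUNT);
            p.put(LIMIT, "-1");
            p.put(SORT, "index");
        }
        // With the override, limit, sort, mincount and maxcount pass through unchanged.
        return p;
    }

    public static void main(String[] args) {
        Map<String, String> req = new HashMap<>();
        req.put(LIMIT, "5");
        req.put(SORT, "count");
        System.out.println(shardParams(req).get(LIMIT)); // prints -1
        req.put(OVERRIDE, "true");
        System.out.println(shardParams(req).get(LIMIT)); // prints 5
    }
}
```

Copying the original parameters before modifying them keeps the coordinator's own terms.limit intact, so it can still apply the final limit after merging the shard responses.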

For example, take the parameters
{code}shards.terms.params.override=true&terms.limit=5&terms.sort=count{code}
and the data
{code}    index(id, 18, "b_t", "snake spider shark snail slug seal");
    index(id, 19, "b_t", "snake spider shark snail slug");
    index(id, 20, "b_t", "snake spider shark snail");
    index(id, 21, "b_t", "snake spider shark");
    index(id, 22, "b_t", "snake spider");
    index(id, 23, "b_t", "snake");
    index(id, 24, "b_t", "ant zebra");
    index(id, 25, "b_t", "zebra");
{code}

With a single core, the results are:
{code}snake=6 spider=5 shark=4 snail=3 slug=2{code}

With two shards, the per-shard results are:
shard 1:  {code} snake=3 spider=3 shark=2 snail=2 ant=1 {code}
shard 2: {code} snake=3 spider=2 shark=2 seal=1 slug=1 {code}

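Combined result: {code} snake=6 spider=5 shark=4 snail=2 ant=1 {code}
This differs from the single-core result because each shard truncated its list to 5 terms before merging: shard 2 dropped its snail=1 and shard 1 dropped its slug=1. As a result snail is undercounted (2 instead of 3), and slug, now tied at 1 with ant and seal, loses its place to ant=1.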
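To make the arithmetic concrete, here is a small self-contained Java simulation of the merge (not Solr code). It assumes documents 19, 20, 22 and 24 went to shard 1 and documents 18, 21, 23 and 25 to shard 2; this split is inferred from the per-shard counts above, not taken from the test. Count ties are broken by term order, so the order of equal counts within a shard may differ from the listing above. Shards return either every term (limit -1, the default createShardQuery behaviour) or only their top 5 (as with the override), and the coordinator sums the returned counts before taking the top 5.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class TermsMergeDemo {
    // Assumed document-to-shard split that reproduces the reported per-shard counts.
    static List<List<String>> shards() {
        List<List<String>> shards = new ArrayList<>();
        shards.add(Arrays.asList(
                "snake spider shark snail slug",       // id 19
                "snake spider shark snail",            // id 20
                "snake spider",                        // id 22
                "ant zebra"));                         // id 24
        shards.add(Arrays.asList(
                "snake spider shark snail slug seal",  // id 18
                "snake spider shark",                  // id 21
                "snake",                               // id 23
                "zebra"));                             // id 25
        return shards;
    }

    // Document frequency of each whitespace-separated term.
    static Map<String, Integer> countTerms(List<String> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String doc : docs) {
            for (String term : new HashSet<>(Arrays.asList(doc.split(" ")))) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Count sort: highest count first, ties broken by term order.
    // A negative limit returns every term, like terms.limit=-1.
    static List<Map.Entry<String, Integer>> topByCount(Map<String, Integer> counts, int limit) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return limit < 0 ? sorted : new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }

    // Each shard applies shardLimit; the coordinator sums what it receives and applies finalLimit.
    static List<Map.Entry<String, Integer>> distributed(List<List<String>> shards, int shardLimit, int finalLimit) {
        Map<String, Integer> merged = new TreeMap<>();
        for (List<String> shard : shards) {
            for (Map.Entry<String, Integer> e : topByCount(countTerms(shard), shardLimit)) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return topByCount(merged, finalLimit);
    }

    static String format(List<Map.Entry<String, Integer>> terms) {
        return terms.stream().map(e -> e.getKey() + "=" + e.getValue()).collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        List<String> all = new ArrayList<>();
        for (List<String> shard : shards()) {
            all.addAll(shard);
        }
        // snake=6 spider=5 shark=4 snail=3 slug=2
        System.out.println("single core:    " + format(topByCount(countTerms(all), 5)));
        // snake=6 spider=5 shark=4 snail=3 slug=2
        System.out.println("shard limit -1: " + format(distributed(shards(), -1, 5)));
        // snake=6 spider=5 shark=4 snail=2 ant=1
        System.out.println("shard limit 5:  " + format(distributed(shards(), 5, 5)));
    }
}
```

The unlimited run matches the single core because every shard reports its full count for every term; the limited run reproduces the combined result above, which is the trade-off the default -1 limit avoids at the cost of larger shard responses.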

I expect this parameter override to be useful with count sorting combined with custom routing: if all documents containing a given term are located on the same shard, that term is sorted and limited correctly there.

> To push the terms.limit parameter from the master core to all the shard cores.
> ------------------------------------------------------------------------------
>
>                 Key: SOLR-2908
>                 URL: https://issues.apache.org/jira/browse/SOLR-2908
>             Project: Solr
>          Issue Type: Improvement
>          Components: SearchComponents - other
>    Affects Versions: 1.4.1
>         Environment: Linux server. 64 bit processor and 16GB Ram.
>            Reporter: sivaganesh
>            Priority: Critical
>              Labels: patch
>             Fix For: 4.7
>
>         Attachments: SOLR-2908.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> When we pass the terms.limit parameter to the master (which has many shard cores), it is not pushed down to the individual cores. Instead, terms.limit is set to -1 (unlimited) in the underlying shard cores. As a result, the time the master core takes to return the requested number of terms grows as the number of underlying shard cores increases. This affects the performance of the auto-suggest feature.
> We think there could be a parameter to explicitly override the -1 that is set for terms.limit in the shard cores.
> We looked at the source code (TermsComponent.java) and confirmed this. Please help us push the terms.limit parameter down to the shard cores.
> The relevant code snippet:
> private ShardRequest createShardQuery(SolrParams params) {
>     ShardRequest sreq = new ShardRequest();
>     sreq.purpose = ShardRequest.PURPOSE_GET_TERMS;
>     // base shard request on original parameters
>     sreq.params = new ModifiableSolrParams(params);
>     // remove any limits for shards, we want them to return all possible
>     // responses
>     // we want this so we can calculate the correct counts
>     // dont sort by count to avoid that unnecessary overhead on the shards
>     sreq.params.remove(TermsParams.TERMS_MAXCOUNT);
>     sreq.params.remove(TermsParams.TERMS_MINCOUNT);
>     sreq.params.set(TermsParams.TERMS_LIMIT, -1);
>     sreq.params.set(TermsParams.TERMS_SORT, TermsParams.TERMS_SORT_INDEX);
>     return sreq;
>   }
> Solr Version:
> Solr Specification Version: 1.4.0.2010.01.13.08.09.44 
>  Solr Implementation Version: 1.5-dev exported - yonik - 2010-01-13 08:09:44 
>  Lucene Specification Version: 2.9.1-dev 
>  Lucene Implementation Version: 2.9.1-dev 888785 - 2009-12-09 18:03:31 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
