Posted to dev@lucene.apache.org by "Varun Thacker (JIRA)" <ji...@apache.org> on 2018/08/07 22:55:00 UTC

[jira] [Updated] (SOLR-12635) HashQParserPlugin should be run as a post filter when executed from a ParallelStream

     [ https://issues.apache.org/jira/browse/SOLR-12635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Thacker updated SOLR-12635:
---------------------------------
    Attachment: SOLR-12635.patch

> HashQParserPlugin should be run as a post filter when executed from a ParallelStream
> ------------------------------------------------------------------------------------
>
>                 Key: SOLR-12635
>                 URL: https://issues.apache.org/jira/browse/SOLR-12635
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Varun Thacker
>            Assignee: Varun Thacker
>            Priority: Major
>         Attachments: SOLR-12635.patch
>
>
> I was doing some performance benchmarking for a user on slow streaming queries.
> The odd thing was that the same streaming expression was fast when we fired it again.
> We were able to isolate the slowness to the hash query parser.
> Here are the log lines from the first and second time we fired the query. To simplify things, this is for one shard and the same worker:
> {code:java}
> path=/export params={q=*:*&distrib=false&indent=off&fl=fields&fq=user:1&fq={!hash workers=6 worker=3}&partitionKeys=partitionKey&sort=partitionKey asc&wt=javabin&version=2.2} hits=0 status=0 QTime=6821
> path=/export params={q=*:*&distrib=false&indent=off&fl=fields&fq=user:1&fq={!hash workers=6 worker=3}&partitionKeys=partitionKey&sort=partitionKey asc&wt=javabin&version=2.2} hits=0 status=0 QTime=0{code}
> Even with hits=0 the first query took 6.8 seconds. The shard has 17m documents.
> The second query uses the queryResultCache, which is why it is lightning fast the second time around.
> When we execute the same query and add a cost, i.e. {{&fq={!hash workers=6 worker=3 cost=101}}}, the query gets executed as a post filter and is very fast even when uncached.
> I created this Jira so that we can always set cost > 100 from the parallel stream.
> However, I am happy to change the default behaviour of HashQParserPlugin and make it always run as a post filter unless explicitly specified otherwise. CollapsingQParserPlugin currently does this to make sure it runs as a post filter by default:
> {code:java}
> public int getCost() {
>   return Math.max(super.getCost(), 100);
> }{code}
> Thoughts anyone? 
>  
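
To make the effect of the quoted getCost() override concrete, here is a minimal, self-contained sketch. It is not Solr code: BaseQuery, AlwaysPostFilterQuery, runsAsPostFilter and POST_FILTER_MIN_COST are hypothetical names, and the sketch assumes that a missing cost defaults to 0 and that 100 is the lowest cost at which a filter is still treated as a post filter. It models only the cost check and ignores any caching condition.

```java
public class PostFilterCostSketch {
    // Assumption: the quoted override floors the cost at 100, so 100 is taken
    // here as the lowest cost that still qualifies as a post filter.
    static final int POST_FILTER_MIN_COST = 100;

    /** Hypothetical stand-in for a filter query that reports the cost the user supplied. */
    static class BaseQuery {
        private final int cost;

        BaseQuery(int cost) {
            this.cost = cost;
        }

        public int getCost() {
            return cost;
        }
    }

    /** Same shape as the quoted override: never report a cost below the threshold. */
    static class AlwaysPostFilterQuery extends BaseQuery {
        AlwaysPostFilterQuery(int cost) {
            super(cost);
        }

        @Override
        public int getCost() {
            return Math.max(super.getCost(), POST_FILTER_MIN_COST);
        }
    }

    /** Simplified decision rule: only the cost is checked. */
    static boolean runsAsPostFilter(BaseQuery q) {
        return q.getCost() >= POST_FILTER_MIN_COST;
    }

    public static void main(String[] args) {
        // No cost given (assumed default 0): not a post filter.
        System.out.println(runsAsPostFilter(new BaseQuery(0)));
        // cost=101 set explicitly, as in the workaround from the description.
        System.out.println(runsAsPostFilter(new BaseQuery(101)));
        // With the override, even a missing cost ends up at the threshold.
        System.out.println(runsAsPostFilter(new AlwaysPostFilterQuery(0)));
    }
}
```

With the override in place an explicitly higher cost is preserved (a cost of 150 stays 150), while any lower or missing cost is raised to 100, so callers such as the parallel stream would not need to pass a cost themselves.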