Posted to dev@lucene.apache.org by "Varun Thacker (JIRA)" <ji...@apache.org> on 2018/08/08 02:43:00 UTC
[jira] [Commented] (SOLR-12635) HashQParserPlugin should be run as a post filter cost is not explicitly defined

    [ https://issues.apache.org/jira/browse/SOLR-12635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572588#comment-16572588 ] 

Varun Thacker commented on SOLR-12635:
--------------------------------------

Here are some thoughts about using the HashQParser after speaking to Joel offline.

We almost always want to use HashQParser to fetch a lot of data in parallel.

Now, that sentence can be interpreted in two ways:

Thought 1 - If we want to parallelize fetching data, each 1/N stream won't be big, so a post filter approach makes sense.

Thought 2 - We are using parallel because the data is big, and each 1/N stream will also be big. The HashQParser is very cache friendly, i.e. once executed, any repeat of the same query can use the filterCache/queryResultCache and be served very fast. We pay the cost the first time the query gets executed, and after that the query is super fast. We could even avoid paying the cost for the first query by adding these 6 queries to the newSearcher event in the solrconfig.xml file:
{code:xml}
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">

    <lst><str name="q">*:*</str><str name="fq">{!hash workers=6 worker=0}</str><str name="partitionKeys">myPartitionKey</str></lst>
    <lst><str name="q">*:*</str><str name="fq">{!hash workers=6 worker=1}</str><str name="partitionKeys">myPartitionKey</str></lst>
    <lst><str name="q">*:*</str><str name="fq">{!hash workers=6 worker=2}</str><str name="partitionKeys">myPartitionKey</str></lst>
    <lst><str name="q">*:*</str><str name="fq">{!hash workers=6 worker=3}</str><str name="partitionKeys">myPartitionKey</str></lst>
    <lst><str name="q">*:*</str><str name="fq">{!hash workers=6 worker=4}</str><str name="partitionKeys">myPartitionKey</str></lst>
    <lst><str name="q">*:*</str><str name="fq">{!hash workers=6 worker=5}</str><str name="partitionKeys">myPartitionKey</str></lst>
  </arr>
</listener>{code}
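
If the worker count changes, the entries above have to be rewritten by hand. The following is only a minimal sketch of generating them. The class name HashWarmingQueries and the method buildEntries are hypothetical, not part of Solr, and the sketch assumes the partition key needs no XML escaping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Solr class): emits one warming <lst> entry per worker.
public class HashWarmingQueries {

    // Builds the same kind of entry as the hand-written ones above, for any worker count.
    // Assumption: partitionKey contains no characters that need XML escaping.
    static List<String> buildEntries(int workers, String partitionKey) {
        if (workers <= 0) {
            throw new IllegalArgumentException("workers must be positive");
        }
        List<String> entries = new ArrayList<>();
        for (int worker = 0; worker < workers; worker++) {
            entries.add(String.format(
                "<lst><str name=\"q\">*:*</str>"
                    + "<str name=\"fq\">{!hash workers=%d worker=%d}</str>"
                    + "<str name=\"partitionKeys\">%s</str></lst>",
                workers, worker, partitionKey));
        }
        return entries;
    }

    public static void main(String[] args) {
        // Prints six entries, ready to paste inside <arr name="queries">.
        for (String entry : buildEntries(6, "myPartitionKey")) {
            System.out.println("    " + entry);
        }
    }
}
```

The workers value and partition key in the warming queries have to match what the parallel stream actually sends, otherwise the cached entries would not be hit.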
 

I'm going to ponder this a little more, but I'm tempted to go with the second school of thought. This would involve no code changes, just adding this to the solrconfig.xml file.
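
For comparison, the first school of thought amounts to flooring the filter's cost at 100, as the CollapsingQParserPlugin snippet quoted in the issue does. The following is a simplified model of that decision, not Solr code. The class PostFilterCostSketch and its methods are hypothetical, and it assumes the documented Solr rule that a filter is eligible to run as a post filter only when it is not cached and its cost is at least 100.

```java
// Hypothetical model (not a Solr class) of the cost floor and the post filter rule.
public class PostFilterCostSketch {

    // Assumption: Solr's documented threshold for post filters.
    static final int POST_FILTER_MIN_COST = 100;

    // Mirrors Math.max(super.getCost(), 100): an unset or low cost is raised to 100,
    // while an explicitly higher cost is kept.
    static int effectiveCost(int requestedCost) {
        return Math.max(requestedCost, POST_FILTER_MIN_COST);
    }

    // A filter qualifies as a post filter only if it is not cached and its cost is >= 100.
    static boolean runsAsPostFilter(boolean cache, int cost) {
        return !cache && cost >= POST_FILTER_MIN_COST;
    }

    public static void main(String[] args) {
        int defaultCost = 0; // cost when the request does not specify one
        System.out.println(runsAsPostFilter(false, defaultCost));
        System.out.println(runsAsPostFilter(false, effectiveCost(defaultCost)));
    }
}
```

With the floor in place, a request that omits cost would still qualify as a post filter, which is the default behaviour the issue proposes.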

> HashQParserPlugin should be run as a post filter cost is not explicitly defined
> -------------------------------------------------------------------------------
>
>                 Key: SOLR-12635
>                 URL: https://issues.apache.org/jira/browse/SOLR-12635
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Varun Thacker
>            Assignee: Varun Thacker
>            Priority: Major
>         Attachments: SOLR-12635.patch
>
>
> I was doing some performance benchmarking for a user on slow streaming queries.
> The weird thing was that the same streaming expression was fast when we fired it again.
> We were able to isolate the slowness to the hash query parser.
> Here are the first and second times we fired the query. To simplify things, this is for one shard and for the same worker:
> {code:java}
> path=/export params={q=*:*&distrib=false&indent=off&fl=fields&fq=user:1&fq={!hash workers=6 worker=3}&partitionKeys=partitionKey&sort=partitionKey asc&wt=javabin&version=2.2} hits=0 status=0 QTime=6821
> path=/export params={q=*:*&distrib=false&indent=off&fl=fields&fq=user:1&fq={!hash workers=6 worker=3}&partitionKeys=partitionKey&sort=partitionKey asc&wt=javabin&version=2.2} hits=0 status=0 QTime=0{code}
> Even with hits=0 the first query took 6.8 seconds. The shard has 17m documents.
> The second query utilizes the queryResultCache and hence is lightning fast the second time around.
> When we execute the same query and add a cost, i.e. {{&fq={!hash workers=6 worker=3 cost=101}}}, the query gets executed as a post filter and is super fast even when uncached.
> I created this Jira so that we can always set cost > 100 from the parallel stream.
> However, I am happy to change the default behaviour for HashQParserPlugin and make it always run as a post filter unless explicitly specified otherwise. CollapsingQParserPlugin currently does this to make sure it's run as a post filter by default:
> {code:java}
> public int getCost() {
>   return Math.max(super.getCost(), 100);
> }{code}
> Thoughts anyone? 
>  


