Posted to solr-user@lucene.apache.org by sasarun <sa...@gmail.com> on 2017/09/26 14:43:32 UTC

Solr performance issue on querying --> Solr 6.5.1

Hi All,
I have been using Solr for some time now, but mostly in standalone mode. Now
my current project is using Solr 6.5.1 hosted on Hadoop. My solrconfig.xml
has the following configuration. In the prod environment, query performance
seems to be really slow. Can anyone help me with a few pointers on
how to improve it?

<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
        <str name="solr.hdfs.home">${solr.hdfs.home:}</str>
        <bool name="solr.hdfs.blockcache.enabled">${solr.hdfs.blockcache.enabled:true}</bool>
        <int name="solr.hdfs.blockcache.slab.count">${solr.hdfs.blockcache.slab.count:1}</int>
        <bool name="solr.hdfs.blockcache.direct.memory.allocation">${solr.hdfs.blockcache.direct.memory.allocation:false}</bool>
        <int name="solr.hdfs.blockcache.blocksperbank">${solr.hdfs.blockcache.blocksperbank:16384}</int>
        <bool name="solr.hdfs.blockcache.read.enabled">${solr.hdfs.blockcache.read.enabled:true}</bool>
        <bool name="solr.hdfs.blockcache.write.enabled">${solr.hdfs.blockcache.write.enabled:false}</bool>
        <bool name="solr.hdfs.nrtcachingdirectory.enable">${solr.hdfs.nrtcachingdirectory.enable:true}</bool>
        <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">${solr.hdfs.nrtcachingdirectory.maxmergesizemb:16}</int>
        <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">${solr.hdfs.nrtcachingdirectory.maxcachedmb:192}</int>
</directoryFactory>
    <lockType>hdfs</lockType>
It has 6 collections of the following sizes:
Collection 1 --> 6.41 MB
Collection 2 --> 634.51 KB
Collection 3 --> 4.59 MB
Collection 4 --> 1,020.56 MB
Collection 5 --> 607.26 MB
Collection 6 --> 102.4 KB
Each collection has 5 shards. The allocated heap size for the young generation
is about 8 GB and for the old generation about 24 GB. GC analysis showed that
peak utilisation is really low compared to these values.
But querying Collection 4 and Collection 5 gives really slow responses
even though we are not using any complex queries. The output of debug queries
run with debug=timing is given below for reference. Can anyone suggest a way
to improve the performance?
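
As a rough sanity check on the block cache settings above, the sketch below estimates the cache capacity and compares it with the collection sizes listed in the post. It assumes the 8 KB block size described in the Solr Reference Guide (so one slab of 16384 blocks is 128 MB); the function and variable names are illustrative only. Whether a given index really fits also depends on how the shards are spread across nodes, which the post does not state.

```python
# Rough capacity check for the HDFS block cache configured above.
# Assumption: each cached block is 8 KiB (per the Solr Reference Guide),
# so one slab of 16384 blocks holds 128 MiB.
BLOCK_SIZE_KIB = 8


def block_cache_mib(slab_count: int, blocks_per_bank: int) -> float:
    """Total block cache capacity in MiB."""
    return slab_count * blocks_per_bank * BLOCK_SIZE_KIB / 1024


# Index sizes in MB as listed in the post (KB values converted to MB).
collection_sizes_mb = {
    "Collection 1": 6.41,
    "Collection 2": 634.51 / 1024,
    "Collection 3": 4.59,
    "Collection 4": 1020.56,
    "Collection 5": 607.26,
    "Collection 6": 102.4 / 1024,
}

cache_mib = block_cache_mib(slab_count=1, blocks_per_bank=16384)
print(f"block cache capacity: {cache_mib:.0f} MiB")
for name, size_mb in collection_sizes_mb.items():
    verdict = "fits" if size_mb <= cache_mib else "larger than cache"
    print(f"{name}: {size_mb:.2f} MB ({verdict})")
```

With slab.count at 1, the two slow collections are several times larger than the cache, while the four small ones fit easily. Since direct.memory.allocation is false, this cache is also taken from the Java heap rather than from off-heap memory.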

Response to query
<response>
<lst name="responseHeader">
<bool name="zkConnected">true</bool>
<int name="status">0</int>
<int name="QTime">3962</int>
<lst name="params">
<str name="q">
("hybrid electric powerplant" "hybrid electric powerplants" "Electric"
"Electrical" "Electricity" "Engine" "fuel economy" "fuel efficiency" "Hybrid
Electric Propulsion" "Power Systems" "Powerplant" "Propulsion" "hybrid"
"hybrid electric" "electric powerplant")
</str>
<str name="defType">edismax</str>
<str name="debug">true</str>
<str name="indent">on</str>
<arr name="qf">
<str>host</str>
<str>title</str>
<str>url</str>
<str>customContent</str>
<str>contentSpecificSearch</str>
</arr>
<arr name="fl">
<str>id</str>
<str>contentTagsCount</str>
</arr>
<str name="start">0</str>
<str name="bq.op">OR</str>
<str name="q.op">OR</str>
<str name="correlationID">3985d7e2-3e54-48d8-8336-229e85f5d9de</str>
<str name="rows">600</str>
<str name="bq">
("hybrid electric powerplant"^100.0 "hybrid electric powerplants"^100.0
"Electric"^50.0 "Electrical"^50.0 "Electricity"^50.0 "Engine"^50.0 "fuel
economy"^50.0 "fuel efficiency"^50.0 "Hybrid Electric Propulsion"^50.0
"Power Systems"^50.0 "Powerplant"^50.0 "Propulsion"^50.0 "hybrid"^15.0
"hybrid electric"^15.0 "electric powerplant"^15.0)
</str>
</lst>
</lst>
<result name="response" numFound="205458" start="0" maxScore="1836.806">
<lst name="timing">
<double name="time">15374.0</double>
<lst name="prepare">
<double name="time">2.0</double>
<lst name="query">
<double name="time">2.0</double>
</lst>
<lst name="facet">
<double name="time">0.0</double>
</lst>
<lst name="facet_module">
<double name="time">0.0</double>
</lst>
<lst name="mlt">
<double name="time">0.0</double>
</lst>
<lst name="highlight">
<double name="time">0.0</double>
</lst>
<lst name="stats">
<double name="time">0.0</double>
</lst>
<lst name="expand">
<double name="time">0.0</double>
</lst>
<lst name="terms">
<double name="time">0.0</double>
</lst>
<lst name="debug">
<double name="time">0.0</double>
</lst>
</lst>
<lst name="process">
<double name="time">15363.0</double>
<lst name="query">
<double name="time">1313.0</double>
</lst>
<lst name="facet">
<double name="time">0.0</double>
</lst>
<lst name="facet_module">
<double name="time">0.0</double>
</lst>
<lst name="mlt">
<double name="time">0.0</double>
</lst>
<lst name="highlight">
<double name="time">0.0</double>
</lst>
<lst name="stats">
<double name="time">0.0</double>
</lst>
<lst name="expand">
<double name="time">0.0</double>
</lst>
<lst name="terms">
<double name="time">0.0</double>
</lst>
<lst name="debug">
<double name="time">14048.0</double>
</lst>
</lst>
</lst>
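
To read the timing section above more easily, the following sketch computes what share of the "process" time each non-zero component accounts for. The numbers are copied from the output; the helper name is hypothetical.

```python
# Component times (ms) copied from the "process" section of the timing output.
process_total_ms = 15363.0
process_ms = {"query": 1313.0, "debug": 14048.0}


def shares(parts: dict, total: float) -> dict:
    """Return each component's fraction of the total time."""
    return {name: ms / total for name, ms in parts.items()}


for name, fraction in shares(process_ms, process_total_ms).items():
    print(f"{name}: {fraction:.1%}")
```

The debug component accounts for more than 90% of the process time here, so timings collected with debug enabled probably overstate what a normal request costs.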


Thanks,
Arun



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Solr performance issue on querying --> Solr 6.5.1

Posted by Toke Eskildsen <to...@kb.dk>.
On Tue, 2017-09-26 at 07:43 -0700, sasarun wrote:
> Allocated heap size for young generation is about 8 GB and old 
> generation is about 24 GB. And GC analysis showed peak
> size utilisation is really low compared to these values.

That does not come as a surprise. Your collections would normally be
considered small, if not tiny, looking only at their size measured in
bytes. If you expect them to grow significantly (more than 10x),
your allocation might make sense. If you do not expect such growth in
the near future, you will be better off with a much smaller heap: the
peak heap utilization that you have logged (or twice that, to err on the
cautious side) seems a good starting point.

And whatever you do, don't set Xmx to 32GB. Use <31GB or significantly
more than 32GB:
https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/


Are you indexing while you search? If so, you need to set auto-warm or
state a few explicit warmup queries. Otherwise your measurements will not be
representative, as they will be of first searches, which are always slower
than warmed searches.


- Toke Eskildsen, Royal Danish Library


Re: Solr performance issue on querying --> Solr 6.5.1

Posted by Emir Arnautović <em...@sematext.com>.
Hi Arun,
It is hard to measure something without affecting it, but we can combine the debug results with the QTime measured without debug. If we ignore the merging of results, it seems that the majority of the time is spent retrieving docs (~500ms). You should consider reducing the number of rows if you want a better response time (you can ask for rows=0 to see the best time you can expect). Also, as Erick suggested, reducing the number of shards (to 1 if you do not plan to add many more docs) will trim some of the overhead of merging results.
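
One way to try the rows=0 comparison is to send the same request twice, once with rows=0 and once with rows=600, and compare the QTime that each response reports. The sketch below only builds the two URLs. The endpoint (localhost:8983, collection4) and the shortened query are placeholders, not values from this thread; the qf fields and other parameters are taken from the posted request.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port and collection name are placeholders.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection4/select"

# Query fields as they appear in the posted request parameters.
QF_FIELDS = ["host", "title", "url", "customContent", "contentSpecificSearch"]


def build_select_url(query: str, rows: int, base_url: str = SOLR_SELECT_URL) -> str:
    """Build a select URL using the edismax parameters from the thread."""
    params = [
        ("q", query),
        ("defType", "edismax"),
        ("q.op", "OR"),
        ("fl", "id"),
        ("rows", str(rows)),
    ]
    # qf is sent once per field, as in the posted request.
    params += [("qf", field) for field in QF_FIELDS]
    return base_url + "?" + urlencode(params)


query = '"hybrid electric" "fuel economy"'  # shortened placeholder query
print(build_select_url(query, rows=0))    # matching and scoring only
print(build_select_url(query, rows=600))  # page size used in the thread
```

The gap between the two QTime values gives a rough idea of how much of the response time goes into fetching the 600 documents rather than matching them.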

Thanks,
Emir

I noticed that you removed bq. Is the time with bq acceptable as well?
> On 27 Sep 2017, at 12:34, sasarun <sa...@gmail.com> wrote:
> 
> Hi Emir, 
> 
> Please find the response without bq parameter and debugQuery set to true. 
> Also it was noted that Qtime comes down drastically without the debug
> parameter to about 700-800. 


Re: Solr performance issue on querying --> Solr 6.5.1

Posted by sasarun <sa...@gmail.com>.
Hi Emir, 

Please find below the response without the bq parameter and with debugQuery
set to true. We also noted that QTime comes down drastically, to about
700-800 ms, without the debug parameter.
<response>
<lst name="responseHeader">
<bool name="zkConnected">true</bool>
<int name="status">0</int>
<int name="QTime">3446</int>
<lst name="params">
<str name="q">
("hybrid electric powerplant" "hybrid electric powerplants" "Electric"
"Electrical" "Electricity" "Engine" "fuel economy" "fuel efficiency" "Hybrid
Electric Propulsion" "Power Systems" "Powerplant" "Propulsion" "hybrid"
"hybrid electric" "electric powerplant")
</str>
<str name="defType">edismax</str>
<str name="indent">on</str>
<arr name="qf">
<str>host</str>
<str>title</str>
<str>url</str>
<str>customContent</str>
<str>contentSpecificSearch</str>
</arr>
<arr name="fl">
<str>id</str>
<str>contentOntologyTagsCount</str>
</arr>
<str name="start">0</str>
<str name="q.op">OR</str>
<str name="correlationID">3985d7e2-3e54-48d8-8336-229e85f5d9de</str>
<str name="rows">600</str>
<str name="debugQuery">true</str>
</lst>
</lst>
<result name="response" numFound="205458" start="0"
maxScore="56.74194">...</result>
<lst name="debug">
<lst name="track">
<str name="rid">
solr-prd-cluster-m-GooglePatent_shard4_replica2-1506504238282-20
</str>
<lst name="EXECUTE_QUERY">
<lst name="">
<str name="QTime">35</str>
<str name="ElapsedTime">159</str>
<str name="RequestPurpose">GET_TOP_IDS</str>
<str name="NumFound">41294</str>
<str name="Response">...</str>
</lst>
<lst name="">
<str name="QTime">29</str>
<str name="ElapsedTime">165</str>
<str name="RequestPurpose">GET_TOP_IDS</str>
<str name="NumFound">40980</str>
<str name="Response">...</str>
</lst>
<lst name="">
<str name="QTime">31</str>
<str name="ElapsedTime">200</str>
<str name="RequestPurpose">GET_TOP_IDS</str>
<str name="NumFound">41006</str>
<str name="Response">...</str>
</lst>
<lst name="">
<str name="QTime">43</str>
<str name="ElapsedTime">208</str>
<str name="RequestPurpose">GET_TOP_IDS</str>
<str name="NumFound">41040</str>
<str name="Response">...</str>
</lst>
<lst name="">
<str name="QTime">181</str>
<str name="ElapsedTime">466</str>
<str name="RequestPurpose">GET_TOP_IDS</str>
<str name="NumFound">41138</str>
<str name="Response">...</str>
</lst>
</lst>
<lst name="GET_FIELDS">
<lst name="">
<str name="QTime">1518</str>
<str name="ElapsedTime">1523</str>
<str name="RequestPurpose">GET_FIELDS,GET_DEBUG</str>
<str name="NumFound">110</str>
<str name="Response">...</str>
</lst>
<lst name="">
<str name="QTime">1562</str>
<str name="ElapsedTime">1573</str>
<str name="RequestPurpose">GET_FIELDS,GET_DEBUG</str>
<str name="NumFound">115</str>
<str name="Response">...</str>
</lst>
<lst name="">
<str name="QTime">1793</str>
<str name="ElapsedTime">1800</str>
<str name="RequestPurpose">GET_FIELDS,GET_DEBUG</str>
<str name="NumFound">120</str>
<str name="Response">...</str>
</lst>
<lst name="">
<str name="QTime">2153</str>
<str name="ElapsedTime">2161</str>
<str name="RequestPurpose">GET_FIELDS,GET_DEBUG</str>
<str name="NumFound">125</str>
<str name="Response">...</str>
</lst>
<lst name="">
<str name="QTime">2957</str>
<str name="ElapsedTime">2970</str>
<str name="RequestPurpose">GET_FIELDS,GET_DEBUG</str>
<str name="NumFound">130</str>
<str name="Response">...</str>
</lst>
</lst>
</lst>
<lst name="timing">
<double name="time">10302.0</double>
<lst name="prepare">
<double name="time">2.0</double>
<lst name="query">
<double name="time">2.0</double>
</lst>
<lst name="facet">
<double name="time">0.0</double>
</lst>
<lst name="facet_module">
<double name="time">0.0</double>
</lst>
<lst name="mlt">
<double name="time">0.0</double>
</lst>
<lst name="highlight">
<double name="time">0.0</double>
</lst>
<lst name="stats">
<double name="time">0.0</double>
</lst>
<lst name="expand">
<double name="time">0.0</double>
</lst>
<lst name="terms">
<double name="time">0.0</double>
</lst>
<lst name="debug">
<double name="time">0.0</double>
</lst>
</lst>
<lst name="process">
<double name="time">10288.0</double>
<lst name="query">
<double name="time">661.0</double>
</lst>
<lst name="facet">
<double name="time">0.0</double>
</lst>
<lst name="facet_module">
<double name="time">0.0</double>
</lst>
<lst name="mlt">
<double name="time">0.0</double>
</lst>
<lst name="highlight">
<double name="time">0.0</double>
</lst>
<lst name="stats">
<double name="time">0.0</double>
</lst>
<lst name="expand">
<double name="time">0.0</double>
</lst>
<lst name="terms">
<double name="time">0.0</double>
</lst>
<lst name="debug">
<double name="time">9627.0</double>
</lst>
</lst>
</lst>
<str name="rawquerystring">
("hybrid electric powerplant" "hybrid electric powerplants" "Electric"
"Electrical" "Electricity" "Engine" "fuel economy" "fuel efficiency" "Hybrid
Electric Propulsion" "Power Systems" "Powerplant" "Propulsion" "hybrid"
"hybrid electric" "electric powerplant")
</str>
<str name="querystring">
("hybrid electric powerplant" "hybrid electric powerplants" "Electric"
"Electrical" "Electricity" "Engine" "fuel economy" "fuel efficiency" "Hybrid
Electric Propulsion" "Power Systems" "Powerplant" "Propulsion" "hybrid"
"hybrid electric" "electric powerplant")
</str>
<str name="parsedquery">
(+(DisjunctionMaxQuery((host:hybrid electric powerplant |
contentSpecificSearch:"hybrid electric powerplant" | customContent:"hybrid
electric powerplant" | title:hybrid electric powerplant | url:hybrid
electric powerplant)) DisjunctionMaxQuery((host:hybrid electric powerplants
| contentSpecificSearch:"hybrid electric powerplants" |
customContent:"hybrid electric powerplants" | title:hybrid electric
powerplants | url:hybrid electric powerplants))
DisjunctionMaxQuery((host:Electric | contentSpecificSearch:electric |
customContent:electric | title:Electric | url:Electric))
DisjunctionMaxQuery((host:Electrical | contentSpecificSearch:electrical |
customContent:electrical | title:Electrical | url:Electrical))
DisjunctionMaxQuery((host:Electricity | contentSpecificSearch:electricity |
customContent:electricity | title:Electricity | url:Electricity))
DisjunctionMaxQuery((host:Engine | contentSpecificSearch:engine |
customContent:engine | title:Engine | url:Engine))
DisjunctionMaxQuery((host:fuel economy | contentSpecificSearch:"fuel
economy" | customContent:"fuel economy" | title:fuel economy | url:fuel
economy)) DisjunctionMaxQuery((host:fuel efficiency |
contentSpecificSearch:"fuel efficiency" | customContent:"fuel efficiency" |
title:fuel efficiency | url:fuel efficiency))
DisjunctionMaxQuery((host:Hybrid Electric Propulsion |
contentSpecificSearch:"hybrid electric propulsion" | customContent:"hybrid
electric propulsion" | title:Hybrid Electric Propulsion | url:Hybrid
Electric Propulsion)) DisjunctionMaxQuery((host:Power Systems |
contentSpecificSearch:"power systems" | customContent:"power systems" |
title:Power Systems | url:Power Systems))
DisjunctionMaxQuery((host:Powerplant | contentSpecificSearch:powerplant |
customContent:powerplant | title:Powerplant | url:Powerplant))
DisjunctionMaxQuery((host:Propulsion | contentSpecificSearch:propulsion |
customContent:propulsion | title:Propulsion | url:Propulsion))
DisjunctionMaxQuery((host:hybrid | contentSpecificSearch:hybrid |
customContent:hybrid | title:hybrid | url:hybrid))
DisjunctionMaxQuery((host:hybrid electric | contentSpecificSearch:"hybrid
electric" | customContent:"hybrid electric" | title:hybrid electric |
url:hybrid electric)) DisjunctionMaxQuery((host:electric powerplant |
contentSpecificSearch:"electric powerplant" | customContent:"electric
powerplant" | title:electric powerplant | url:electric
powerplant))))/no_coord
</str>
<str name="parsedquery_toString">
+((host:hybrid electric powerplant | contentSpecificSearch:"hybrid electric
powerplant" | customContent:"hybrid electric powerplant" | title:hybrid
electric powerplant | url:hybrid electric powerplant) (host:hybrid electric
powerplants | contentSpecificSearch:"hybrid electric powerplants" |
customContent:"hybrid electric powerplants" | title:hybrid electric
powerplants | url:hybrid electric powerplants) (host:Electric |
contentSpecificSearch:electric | customContent:electric | title:Electric |
url:Electric) (host:Electrical | contentSpecificSearch:electrical |
customContent:electrical | title:Electrical | url:Electrical)
(host:Electricity | contentSpecificSearch:electricity |
customContent:electricity | title:Electricity | url:Electricity)
(host:Engine | contentSpecificSearch:engine | customContent:engine |
title:Engine | url:Engine) (host:fuel economy | contentSpecificSearch:"fuel
economy" | customContent:"fuel economy" | title:fuel economy | url:fuel
economy) (host:fuel efficiency | contentSpecificSearch:"fuel efficiency" |
customContent:"fuel efficiency" | title:fuel efficiency | url:fuel
efficiency) (host:Hybrid Electric Propulsion | contentSpecificSearch:"hybrid
electric propulsion" | customContent:"hybrid electric propulsion" |
title:Hybrid Electric Propulsion | url:Hybrid Electric Propulsion)
(host:Power Systems | contentSpecificSearch:"power systems" |
customContent:"power systems" | title:Power Systems | url:Power Systems)
(host:Powerplant | contentSpecificSearch:powerplant |
customContent:powerplant | title:Powerplant | url:Powerplant)
(host:Propulsion | contentSpecificSearch:propulsion |
customContent:propulsion | title:Propulsion | url:Propulsion) (host:hybrid |
contentSpecificSearch:hybrid | customContent:hybrid | title:hybrid |
url:hybrid) (host:hybrid electric | contentSpecificSearch:"hybrid electric"
| customContent:"hybrid electric" | title:hybrid electric | url:hybrid
electric) (host:electric powerplant | contentSpecificSearch:"electric
powerplant" | customContent:"electric powerplant" | title:electric
powerplant | url:electric powerplant))
</str>
<str name="QParser">ExtendedDismaxQParser</str>
<null name="altquerystring"/>
<null name="boost_queries"/>
<arr name="parsed_boost_queries"/>
<null name="boostfuncs"/>
<lst name="explain">...</lst>
</lst>
</response>

Thanks, 
Arun




Re: Solr performance issue on querying --> Solr 6.5.1

Posted by Emir Arnautović <em...@sematext.com>.
Hi Arun,
This is not the simplest query either: a dozen phrase queries on several fields, plus the same query again as bq. Can you provide the debugQuery info?
I did not look closely into the debug times and what includes what, but one thing that seems strange to me is that QTime is 4s while the query time in debug is 1.3s. Can you try running without bq? Can you include the boost factors in the main query?
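
As an illustration of folding the boost factors into the main query, the sketch below builds a single q string from the phrase and boost pairs in the posted bq parameter. The helper name is hypothetical. Boosts applied inside q are not scored the same way as a separate bq clause, so the ranking may differ and should be compared before switching.

```python
# Phrase/boost pairs copied from the bq parameter in the original post.
BOOSTED_PHRASES = [
    ("hybrid electric powerplant", 100.0),
    ("hybrid electric powerplants", 100.0),
    ("Electric", 50.0),
    ("Electrical", 50.0),
    ("Electricity", 50.0),
    ("Engine", 50.0),
    ("fuel economy", 50.0),
    ("fuel efficiency", 50.0),
    ("Hybrid Electric Propulsion", 50.0),
    ("Power Systems", 50.0),
    ("Powerplant", 50.0),
    ("Propulsion", 50.0),
    ("hybrid", 15.0),
    ("hybrid electric", 15.0),
    ("electric powerplant", 15.0),
]


def boosted_query(phrases) -> str:
    """Join quoted phrases with per-phrase boosts into one query string."""
    clauses = [f'"{phrase}"^{boost}' for phrase, boost in phrases]
    return "(" + " ".join(clauses) + ")"


# Send this as q (with q.op=OR, as in the thread) and drop the bq parameter.
print(boosted_query(BOOSTED_PHRASES))
```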

Thanks,
Emir



Re: Solr performance issue on querying --> Solr 6.5.1

Posted by sasarun <sa...@gmail.com>.
Hi Erick, 

As suggested, I tried a non-HDFS SolrCloud instance and its response times
look much better. On the configuration side too, I am mostly using the
defaults, with solr.hdfs.blockcache.direct.memory.allocation set to false. On
analysis of the HDFS block cache, evictions seem to be on the higher side.
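
A rough sizing check may help explain the evictions. The sketch below multiplies out the block cache settings from the posted solrconfig.xml (slab.count=1, blocksperbank=16384). It assumes the 8 KiB cache block size described in the Solr reference guide, which is not stated anywhere in this thread, and it ignores replicas and how shards are spread across nodes. The function names are made up for illustration.

```python
import math

# Assumption: the HDFS block cache uses 8 KiB blocks (per the Solr
# reference guide); this value does not appear in the thread.
BLOCK_SIZE_BYTES = 8 * 1024


def block_cache_mb(slab_count, blocks_per_bank=16384, block_size=BLOCK_SIZE_BYTES):
    """Total block cache size in MB for the given slab settings."""
    return slab_count * blocks_per_bank * block_size / (1024 * 1024)


def slabs_to_hold(index_mb, blocks_per_bank=16384, block_size=BLOCK_SIZE_BYTES):
    """Smallest slab count whose cache is at least index_mb in size."""
    slab_mb = block_cache_mb(1, blocks_per_bank, block_size)
    return math.ceil(index_mb / slab_mb)


if __name__ == "__main__":
    # Sizes of Collection 4 and Collection 5 as posted, in MB.
    big_collections_mb = 1020.56 + 607.26
    print("cache with slab.count=1:", block_cache_mb(1), "MB")
    print("slabs to hold collections 4+5:", slabs_to_hold(big_collections_mb))
```

Under those assumptions a single slab is 128 MB, well below the roughly 1.6 GB taken by Collections 4 and 5 together, which would be consistent with a high eviction count. Raising the slab count means the memory has to come from somewhere: the heap while direct memory allocation is false, or off-heap memory when it is true.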

Thanks, 
Arun




Re: Solr performance issue on querying --> Solr 6.5.1

Posted by sasarun <sa...@gmail.com>.
Hi Erick, 

QTime comes down with rows set to 1. It also comes down when the debug
parameter is left off the query, to about 900 ms.
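
For anyone wanting to repeat this comparison, here is a minimal sketch that builds the three request URLs (baseline, rows=1, no debug) so they can be timed one after another. The host, port and collection name are placeholders, the q value is shortened from the one in the original post, and the helper names are made up for illustration.

```python
from urllib.parse import urlencode

# Parameters taken from the posted query; "q" is shortened for readability.
BASE_PARAMS = {
    "q": '"hybrid electric" "fuel economy"',
    "defType": "edismax",
    "qf": ["host", "title", "url", "customContent", "contentSpecificSearch"],
    "fl": ["id", "contentTagsCount"],
    "rows": 600,
    "debug": "true",
}


def variant(base, **overrides):
    """Copy base params; an override of None removes that parameter."""
    params = dict(base)
    for key, value in overrides.items():
        if value is None:
            params.pop(key, None)
        else:
            params[key] = value
    return params


def select_url(solr_base, collection, params):
    """Build a /select URL; list values become repeated parameters."""
    return f"{solr_base}/{collection}/select?{urlencode(params, doseq=True)}"


if __name__ == "__main__":
    # Placeholder host and collection name, not taken from the thread.
    solr_base, collection = "http://localhost:8983/solr", "collection4"
    tests = {
        "baseline": BASE_PARAMS,
        "rows=1": variant(BASE_PARAMS, rows=1),
        "no debug": variant(BASE_PARAMS, debug=None),
    }
    for label, params in tests.items():
        print(label, select_url(solr_base, collection, params))
```

Running each URL a few times and comparing the QTime values separates the cost of fetching 600 stored documents from the cost of building the debug section.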

Thanks, 
Arun 




Re: Solr performance issue on querying --> Solr 6.5.1

Posted by Erick Erickson <er...@gmail.com>.
Well, 15 second responses are not what I'd expect either. But two
things (just looked again)

1> note that the time to assemble the debug information is a large
majority of your total time (14 of 15.3 seconds).
2> you're specifying 600 rows which is quite a lot as each one
requires that a 16K block of data be read from disk and decompressed
to assemble the "fl" list.

so one quick test would be to set rows=1 or something. All that said,
the QTime value returned does _not_ include <1> or <2> above and even
4 seconds seems excessive.
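
To make the breakdown in <1> concrete, here is a minimal sketch that reads the "timing" section of a debug response and prints each component's share of the "process" phase. The sample XML is trimmed from the posted output (zero-time components dropped), and the function name is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed from the posted debug output; components reporting 0.0 are omitted.
SAMPLE = """
<lst name="timing">
  <double name="time">15374.0</double>
  <lst name="prepare">
    <double name="time">2.0</double>
    <lst name="query"><double name="time">2.0</double></lst>
  </lst>
  <lst name="process">
    <double name="time">15363.0</double>
    <lst name="query"><double name="time">1313.0</double></lst>
    <lst name="debug"><double name="time">14048.0</double></lst>
  </lst>
</lst>
"""


def phase_breakdown(timing_xml, phase="process"):
    """Return (phase total ms, {component: ms}) for one timing phase."""
    root = ET.fromstring(timing_xml)
    phase_node = root.find(f"lst[@name='{phase}']")
    total = float(phase_node.find("double[@name='time']").text)
    parts = {}
    for comp in phase_node.findall("lst"):
        parts[comp.get("name")] = float(comp.find("double[@name='time']").text)
    return total, parts


if __name__ == "__main__":
    total, parts = phase_breakdown(SAMPLE)
    # Largest component first.
    for name, ms in sorted(parts.items(), key=lambda kv: -kv[1]):
        print(f"{name:10s} {ms:9.1f} ms  {ms / total:6.1%}")
```

With the posted numbers the debug component accounts for roughly 91% of the process phase, and the query component for under 9%.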

Best,
Erick

On Tue, Sep 26, 2017 at 10:54 AM, sasarun <sa...@gmail.com> wrote:
> Hi Erick,
>
> Thank you for the quick response. Query time was relatively faster once it
> is read from memory. But personally I always felt response time could be far
> better. As suggested, We will try and set up in a non HDFS environment and
> update on the results.
>
> Thanks,
> Arun
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Solr performance issue on querying --> Solr 6.5.1

Posted by sasarun <sa...@gmail.com>.
Hi Erick, 

Thank you for the quick response. Queries were noticeably faster once the
data was being read from memory. But personally I have always felt the
response time could be far better. As suggested, we will try to set up a
non-HDFS environment and report back with the results.

Thanks, 
Arun




Re: Solr performance issue on querying --> Solr 6.5.1

Posted by Erick Erickson <er...@gmail.com>.
Does the query time _stay_ low? Once the data is read from HDFS it
should pretty much stay in memory. So my question is whether, once
Solr warms up you see this kind of query response time.

Have you tried this on a non HDFS system? That would be useful to help
figure out where to look.

And given the sizes of your collections, unless you expect them to get
much larger, there's no reason to shard any of them. Sharding should
only really be used when the collections are too big for a single
shard as distributed searches inevitably have increased overhead. I
expect _at least_ 20M documents/shard, and have seen 200M docs/shard.
YMMV of course.
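
As a back-of-the-envelope version of that rule of thumb, the sketch below turns a document count into a shard count. The 20M docs/shard figure is only the rough expectation mentioned above, not a Solr limit, and the function name is made up for illustration. The 205,458 figure is the numFound of the posted query, so it is only a lower bound on that collection's size; the 50M figure is purely hypothetical.

```python
import math

# Rule-of-thumb figure from the message above, not a hard Solr limit.
DOCS_PER_SHARD = 20_000_000


def suggested_shards(num_docs, docs_per_shard=DOCS_PER_SHARD):
    """Shards needed so that no shard exceeds docs_per_shard documents."""
    if num_docs < 0 or docs_per_shard <= 0:
        raise ValueError("num_docs must be >= 0 and docs_per_shard > 0")
    return max(1, math.ceil(num_docs / docs_per_shard))


if __name__ == "__main__":
    # numFound of the posted query (a lower bound on the collection size).
    print(suggested_shards(205_458))
    # A hypothetical, much larger collection.
    print(suggested_shards(50_000_000))
```

Even allowing for the collection being several times larger than the numFound of that one query, the estimate stays at a single shard.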

Best,
Erick
