Posted to solr-user@lucene.apache.org by Jilal Oussama <ji...@gmail.com> on 2013/12/26 11:38:56 UTC

Solr Query Slowliness

Hi all,

I have multiple Python scripts querying Solr with the sunburnt module.

Solr was hosted on an Amazon EC2 m1.large (2 vCPU with 4 ECU, 7.5 GB memory
& 840 GB storage) and contained several cores for different uses.

When I manually executed a query through Solr Admin (a query containing
10~15 terms over one field, some of them boosted, limited to one result,
without any sorting or faceting) it took around 700 ms. The core contained
7 million documents.

When the scripts were running things got slower: my query took 7~10 s.

So I turned to SolrCloud, expecting a huge performance increase.

I installed it on a cluster of 5 Amazon EC2 c3.2xlarge instances (8 vCPU
with 28 ECU, 15 GB memory & 160 GB SSD storage). I created one collection
to contain the core I was querying and split it into 25 shards (each node
holding 5 shards, without replication). Each shard took 54 MB of storage.

I tested my query on the new SolrCloud and it takes 70 ms! A huge
improvement, which is very good!

I tested my scripts again (I have 30 scripts running at the same time) and,
to my surprise, things run fast for 5 seconds and then the query time
becomes really slow again.

I updated solrconfig.xml to remove the query caches (I don't need them,
since the queries are all very different and each is run only once) and
changed the index memory to 1 GB, but got only a small improvement (3~4 s
for each query?!).

Any ideas?

PS: My index will not stay at 7M documents; it will grow to over 100M,
and that may make things worse.

Re: Solr Query Slowliness

Posted by Jilal Oussama <ji...@gmail.com>.
Thank you guys for your replies,

Sorry that I forgot to mention that I have allocated 10 GB of memory to the
Java Heap.



Re: Solr Query Slowliness

Posted by Shawn Heisey <so...@elyograg.org>.

Your SolrCloud setup has 35 times as much CPU power (just basing this on
the ECU numbers) as your single-server setup, ten times as much memory,
and a lot more IOPS because you moved to SSD.  A 10X increase in single
query performance is not surprising.
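
The ratios above can be checked with a few lines of arithmetic. This is only a sketch using the instance figures quoted in the thread; the dictionary layout and the helper name `total` are made up for illustration.

```python
# Figures quoted in the thread; ECU is Amazon's relative CPU unit.
old = {"nodes": 1, "ecu": 4, "memory_gb": 7.5}    # one m1.large
new = {"nodes": 5, "ecu": 28, "memory_gb": 15.0}  # five c3.2xlarge


def total(cluster, key):
    """Sum a per-node figure across all nodes in the cluster."""
    return cluster["nodes"] * cluster[key]


cpu_ratio = total(new, "ecu") / total(old, "ecu")
memory_ratio = total(new, "memory_gb") / total(old, "memory_gb")
latency_ratio = 700 / 70  # single-query time before and after, in ms

print(cpu_ratio, memory_ratio, latency_ratio)
```

On these numbers the single-query speedup (10X) is well below the raw CPU ratio (35X), which is consistent with the point that the improvement is not surprising.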

You have not indicated how much memory is assigned to the java heap on
each server.  I think that there are three possible problems happening
here, with a strong possibility that the third one is happening at the
same time as one of the other two:

1) Full garbage collections are too frequent because the heap is too small.
2) Garbage collections take too long because the heap is very large and
GC is not tuned.
3) Extremely high disk I/O because the OS disk cache is too small for
the index size.

Some information on these that might be helpful:

http://wiki.apache.org/solr/SolrPerformanceProblems
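
As a rough way to reason about the third possibility, the sketch below compares the per-node index size with the memory left over for the OS disk cache. It uses only figures reported in this thread (including the 10 GB heap from the original poster's follow-up) and ignores memory used by the OS and other processes, which is an assumption, so the cache figure is an upper bound rather than a measurement.

```python
# Per-node figures reported in the thread.
ram_gb = 15.0          # c3.2xlarge memory
heap_gb = 10.0         # Java heap, from the original poster's follow-up
shards_per_node = 5
shard_size_mb = 54.0

index_gb = shards_per_node * shard_size_mb / 1024
# Upper bound only: the OS and other processes also need memory,
# and that overhead is not reported in the thread.
cache_upper_bound_gb = ram_gb - heap_gb

print(f"index per node: {index_gb:.2f} GB")
print(f"disk cache upper bound: {cache_upper_bound_gb:.1f} GB")
print("index fits in cache:", index_gb < cache_upper_bound_gb)
```

If these figures are right, the current index is much smaller than the memory left for caching, so the garbage collection causes may be worth checking first. The comparison would need to be redone as the index grows.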

The general solution for good Solr performance is to throw hardware,
especially memory, at the problem.  It's worth pointing out that any
level of hardware investment has an upper limit on the total query
volume it can support.  Running 30 test scripts at the same time will be
difficult for all but the most powerful and expensive hardware to deal
with, especially if every query is different.  A five-server cloud where
each server has 8 CPU cores and 15GB of memory is pretty small, all
things considered.

Thanks,
Shawn


Re: Solr Query Slowliness

Posted by Rafał Kuć <r....@solr.pl>.
Hello!

It seems that the number of queries per second generated by your
scripts may be too much for your Solr cluster to handle with the
latency you want.

Try launching your scripts one by one to see where the bottleneck is
with your instance. I assume that for some number of scripts running
at the same time you will have good performance, and that it will start
to degrade once you add even more.
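
One way to run that experiment is sketched below. It is not the poster's setup: it uses threads in a single process instead of separate scripts, and `ramp`, `timed_call` and `fake_query` are hypothetical names. The query itself is passed in as a callable, so any client (sunburnt or plain HTTP) can be plugged in.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import median


def timed_call(query_fn):
    """Run one query and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    query_fn()
    return time.perf_counter() - start


def ramp(query_fn, levels=(1, 2, 5, 10, 20, 30), queries_per_client=20):
    """Return {concurrency: median latency in seconds} for each level."""
    results = {}
    for clients in levels:
        # At most `clients` queries are in flight at any moment.
        with ThreadPoolExecutor(max_workers=clients) as pool:
            futures = [
                pool.submit(timed_call, query_fn)
                for _ in range(clients * queries_per_client)
            ]
            latencies = [f.result() for f in futures]
        results[clients] = median(latencies)
    return results


if __name__ == "__main__":
    # Stand-in for a real Solr request; replace with your own client call.
    def fake_query():
        time.sleep(0.01)

    for clients, latency in ramp(fake_query, levels=(1, 2, 4)).items():
        print(f"{clients:>3} clients: median {latency * 1000:.1f} ms")
```

The concurrency level at which the median latency starts to climb gives a rough idea of how many simultaneous clients the cluster can serve at the latency you want.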

If you don't have a high commit rate and you don't need NRT, disabling
the caches shouldn't be needed, and they can help with query
performance.

Also, there are tools out there that can help you diagnose what the
actual problem is, for example (http://sematext.com/spm/index.html).

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/




Re: Solr Query Slowliness

Posted by Jilal Oussama <ji...@gmail.com>.
This is an example of a query:

http://myip:8080/solr/TestCatMatch_shard12_replica1/select?q=Royal+Cashmere+RC+106+CS+Silk+Cashmere+V+Neck+Moss+Green+Men^10+s+Sweater+Cashmere^3+Men^3+Sweaters^3+Clothing^3&rows=1&wt=json&indent=true
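
For readers who want to reproduce the request from a script, here is a minimal sketch that builds the same query with Python's standard library rather than sunburnt. The host, port and core name are copied from the URL above. The query text assumes the line break in the archived URL falls between `Men` and `^10`.

```python
from urllib.parse import urlencode

# Host, port and core name are copied from the example URL in the thread.
base = "http://myip:8080/solr/TestCatMatch_shard12_replica1/select"

# Boosted terms use the term^boost syntax seen in the example query.
query = (
    "Royal Cashmere RC 106 CS Silk Cashmere V Neck Moss Green Men^10 s "
    "Sweater Cashmere^3 Men^3 Sweaters^3 Clothing^3"
)

params = {"q": query, "rows": 1, "wt": "json", "indent": "true"}
url = base + "?" + urlencode(params)
print(url)
```

`urlencode` writes spaces as `+` and percent-encodes `^` as `%5E`, so the printed URL differs cosmetically from the hand-written one while carrying the same parameters.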

in return :

{
  "responseHeader":{
    "status":0,
    "QTime":191},
  "response":{"numFound":4539784,"start":0,"maxScore":2.0123534,"docs":[
      {
        "Sections":"fashion",
        "IdsCategories":"11101911",
        "IdProduct":"ef6b8d7cf8340d0c8935727a07baebab",
        "Id":"11101911-ef6b8d7cf8340d0c8935727a07baebab",
        "Name":"Uniqlo Men Cashmere V Neck Sweater Men Clothing Sweaters Cashmere",
        "_version_":1455419757424541696}]
  }}
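
The timing discussed in this thread is the `QTime` value in the response header. A small sketch of how a script might pull it out is shown below; the response body is the one above with the document trimmed to a single field, and the variable names are illustrative only.

```python
import json

# Response body from the thread, with the document trimmed to one field.
body = """
{
  "responseHeader": {"status": 0, "QTime": 191},
  "response": {"numFound": 4539784, "start": 0, "maxScore": 2.0123534,
               "docs": [{"Id": "11101911-ef6b8d7cf8340d0c8935727a07baebab"}]}
}
"""

data = json.loads(body)
qtime_ms = data["responseHeader"]["QTime"]
num_found = data["response"]["numFound"]
returned = len(data["response"]["docs"])

print(f"QTime: {qtime_ms} ms, matched: {num_found}, returned: {returned}")
```

Note that `rows=1` limits how many documents are returned, not how many are matched: `numFound` here is about 4.5 million, and each match still has to be scored to find the top one.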

(This query was executed while no scripts were running, so the QTime is
only 191 ms, but it can take up to 3 s when they are.)


Of course the query can be shorter or longer, and that affects the
execution time (the execution times I mentioned are the internal ones
returned by Solr, not measured by me).

And yes the CPU is fully used.



Re: Solr Query Slowliness

Posted by Rafał Kuć <r....@solr.pl>.
Hello!

Different queries can have different execution times; that's why I
asked about the details. When the scripts are running, is the Solr CPU
fully utilized? To say more, I would like to see what queries the
scripts run against Solr.

Do you have any information on network throughput between the server
you are running the scripts on and the Solr cluster? You wrote that the
scripts are fine for 5 seconds and then they get slow. If your Solr
cluster is not fully utilized, I would take a look at the queries and
what they return (e.g. using faceting with facet.limit=-1) and check
whether the network is able to handle those.

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/




Re: Solr Query Slowliness

Posted by Jilal Oussama <ji...@gmail.com>.
Thanks Rafal for your reply,

My scripts are running on other, independent machines, so they do not
affect Solr. I did mention that the queries are not the same (that is why I
removed the query cache from solrconfig.xml), and I only get 1 result from
Solr (the top-scored one, so there is no sorting, since results are ordered
by score by default).




Re: Solr Query Slowliness

Posted by Rafał Kuć <r....@solr.pl>.
Hello!

Could you tell us more about your scripts? What do they do? Are the
queries the same? How many results do you fetch with your scripts, and
so on?

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

