Posted to solr-user@lucene.apache.org by Alexandr Bocharov <bo...@gmail.com> on 2012/06/12 09:40:49 UTC

Solr PHP highload search

Hi, all.

I need advice on configuring Solr search for use in high-load production.

I've written a user search engine (a PHP class) that uses over 70 parameters
for searching users.
The user database has over 30 million records.
The total index size is 6.4 GB with 1 node and 3.2 GB with 2 nodes.
The previous search engine could handle 700,000 queries per day for searching
users, which is about 8 queries/sec (4 MySQL servers with manual sharding via
Gearman).

An example query is:

[responseHeader] => SolrObject Object
        (
            [status] => 0
            [QTime] => 517
            [params] => SolrObject Object
                (
                    [bq] => Array
                        (
                            [0] => bool_field1:1^30
                            [1] => str_field1:str_value1^15
                            [2] => tint_field1:tint_field1^5
                            [3] => bool_field2:1^6
                            [4] => date_field1:[NOW-14DAYS TO NOW]^20
                            [5] => date_field2:[NOW-14DAYS TO NOW]^5
                        )

                    [indent] => on
                    [start] => 0
                    [q.alt] => *:*
                    [wt] => xml
                    [fq] => Array
                        (
                            [0] => tint_field2:[tint_value2 TO tint_value22]
                            [1] => str_field1:str_value1
                            [2] => str_field2:str_value2
                            [3] => tint_field3:(tint_value3 OR tint_value32
OR tint_value33 OR tint_value34 OR tint_value5)
                            [4] => tint_field4:tint_value4
                            [5] => -bool_field1:[* TO *]
                        )

                    [version] => 2.2
                    [defType] => dismax
                    [rows] => 10
                )

        )
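
For illustration, a simplified sketch of how a query with these parameters
can be built with the pecl Solr extension (host, port and field names are
placeholders, not real values):

<?php
// Simplified sketch: build a dismax query with filter and boost queries
// using the pecl Solr extension. Host, port and field names are placeholders.
$client = new SolrClient(array(
    'hostname' => 'localhost',  // placeholder
    'port'     => 8983,         // placeholder
    'path'     => 'solr',
));

$query = new SolrQuery();
$query->setParam('defType', 'dismax');
$query->setParam('q.alt', '*:*');
$query->setStart(0);
$query->setRows(10);

// Filter queries restrict the result set and are cached in Solr's filterCache.
$query->addFilterQuery('tint_field2:[tint_value2 TO tint_value22]');
$query->addFilterQuery('str_field1:str_value1');

// Boost queries only influence ranking, not which documents match.
$query->addParam('bq', 'bool_field1:1^30');
$query->addParam('bq', 'date_field1:[NOW-14DAYS TO NOW]^20');

$response = $client->query($query)->getResponse();
echo $response->responseHeader->QTime, "\n";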


I tested my PHP search API and found that concurrent random queries (for
example, 10 queries at the same time) increase the average QTime from about
500 ms to 3000 ms on 2 nodes.

1. How can I tweak my queries, parameters, or Solr's config to decrease
QTime?
2. What if I put my index data in an emulated RAM directory; would that
greatly increase performance?
3. Sorting by boost queries has a great influence on QTime; how can I
optimize the boost queries?
4. If I split my 2 nodes on 2 machines into 6 nodes on 2 machines (3 nodes
per machine), will it increase performance? (A rough sketch of what I mean
by querying several nodes at once follows below.)
5. What is a "multi-core query", how can I configure it, and will it
increase performance?
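
To clarify what I mean in questions 4 and 5, here is a rough sketch of
querying several nodes/cores at once (hostnames and core names are
placeholders, and I am assuming the standard 'shards' parameter for
distributed search):

<?php
// Rough sketch: distributed search across several nodes/cores via the
// 'shards' parameter. Hostnames and core names below are placeholders.
$client = new SolrClient(array(
    'hostname' => 'node1',      // placeholder
    'port'     => 8983,
    'path'     => 'solr/users1',
));

$query = new SolrQuery('*:*');
$query->setRows(10);

// The node receiving the request fans it out to every listed shard
// and merges the partial results before responding.
$query->setParam('shards',
    'node1:8983/solr/users1,node2:8983/solr/users2,node3:8983/solr/users3');

$results = $client->query($query)->getResponse();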

Thank you!

Re: Solr PHP highload search

Posted by Jack Krupansky <ja...@basetechnology.com>.
Add "&debugQuery=true" to your query and look at the "timing" section that 
comes back with the response to see q breakdown of Qtime. It should offer 
some insight into which search component(s) are taking the most time. That 
might point you in the right direction for improvements.
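
For example, something along these lines (a rough sketch; host, port and
parameters are placeholders to adapt to your setup):

<?php
// Rough sketch: re-run a slow query with debugQuery=true and dump the
// per-component timing block. Host, port and parameters are placeholders.
$url = 'http://localhost:8983/solr/select'
     . '?q.alt=*:*&defType=dismax&rows=10&wt=json'
     . '&fq=' . urlencode('str_field1:str_value1')
     . '&debugQuery=true';

$response = json_decode(file_get_contents($url), true);

// The timing section reports prepare/process time per search component.
print_r($response['debug']['timing']);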

Also, see how much JVM memory is available when you are running queries. 
Maybe memory is low and garbage collections are occurring too frequently.
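
One quick way to check from PHP (a sketch, assuming the default system info
handler is enabled at admin/system; host and port are placeholders):

<?php
// Sketch: read JVM heap figures from Solr's system info handler.
// Assumes the default admin/system handler is enabled; host/port are placeholders.
$info = json_decode(
    file_get_contents('http://localhost:8983/solr/admin/system?wt=json'),
    true
);
print_r($info['jvm']['memory']);  // free / total / max / used heap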

-- Jack Krupansky



Re: Solr PHP highload search

Posted by Erick Erickson <er...@gmail.com>.
Consider just looking at it with jconsole (it should be included in your Java
release) to get a sense of the memory usage and garbage collection. How much
physical memory do you have overall?

This is not what I'd expect. Your CPU load is actually reasonably high,
so it doesn't look like you're swapping.

By and large, trying to use RAMDirectories isn't a good solution; between the
OS and Solr, the necessary parts of your index are read into memory and used
from there.

Best
Erick

On Wed, Jun 13, 2012 at 7:13 AM, Alexandr Bocharov
<bo...@gmail.com> wrote:
> Thank you for the help :)
>
> I'm giving the JVM 2048M for each node.
> CPU load jumps between 70% and 90%.
> Memory usage climbs to the maximum during testing (probably the caches
> filling up).
> I didn't monitor I/O.
>
> I'd still like answers to my other questions.

Re: Solr PHP highload search

Posted by Alexandr Bocharov <bo...@gmail.com>.
Thank you for the help :)

I'm giving the JVM 2048M for each node.
CPU load jumps between 70% and 90%.
Memory usage climbs to the maximum during testing (probably the caches
filling up).
I didn't monitor I/O.

I'd still like answers to my other questions.

2012/6/13 Erick Erickson <er...@gmail.com>

> How much memory are you giving the JVM? Have you put a performance
> monitor on the running process to see what resources have been
> exhausted (i.e. are you I/O bound? CPU bound?)
>
> Best
> Erick

Re: Solr PHP highload search

Posted by Erick Erickson <er...@gmail.com>.
How much memory are you giving the JVM? Have you put a performance
monitor on the running process to see what resources have been
exhausted (i.e. are you I/O bound? CPU bound?)

Best
Erick
