Posted to solr-user@lucene.apache.org by Volodymyr Zhabiuk <vz...@gmail.com> on 2012/04/27 03:50:16 UTC

Benchmark Solr vs Elastic Search vs Sensei

Hi Solr users,

I've implemented a project to compare the performance of
Solr, Elastic Search and SenseiDB:
https://github.com/vzhabiuk/search-perf
Solr version 3.5.0 was used. I used the default configuration,
only enabling JSON updates, with the following schema:
https://github.com/vzhabiuk/search-perf/blob/master/configs/solr/schema.xml
2.5 million documents were put into the index. After
that I launched the indexing process to add another 500k docs,
issuing a commit after each 500-doc batch. At the
same time I launched a concurrent client that sent
queries of the following type:
((tags:moon-roof%20or%20tags:electric%20or%20tags:highend%20or%20tags:hybrid)%20AND%20(!tags:family%20AND%20!tags:chick%20magnet%20AND%20!tags:soccer%20mom))%20
OR%20((color:red%20or%20color:green%20or%20color:white%20or%20color:yellow)%20AND%20(!color:gold%20AND%20!color:silver%20AND%20!color:black))%20
OR%20mileage:[15001%20TO%2017500]%20OR%20mileage:[17501%20TO%20*]%20
OR%20city:u.s.a.*
&facet=true&facet.field=tags&facet.field=color
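For readability, the request above can be percent-decoded. The snippet below is a minimal sketch using Python's standard urllib.parse.unquote; only the first clause of the query is reproduced, and the variable names are illustrative rather than taken from the benchmark code.

```python
from urllib.parse import unquote

# First clause of the benchmark query, copied from the request above.
encoded = (
    "((tags:moon-roof%20or%20tags:electric%20or%20tags:highend%20or%20tags:hybrid)"
    "%20AND%20(!tags:family%20AND%20!tags:chick%20magnet%20AND%20!tags:soccer%20mom))"
)

# unquote() turns each %20 back into a literal space.
decoded = unquote(encoded)
print(decoded)
```

After decoding, multi-word tags such as "chick magnet" appear as two space-separated words, with nothing marking them as a single value.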
The query is a top-level "OR" query consisting of 2 terms, 2
ranges and 1 prefix. It is designed to hit ~60-70% of all the docs.
Here are the performance results with the indexing process running in the background:
#Threads   min        median     mean       75%        qps
   1       208.95ms   332.66ms   350.48ms   422.92ms   2.8
   2       188.68ms   338.09ms   339.22ms   402.15ms   5.9
   3       151.06ms   326.64ms   336.20ms   418.61ms   8.8
   4       125.13ms   332.90ms   332.18ms   396.14ms   12.0
With no indexing process running in the background,
the results for 2.6 million docs are as follows:
#Threads   min        median     mean       75%        qps
   1       106.70ms   199.66ms   199.40ms   234.89ms   5.1
   2       128.61ms   199.12ms   201.81ms   229.89ms   9.9
   3       110.99ms   197.43ms   203.13ms   232.25ms   14.7
   4       90.24ms    201.46ms   200.46ms   227.75ms   19.9
   5       106.14ms   208.75ms   207.69ms   242.88ms   24.0
   6       103.75ms   208.91ms   211.23ms   238.60ms   28.3
   7       113.54ms   207.07ms   209.69ms   239.99ms   33.3
   8       117.32ms   216.38ms   224.74ms   258.74ms   35.5
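As a sanity check on the table above, the reported qps can be compared with a rough estimate of threads divided by mean latency. This is only a sketch: it assumes each client thread issues its next query as soon as the previous one returns, which the post does not state, and the function name is illustrative.

```python
# (threads, mean latency in ms, reported qps), copied from the table
# for the run without background indexing.
rows = [
    (1, 199.40, 5.1),
    (2, 201.81, 9.9),
    (3, 203.13, 14.7),
    (4, 200.46, 19.9),
    (5, 207.69, 24.0),
    (6, 211.23, 28.3),
    (7, 209.69, 33.3),
    (8, 224.74, 35.5),
]


def estimated_qps(threads, mean_ms):
    """Closed-loop estimate: each thread completes 1000 / mean_ms queries per second."""
    return threads * 1000.0 / mean_ms


for threads, mean_ms, reported in rows:
    estimate = estimated_qps(threads, mean_ms)
    print(f"{threads} threads: estimated {estimate:.1f} qps, reported {reported} qps")
```

Under that assumption the estimates stay close to the reported figures, which would be consistent with throughput growing almost linearly with the number of threads while per-query latency stays roughly flat.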
I've got three questions so far:
1. With background indexing the latency is almost 2 times
higher. Is there any way to overcome this?
2. How can we tune Solr to get better results?
3. In your opinion, what types of queries should I
use for the benchmark?
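To put a number on the slowdown mentioned in question 1, the median latencies from the two tables can be divided row by row. A minimal sketch, using only the thread counts present in both tables (the dictionary names are illustrative):

```python
# Median latency in ms per thread count, copied from the two tables in this post.
median_with_indexing = {1: 332.66, 2: 338.09, 3: 326.64, 4: 332.90}
median_without_indexing = {1: 199.66, 2: 199.12, 3: 197.43, 4: 201.46}

# Ratio of median latency with indexing to median latency without it.
slowdown = {
    threads: median_with_indexing[threads] / median_without_indexing[threads]
    for threads in median_with_indexing
}

for threads, ratio in slowdown.items():
    print(f"{threads} threads: median latency is {ratio:.2f}x higher with indexing")
```

For these rows the ratio comes out at roughly 1.65x to 1.70x.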

With many thanks,
Volodymyr


BTW, here is the spec of my machine:
RedHat 6.1 64-bit
Intel Xeon E5620 @ 2.40 GHz, 8 cores
63 GB RAM

Re: Benchmark Solr vs Elastic Search vs Sensei

Posted by Volodymyr Zhabiuk <vz...@gmail.com>.
Hi Erick

Thanks for the extensive answers. I will try to tune my Solr
installation according to your advice and the wiki page you
mentioned.

Best regards,
Volodymyr


Re: Benchmark Solr vs Elastic Search vs Sensei

Posted by Jason Rutherglen <ja...@gmail.com>.
I think DataStax Enterprise is faster than Solr Cloud with transaction
logging turned on.  Cassandra has its own fast(er) transaction
logging mechanism.  Of course it's best to use two HDs when testing,
e.g., one for the data, the other for the transaction log.


Re: Benchmark Solr vs Elastic Search vs Sensei

Posted by Jake Luciani <ja...@gmail.com>.
Yes, the replication, failover and distribution are managed by Cassandra, which makes Solr more Dynamo-like. For example, scaling involves adding another node to the Cassandra cluster.

Finally, since the field data is in Cassandra, you can access it from Cassandra, Hadoop or Solr.

Jake

 


Re: Benchmark Solr vs Elastic Search vs Sensei

Posted by Andy <an...@yahoo.com>.
So the Cassandra integration brings distributed index and replication to Solr? Is that different from what Solr Cloud does?



Re: Benchmark Solr vs Elastic Search vs Sensei

Posted by Jeff Schmidt <ja...@535consulting.com>.
This is a pretty awesome combination, actually.  I'm getting started using it myself, and I'd be very interested in what kind of benchmark results you get vs. Solr and your other candidates. DataStax Enterprise 2.0 was released in March and is based on Solr 4.0 and Cassandra 1.0.7 or 1.0.8; I'm looking for the Cassandra 1.1-based release.

Note: I am not affiliated with DataStax in any way, other than being a satisfied customer for the past few months.   I am just trying to selfishly fuel your interest so you'll consider benchmarking it.

My project is already using Cassandra, and we had to manage Solr separately. Having the Solr indexes and core configuration (solrconfig.xml, schema.xml, synonyms.txt, etc.) in Cassandra, distributed and replicated among the various nodes and, eventually for us, multiple data centers, is fantastic.

Jeff




--
Jeff Schmidt
535 Consulting
jas@535consulting.com
http://www.535consulting.com
(650) 423-1068










Re: Benchmark Solr vs Elastic Search vs Sensei

Posted by Walter Underwood <wu...@wunderwood.org>.
On Apr 27, 2012, at 12:39 PM, Radim Kolar wrote:

> On 27.4.2012 19:59, Jeremy Taylor wrote:
>> DataStax offers a Solr integration that isn't master/slave and is
>> near real time.
> Is it a rebranded Solandra?

No, it is a rewrite.

http://www.datastax.com/dev/blog/cassandra-with-solr-integration-details

wunder
--
Walter Underwood
wunder@wunderwood.org




Re: Benchmark Solr vs Elastic Search vs Sensei

Posted by Radim Kolar <hs...@filez.com>.
On 27.4.2012 19:59, Jeremy Taylor wrote:
> DataStax offers a Solr integration that isn't master/slave and is
> near real time.
Is it a rebranded Solandra?

RE: Benchmark Solr vs Elastic Search vs Sensei

Posted by Jeremy Taylor <jt...@datastax.com>.
DataStax offers a Solr integration that isn't master/slave and is
near real time.  Essentially, the software offers the great features of
Solr without the major shortcomings.

Jeremy


Re: Benchmark Solr vs Elastic Search vs Sensei

Posted by Erick Erickson <er...@gmail.com>.
Some observations:
1> I suspect some of your queries aren't doing what you expect, but
     I'm not sure if that matters. E.g., !tags:chick magnet will be parsed
     as -tags:chick defaultField:magnet.
2> Production Solr installations are usually master/slave
     setups. Your indexing process (the commits) is causing
     new searchers to be opened/warmed/etc. quite regularly,
     reducing your throughput. It's not surprising at all that
     your QPS rate increases when not indexing.
3> The trunk Near Real Time with "soft commits" should change
     the characteristics of the test with background indexing. You
     might try that.
4> Examine your cache usage; see the Solr admin page. Caches
     are quite important. Also consider autowarming characteristics.
5> There's a ton of stuff you can do to tune query rate. Unfortunately,
     which specific thing would help in your situation is hard to
     say. You might start with:
    http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
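On point 1, one way to keep a multi-word tag attached to its field is to send it as a quoted phrase before URL-encoding. The following is a minimal sketch, not the benchmark's actual client code: the helper name field_term is hypothetical, and whether the phrase matches depends on how the tags field is analyzed in the schema.

```python
from urllib.parse import quote


def field_term(field, value):
    """Format field:value, wrapping values that contain whitespace in double quotes."""
    if any(ch.isspace() for ch in value):
        value = '"' + value.replace('"', '\\"') + '"'
    return f"{field}:{value}"


# Exclusion list taken from the benchmark query.
excluded_tags = ["family", "chick magnet", "soccer mom"]
clause = " AND ".join("!" + field_term("tags", tag) for tag in excluded_tags)

print(clause)         # human-readable form
print(quote(clause))  # percent-encoded form for the request URL
```

With the quotes in place, "chick magnet" stays a single clause on the tags field instead of spilling its second word onto the default field.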

Best
Erick


Re: Benchmark Solr vs Elastic Search vs Sensei

Posted by Volodymyr Zhabiuk <vz...@gmail.com>.
Hi Andy

I don't want to publish the results yet, since there are still some mistakes
in the benchmark. They would also be controversial, because there are
too many parameters to tune and take into consideration.
Nevertheless, you can go to the Sensei Google group to see the
preliminary results for Sensei.

At first I was using the benchmark to stress-test
Sensei. We needed to identify possible memory leaks and bottlenecks
in the new release. After that I extended the tool to test Solr
and Elastic Search.

With many thanks,
Volodymyr


Re: Benchmark Solr vs Elastic Search vs Sensei

Posted by Andy <an...@yahoo.com>.
What is the performance of Elasticsearch and SenseiDB in your benchmark?

