Posted to solr-user@lucene.apache.org by Volodymyr Zhabiuk <vz...@gmail.com> on 2012/04/27 03:50:16 UTC
Benchmark Solr vs Elastic Search vs Sensei
Hi Solr users
I've implemented a project to compare the performance of
Solr, Elastic Search, and SenseiDB:
https://github.com/vzhabiuk/search-perf
Solr version 3.5.0 was used. I used the default configuration,
just enabled JSON updates, and used the following schema:
https://github.com/vzhabiuk/search-perf/blob/master/configs/solr/schema.xml.
2.5 mln documents were put into the index; after
that I launched an indexing process to add another 500k docs,
issuing a commit after each 500-doc batch. At the
same time I launched a concurrent client that sent the
following type of query:
((tags:moon-roof%20or%20tags:electric%20or%20tags:highend%20or%20tags:hybrid)%20AND%20(!tags:family%20AND%20!tags:chick%20magnet%20AND%20!tags:soccer%20mom))%20
OR%20((color:red%20or%20color:green%20or%20color:white%20or%20color:yellow)%20AND%20(!color:gold%20AND%20!color:silver%20AND%20!color:black))%20
OR%20mileage:[15001%20TO%2017500]%20OR%20mileage:[17501%20TO%20*]%20
OR%20city:u.s.a.*
&facet=true&facet.field=tags&facet.field=color
The query is a top-level "OR" query consisting of 2 term clauses, 2
range clauses, and 1 prefix clause. It is designed to hit ~60-70% of all the docs.
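The query string above is URL-encoded (each %20 is a space). A small sketch of how the same raw Lucene query could be rebuilt and encoded in Python; the actual client code in the linked repo may differ:

```python
from urllib.parse import quote

# The raw Lucene query that the %20-encoded string above decodes to; the
# unquoted multi-word terms (e.g. tags:chick magnet) are reproduced as-is.
raw_query = (
    "((tags:moon-roof or tags:electric or tags:highend or tags:hybrid)"
    " AND (!tags:family AND !tags:chick magnet AND !tags:soccer mom))"
    " OR ((color:red or color:green or color:white or color:yellow)"
    " AND (!color:gold AND !color:silver AND !color:black))"
    " OR mileage:[15001 TO 17500] OR mileage:[17501 TO *]"
    " OR city:u.s.a.*"
)

# URL-encode for the GET request, keeping Lucene syntax characters literal
# so that only the spaces become %20, as in the query shown above.
encoded = quote(raw_query, safe="():[]*!")
params = encoded + "&facet=true&facet.field=tags&facet.field=color"
```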
Here is the performance result:
#Threads  min       median    mean      75th pct  QPS
1         208.95ms  332.66ms  350.48ms  422.92ms   2.8
2         188.68ms  338.09ms  339.22ms  402.15ms   5.9
3         151.06ms  326.64ms  336.20ms  418.61ms   8.8
4         125.13ms  332.90ms  332.18ms  396.14ms  12.0
With no indexing process in the background,
the results for 2.6 mln docs are as follows:
#Threads  min       median    mean      75th pct  QPS
1         106.70ms  199.66ms  199.40ms  234.89ms   5.1
2         128.61ms  199.12ms  201.81ms  229.89ms   9.9
3         110.99ms  197.43ms  203.13ms  232.25ms  14.7
4          90.24ms  201.46ms  200.46ms  227.75ms  19.9
5         106.14ms  208.75ms  207.69ms  242.88ms  24.0
6         103.75ms  208.91ms  211.23ms  238.60ms  28.3
7         113.54ms  207.07ms  209.69ms  239.99ms  33.3
8         117.32ms  216.38ms  224.74ms  258.74ms  35.5
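For reference, the columns can be reproduced from raw per-query latencies roughly as below. This is a sketch: the exact percentile method the benchmark tool uses is an assumption (nearest-rank here).

```python
import statistics

def latency_stats(samples_ms, wall_time_s):
    """Summarize per-query latencies the way the table above does."""
    s = sorted(samples_ms)
    n = len(s)
    # Nearest-rank 75th percentile (assumed; the original tool may differ).
    p75 = s[min(n - 1, int(0.75 * n))]
    return {
        "min": s[0],
        "median": statistics.median(s),
        "mean": statistics.mean(s),
        "75%": p75,
        "qps": n / wall_time_s,  # completed queries per wall-clock second
    }

# In a closed-loop client, throughput and latency are tied together:
# qps ~= threads / mean_latency. E.g. 4 threads at a 200.46 ms mean give
# 4 / 0.20046 ~= 20 QPS, consistent with the 19.9 reported above.
```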
I've got three questions so far:
1. With background indexing the latency is almost 2x higher; is
there any way to overcome this?
2. How can we tune Solr to get better results?
3. What, in your opinion, is the preferred type of query to
use for the benchmark?
With many thanks,
Volodymyr
BTW, here is the spec of my machine:
Red Hat 6.1, 64-bit
Intel Xeon E5620 @ 2.40 GHz, 8 cores
63 GB RAM
Re: Benchmark Solr vs Elastic Search vs Sensei
Posted by Volodymyr Zhabiuk <vz...@gmail.com>.
Hi Erick
Thanks for the extensive answers. I will try to tune my Solr
installation according to your advice and the wiki page you've
mentioned.
Best regards,
Volodymyr
Re: Benchmark Solr vs Elastic Search vs Sensei
Posted by Jason Rutherglen <ja...@gmail.com>.
I think DataStax Enterprise is faster than Solr Cloud with transaction
logging turned on. Cassandra has its own fast(er) transaction
logging mechanism. Of course, it's best to use two HDs when testing,
e.g., one for the data, the other for the transaction log.
Re: Benchmark Solr vs Elastic Search vs Sensei
Posted by Jake Luciani <ja...@gmail.com>.
Yes, replication, failover, and distribution are managed by Cassandra, which makes Solr more Dynamo-like. For example, scaling involves adding another node to the Cassandra cluster.
Finally, since the field data is in Cassandra, you can access it from Cassandra, Hadoop, or Solr.
Jake
Re: Benchmark Solr vs Elastic Search vs Sensei
Posted by Andy <an...@yahoo.com>.
So the Cassandra integration brings distributed index and replication to Solr? Is that different from what Solr Cloud does?
Re: Benchmark Solr vs Elastic Search vs Sensei
Posted by Jeff Schmidt <ja...@535consulting.com>.
This is a pretty awesome combination, actually. I'm getting started using it myself, and I'd be very interested in the kind of benchmark results you get vs. Solr and your other candidates. DataStax Enterprise 2.0 was released in March and is based on Solr 4.0 and Cassandra 1.0.7 or 1.0.8; I'm looking forward to the Cassandra 1.1-based release.
Note: I am not affiliated with DataStax in any way, other than being a satisfied customer for the past few months. I am just trying to selfishly fuel your interest so you'll consider benchmarking it.
My project is already using Cassandra, and we had to manage Solr separately. Having the Solr indexes and core configuration (solrconfig.xml, schema.xml, synonyms.txt, etc.) in Cassandra, distributed and replicated among the various nodes (and eventually, for us, multiple data centers), is fantastic.
Jeff
--
Jeff Schmidt
535 Consulting
jas@535consulting.com
http://www.535consulting.com
(650) 423-1068
Re: Benchmark Solr vs Elastic Search vs Sensei
Posted by Walter Underwood <wu...@wunderwood.org>.
On Apr 27, 2012, at 12:39 PM, Radim Kolar wrote:
> On 27.4.2012 19:59, Jeremy Taylor wrote:
>> DataStax offers a Solr integration that isn't master/slave and is
>> NearRealTimes.
> Is it rebranded Solandra?
No, it is a rewrite.
http://www.datastax.com/dev/blog/cassandra-with-solr-integration-details
wunder
--
Walter Underwood
wunder@wunderwood.org
Re: Benchmark Solr vs Elastic Search vs Sensei
Posted by Radim Kolar <hs...@filez.com>.
On 27.4.2012 19:59, Jeremy Taylor wrote:
> DataStax offers a Solr integration that isn't master/slave and is
> NearRealTimes.
Is it rebranded Solandra?
RE: Benchmark Solr vs Elastic Search vs Sensei
Posted by Jeremy Taylor <jt...@datastax.com>.
DataStax offers a Solr integration that isn't master/slave and is
near-real-time. Essentially, the software offers the great features of
Solr without the major shortcomings.
Jeremy
Re: Benchmark Solr vs Elastic Search vs Sensei
Posted by Erick Erickson <er...@gmail.com>.
Some observations:
1> I suspect some of your queries aren't doing what you expect, but
I'm not sure if that matters. e.g. !tags:chick magnet will be parsed
as -tags:chick defaultField:magnet.
2> Typical Solr setups in production are usually master/slave
setups. Your indexing process (the commits) are causing
new searchers to be opened/warmed/etc quite regularly,
reducing your throughput. It's not surprising at all that
your QPS rate increases when not indexing.
3> The trunk Near Real Time with "soft commits" should change
the characteristics of the test with background indexing. You
might try that.
4> Examine your cache usage, see the Solr admin page. Caches
are quite important. Also consider autowarming characteristics.
5> There's a ton of stuff you can do to tune query rate. Unfortunately,
the specific thing that would help your situation is hard to say.
You might start with:
http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
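Points 3> and 4> would translate to solrconfig.xml roughly as follows. This is a sketch for a Solr 4.x / trunk build; the sizes and times are illustrative, not tuned recommendations:

```xml
<!-- Soft commits (Solr 4.x / trunk NRT): make new docs visible without
     opening a fully warmed searcher on every 500-doc batch. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoSoftCommit>
    <maxTime>1000</maxTime>   <!-- soft-commit at most once per second -->
  </autoSoftCommit>
  <autoCommit>
    <maxTime>60000</maxTime>  <!-- hard commit (flush to disk) once a minute -->
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>

<!-- Caches with autowarming, so a newly opened searcher doesn't start cold. -->
<query>
  <filterCache class="solr.FastLRUCache" size="512" initialSize="512"
               autowarmCount="128"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512"
                    autowarmCount="64"/>
</query>
```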
Best
Erick
Re: Benchmark Solr vs Elastic Search vs Sensei
Posted by Volodymyr Zhabiuk <vz...@gmail.com>.
Hi Andy
I don't want to publish the results yet, since there are still some
mistakes in the benchmark. It would also be controversial, because there
are too many parameters to tune and take into consideration.
Nevertheless, you can go to the Sensei Google group to see the
preliminary results for Sensei.
At first I was using the benchmark to do stress testing for
Sensei: we needed to identify possible memory leaks and bottlenecks
in the new release. After that I extended the tool to test Solr
and Elastic Search.
With many thanks,
Volodymyr
Re: Benchmark Solr vs Elastic Search vs Sensei
Posted by Andy <an...@yahoo.com>.
What is the performance of Elasticsearch and SenseiDB in your benchmark?