Posted to solr-user@lucene.apache.org by sivaram <yo...@gmail.com> on 2011/03/11 22:21:28 UTC

Using Solr over Lucene effects performance?

Hello All,

I searched for this but couldn't find a convincing answer.
I'm planning to use Lucene/Solr in a tool for indexing and searching
documents. If I use Lucene directly instead of Solr, will that improve
performance (in terms of the time taken for indexing or returning search
results)? Or, put another way, does Solr slow down my application compared
to Lucene? I have worked with Solr on a small scale before, but this time
the index will have over a million docs to be indexed and searched.

Please let me know what you think, and what the pros and cons are of
using Lucene over Solr or vice versa.

Thanks a lot,
Ram.

--
View this message in context: http://lucene.472066.n3.nabble.com/Using-Solr-over-Lucene-effects-performance-tp2666909p2666909.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: Using Solr over Lucene effects performance?

Posted by "Burton-West, Tom" <tb...@umich.edu>.
+1 on some kind of simple performance framework that would allow comparing Solr vs Lucene.  Any chance the Lucene benchmark programs in contrib could be adapted to read Solr config information?
BTW: If the index is large enough that disk I/O is a factor, you probably want to empty the OS cache, in addition to restarting Solr, between runs.

Tom
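
As a rough illustration of the cache-emptying step mentioned above: on Linux, the page cache can be dropped by running `sync` and then writing `3` to `/proc/sys/vm/drop_caches`. The sketch below assumes a Linux host and root privileges; the function name `drop_os_cache` and its `dry_run` flag are hypothetical helpers, not part of Solr or Lucene.

```python
import subprocess

# Linux-only kernel interface; other operating systems need a different method.
DROP_CACHES = "/proc/sys/vm/drop_caches"


def drop_os_cache(dry_run=True):
    """Describe (and optionally perform) a Linux page-cache drop.

    Returns the list of steps. The steps are only executed when
    dry_run is False, which requires root privileges.
    """
    steps = ["sync", f"write '3' to {DROP_CACHES}"]
    if not dry_run:
        # Flush dirty pages to disk first so they can be evicted.
        subprocess.run(["sync"], check=True)
        # 3 = drop the page cache plus dentries and inodes.
        with open(DROP_CACHES, "w") as fh:
            fh.write("3\n")
    return steps
```

Calling `drop_os_cache(dry_run=False)` between benchmark runs would make each run start with a cold OS cache, at the cost of slowing down everything else on the machine for a while.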


-----Original Message-----
From: Glen Newton [mailto:glen.newton@gmail.com] 
Sent: Friday, March 11, 2011 5:28 PM
To: solr-user@lucene.apache.org; yonik@lucidimagination.com
Cc: sivaram
Subject: Re: Using Solr over Lucene effects performance?


Re: Using Solr over Lucene effects performance?

Posted by Glen Newton <gl...@gmail.com>.
I have seen little repeatable empirical evidence for the usual answer
"mostly no".

With respect: everyone in the Solr universe seems to answer this
question in the way Yonik has.
However, with a large number of requests the XML
serialization/deserialization must have some, likely significant,
impact.

Yonik makes a valid point that I will generalize: for some
combinations of #docs, #queries, doc size, network, hardware, disk, etc.,
the overhead will matter, and for others it will be less important.

Is there any chance that a simple performance framework could be
created in Solr, which runs queries directly against Solr, as well as
against the underlying Lucene index directly?
1 - Text file with one query per line (isn't there a tool out there
that will generate random queries based on a given index? Sorry, my
google fails me...)
2 - Test application: Configuration file that defines the max#
parallel queries per second. The queries are run multiple times:
1,2,4,8,16,32...max# queries. Solr is restarted between each run.
These tests are run against:
   a) Solr local
   b) Solr across the network
   c) Lucene index directly, local
   d) Lucene index directly, across the network using RMI (RemoteSearchable)
3 - Generates a report showing the results
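
The three steps above could be sketched roughly as follows. This is only an outline under stated assumptions, not an existing Solr or Lucene tool: each target (Solr local, Solr remote, Lucene local, Lucene remote) is represented by a hypothetical `search` callable that the caller supplies, "max# parallel queries" is interpreted as a number of worker threads, and `between_runs` is a hypothetical hook where a Solr restart could go.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_queries(path):
    """Step 1: read one query per line, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


def concurrency_levels(max_parallel):
    """1, 2, 4, 8, ... up to max_parallel, which is always the last level."""
    levels, n = [], 1
    while n < max_parallel:
        levels.append(n)
        n *= 2
    levels.append(max_parallel)
    return levels


def run_level(search, queries, parallel):
    """Run every query through `search` using `parallel` worker threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        list(pool.map(search, queries))  # consume results so all queries finish
    elapsed = time.perf_counter() - start
    qps = len(queries) / elapsed if elapsed > 0 else float("inf")
    return {"parallel": parallel, "queries": len(queries),
            "seconds": elapsed, "qps": qps}


def benchmark(targets, queries, max_parallel, between_runs=None):
    """Steps 2 and 3: run each target at each level and collect report rows.

    `targets` maps a label (e.g. "solr-local") to a search callable.
    `between_runs`, if given, is called before each run (assumed hook,
    e.g. to restart Solr).
    """
    report = []
    for label, search in targets.items():
        for parallel in concurrency_levels(max_parallel):
            if between_runs:
                between_runs(label)
            row = run_level(search, queries, parallel)
            row["target"] = label
            report.append(row)
    return report
```

The search callables are where the four cases differ; everything else is shared, so the numbers for each target come from the same queries at the same concurrency levels.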

It should perhaps also allow a second file with fewer queries that is
used to warm the caches and is not included in the reporting.
Oh, the configuration file should also include the network information
for remote indexes.
The configuration file could also include a parameter for the
probability that a query will be paged into a random 1..n pages, where
n is also a settable parameter.
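
The warm-up file and the paging parameter described above might look like this in outline. It is a sketch under assumptions: `search` is again a hypothetical caller-supplied callable, and each request is modelled as a `(query, start, rows)` tuple, borrowing the names of Solr's `start` and `rows` paging parameters.

```python
import random


def warm_up(search, warm_queries):
    """Run the warm-up queries untimed; their results are not reported."""
    for q in warm_queries:
        search(q)


def expand_with_paging(queries, page_probability, max_pages, rows=10, rng=None):
    """Turn each query into one or more (query, start, rows) requests.

    With probability `page_probability` a query is paged through a random
    number of pages in 1..max_pages; otherwise only the first page is fetched.
    """
    rng = rng or random.Random()
    requests = []
    for q in queries:
        if rng.random() < page_probability:
            pages = rng.randint(1, max_pages)  # inclusive on both ends
        else:
            pages = 1
        for page in range(pages):
            requests.append((q, page * rows, rows))
    return requests
```

Passing a seeded `random.Random` as `rng` keeps the paged request list identical across runs, which matters if results from different targets are to be compared.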

Just thought a more empirical framework would help all of us, as
opposed to anecdotal evidence.

Thanks,
Glen
http://zzzoot.blogspot.com/

PS. If there is a good analysis of the performance cost in large scale
instances (many documents, many queries in parallel) of the XML
marshaling/demarshaling in Solr, please share it. -g

On Fri, Mar 11, 2011 at 4:48 PM, Yonik Seeley
<yo...@lucidimagination.com> wrote:
> On Fri, Mar 11, 2011 at 4:21 PM, sivaram <yo...@gmail.com> wrote:
>> I searched for this but couldn't find a convincing answer.
>> I'm planning to use Lucene/Solr in a tool for indexing and searching
>> documents. If I use Lucene directly instead of Solr, will that improve
>> performance (in terms of the time taken for indexing or returning search
>> results)? Or, put another way, does Solr slow down my application compared
>> to Lucene? I have worked with Solr on a small scale before, but this time
>> the index will have over a million docs to be indexed and searched.
>
> On a small scale (hundreds of docs or so), Solr's overhead (parsing
> parameters, etc.) could matter.
> When you scale up to larger indexes, it's in the noise (i.e., the
> actual computation of searching, faceting, highlighting, etc.
> dominates).
>
> -Yonik
> http://lucidimagination.com
>




Re: Using Solr over Lucene effects performance?

Posted by sivaram <yo...@gmail.com>.
Thanks a lot, Glen and Yonik... That's a very convincing explanation...


Re: Using Solr over Lucene effects performance?

Posted by Glen Newton <gl...@gmail.com>.
On Fri, Mar 11, 2011 at 5:26 PM, Yonik Seeley
<yo...@lucidimagination.com> wrote:
> That's an apples to oranges comparison - lucene is a library and solr
> is a server.

I partially agree  ;-)

Lucene is a library and Solr is an HTTP server wrapper-plus around Lucene.
Solr also adds (all sorts of great) significant functionality on top of Lucene.

> You can't really "run" lucene on its own... so some of your
> application code will do stuff that solr probably already does.

It depends. For the functionality that is implemented by Lucene, where
Solr is just passing through to Lucene (like many queries), you could
test this subset of queries (most Solr applications have some of
these; some have many). Yes, this would only be comparing
these queries, and would not be testing the functionality that Solr
provides (faceting, etc.). And certainly if this part of the
application is the part that has performance problems, you would want
to do this.
So it would be comparing oranges-to-oranges. Actually
oranges-to-(oranges+XML+network)   (when remote).

> So performance-wise it's less about solr vs lucene and more about solr
> vs your potential custom application.  On that basis, Solr can be
> either slower or faster.

It depends on how much of the Solr-added functionality the application
uses. If it primarily uses functionality that passes through to Lucene
(for example, it uses Solr for HTTP access, language independence, or
other reasons), then asking the question of Solr vs Lucene is valid.
Asking this question for the subset of the application that has this
kind of profile is also valid.

There _will_ be situations where the developer will not be willing to
take the performance hit of Solr.
Of course, they will lose the great added functionality of Solr,
against which they must balance their requirements and/or the cost of
reimplementing the functionality (as Yonik points out).

Thanks,
Glen  :-)

PS. Some (old) Lucene search performance stuff (100s of GB data;
millions of documents, up to 8192 parallel query threads):
 http://zzzoot.blogspot.com/2008/06/lucene-concurrent-search-performance.html
 http://zzzoot.blogspot.com/2008/06/simultaneous-threaded-query-lucene.html




Re: Using Solr over Lucene effects performance?

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Fri, Mar 11, 2011 at 5:07 PM, sivaram <yo...@gmail.com> wrote:
> So you are saying that it all depends on how we set up Solr? From a
> performance perspective, does Solr lag behind Lucene because it's a layer
> between Lucene and our application, or does it have better indexing and
> searching techniques than Lucene? (when talking about really big indexes)

That's an apples to oranges comparison - lucene is a library and solr
is a server.
You can't really "run" lucene on its own... so some of your
application code will do stuff that solr probably already does.
So performance-wise it's less about solr vs lucene and more about solr
vs your potential custom application.  On that basis, Solr can be
either slower or faster.

If you have really specialized needs, you can always gain speed by
dropping down to a lower level.  The decision to be made is whether your
needs are that special, what your requirements are, and whether the extra
development cost is worth it.  If you're doing something really
different, Solr could be the completely wrong solution.

It's the same with databases... you could *always* get better
performance by putting in enough effort and doing custom code rather
than using an off-the-shelf product like MySQL.  Just like you could
always get better performance dropping from Java to C, from C to
assembler, and moving from general-purpose processors to ASICs or
GPUs, etc.

I guess this is a very long-winded way of saying "it depends" ;-)

-Yonik
http://lucidimagination.com

Re: Using Solr over Lucene effects performance?

Posted by sivaram <yo...@gmail.com>.
Thanks for the quick reply Yonik,

So you are saying that it all depends on how we set up Solr? From a
performance perspective, does Solr lag behind Lucene because it's a layer
between Lucene and our application, or does it have better indexing and
searching techniques than Lucene? (when talking about really big indexes)



Re: Using Solr over Lucene effects performance?

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Fri, Mar 11, 2011 at 4:21 PM, sivaram <yo...@gmail.com> wrote:
> I searched for this but couldn't find a convincing answer.
> I'm planning to use Lucene/Solr in a tool for indexing and searching
> documents. If I use Lucene directly instead of Solr, will that improve
> performance (in terms of the time taken for indexing or returning search
> results)? Or, put another way, does Solr slow down my application compared
> to Lucene? I have worked with Solr on a small scale before, but this time
> the index will have over a million docs to be indexed and searched.

On a small scale (hundreds of docs or so), Solr's overhead (parsing
parameters, etc.) could matter.
When you scale up to larger indexes, it's in the noise (i.e., the
actual computation of searching, faceting, highlighting, etc.
dominates).

-Yonik
http://lucidimagination.com