Posted to solr-user@lucene.apache.org by Sergio Stateri <st...@gmail.com> on 2013/09/04 14:06:04 UTC

solr performance against oracle

Hi,

I'm trying to move the data access at the company where I work from
Oracle to Solr, so I ran some tests like this:

In Oracle:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Date;

private void go() throws Exception {
    Class.forName("oracle.jdbc.driver.OracleDriver");
    Connection conn =
        DriverManager.getConnection("XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX");
    PreparedStatement pstmt = conn.prepareStatement(
        "SELECT DS_ROTEIRO FROM cco_roteiro_reg_venda WHERE CD_ROTEIRO=93100689");
    // The timer starts only after the connection and statement are ready
    Date initialTime = new Date();
    ResultSet rs = pstmt.executeQuery();
    rs.next();
    String desc = rs.getString(1);
    System.out.println("total time:"
        + (new Date().getTime() - initialTime.getTime()) + " ms");
    System.out.println(desc);
    rs.close();
    pstmt.close();
    conn.close();
}



And in Solr:

import java.util.Date;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

private HttpSolrServer server;

private void go() throws Exception {
    String solrServerUrl = "http://localhost:8983/solr/roteiros/";
    // Client for the "roteiros" core; it is only created here, not yet used
    server = new HttpSolrServer(solrServerUrl);
    server.setRequestWriter(new BinaryRequestWriter());
    String docId = AddOneRoteiroToCollection.docId;

    SolrQuery query = new SolrQuery();
    query.setQuery("(id:" + docId + ")"); // search by id
    query.addField("id");
    query.addField("descricaoRoteiro");

    extrairEApresentarResultados(query);
}

private void extrairEApresentarResultados(SolrQuery query)
        throws SolrServerException {
    Date initialTime = new Date();
    QueryResponse rsp = server.query(query);
    SolrDocumentList docs = rsp.getResults();
    // Here I am checking the Solr response time
    long now = new Date().getTime() - initialTime.getTime();
    for (SolrDocument solrDocument : docs) {
        System.out.println(solrDocument);
    }
    System.out.println("Total documents found: " + docs.size());
    System.out.println("Total time: " + now + " ms");
}


"descricaoRoteiro" is the same data that I'm getting in both, using the PK
CD_ROTEIRO, which is stored in Solr under the name "id" (it's the same data).
The Solr data is on the same machine, and Solr and Oracle hold the same
number of records (around 800 thousand).

Solr always returns the data in around 150~200 ms (from localhost), but Oracle
returns it in around 20 ms (and the Oracle server is at another company; I'm
using a dedicated link to access it).

How can I tell my managers that I'd like to use Solr? I saw that filters
in Solr take around 6~10 ms, but they are a query inside another query
whose results were already returned.


Thanks for any help. I'd very much like to use Solr, but I really don't know
how to explain this to my managers.


-- 
Sergio Stateri Jr.
stateri@gmail.com

Re: solr performance against oracle

Posted by Andrea Gazzarini <an...@gmail.com>.

You said nothing about your environments (e.g. operating systems, what
kind of Oracle installation you have, what kind of Solr installation,
how much data is in the database, how many documents are in the index, RAM
for Solr, for Oracle, for the OS, and the hardware in general, and so on).

Anyway... a migration from Oracle to Solr? That is, you're going to throw
Oracle out the window and completely replace it with Solr? I would
consider other aspects before your performance test. Unless you
have one flat table in Oracle, you should explain to your manager that
there's a lot of work to be done for that kind of migration
(e.g. collecting all query requirements, denormalization).

Best,
Gazza



Re: solr performance against oracle

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.

On Wed, 2013-09-04 at 14:06 +0200, Sergio Stateri wrote:
> I'm trying to change the data access in the company where I work from
> Oracle to Solr.

They work on different principles and fulfill different needs. Comparing
them with a performance-oriented test is not likely to give a usable basis
for choosing between them. Start by describing your typical use cases
instead.

> Solr always returns the data in around 150~200 ms (from localhost), but Oracle
> returns it in around 20 ms (and the Oracle server is at another company; I'm
> using a dedicated link to access it).

200ms is suspiciously slow for a trivial lookup in 800,000 values. I am
sure we can bring that down to Oracle-time or better, but I do not think
it shows much.

> How can I tell my managers that I'd like to use Solr?

Why would you like to use Solr?

Re: solr performance against oracle

Posted by Furkan KAMACI <fu...@gmail.com>.

Pramod Sadalage and Martin Fowler have a nice book about this kind of
architectural design: NoSQL Distilled: A Brief Guide to the Emerging World
of Polyglot Persistence. If you read it you will see why to use NoSQL, an
RDBMS, or both. On the other hand, I have over 50 million documents on
replicated SolrCloud nodes and my average response time is ~10 ms, so it
depends on your architecture, configuration and hardware specifications.

On Thursday, 12 September 2013, Chris Hostetter <ho...@fucit.org> wrote:
>
> Setting aside the excellent responses that have already been made in this
> thread, there are fundamental discrepancies in what you are comparing in
> your respective timing tests.

Re: solr performance against oracle

Posted by Chris Hostetter <ho...@fucit.org>.

Setting aside the excellent responses that have already been made in this
thread, there are fundamental discrepancies in what you are comparing in
your respective timing tests.

First off: a micro benchmark like this is virtually useless -- unless you
really plan on only ever executing a single query in a single run of a
java application that then terminates, trying to time a single query is
silly -- you should do lots and lots of iterations using a large set of
sample inputs.
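
As a rough illustration of "lots of iterations over many sample inputs", here is a minimal sketch of a timing loop with a warm-up pass and percentile reporting. It is not taken from the posts above: the class name LookupBenchmark, the in-memory HashMap standing in for the data store, and the sample id spacing are all assumptions made so the example runs on its own. In a real test the lambda would call Solr or JDBC through a client that is created once and reused.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

final class LookupBenchmark {

    /** Runs warm-up passes, then times each lookup; returns latencies in ns, sorted ascending. */
    static long[] measure(IntFunction<String> lookup, int[] ids, int warmupRounds) {
        // Warm-up: give the JIT and any caches a chance to settle before timing.
        for (int round = 0; round < warmupRounds; round++) {
            for (int id : ids) {
                lookup.apply(id);
            }
        }
        long[] nanos = new long[ids.length];
        for (int i = 0; i < ids.length; i++) {
            long start = System.nanoTime(); // monotonic clock, finer than new Date()
            lookup.apply(ids[i]);
            nanos[i] = System.nanoTime() - start;
        }
        Arrays.sort(nanos);
        return nanos;
    }

    /** Nearest-rank percentile of an ascending-sorted array; p in (0, 100]. */
    static long percentile(long[] sorted, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the real store (Solr or Oracle).
        Map<Integer, String> store = new HashMap<>();
        for (int id = 1; id <= 100_000; id++) {
            store.put(id, "descricao " + id);
        }
        // 5,000 distinct sample ids spread across the key space.
        int[] ids = new int[5_000];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = 1 + i * 19;
        }
        long[] sorted = measure(id -> store.get(id), ids, 3);
        System.out.println("median ns: " + percentile(sorted, 50));
        System.out.println("p99 ns:    " + percentile(sorted, 99));
    }
}
```

Reporting a median and a high percentile rather than one number makes it harder for a single slow first request to dominate the result.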

Second: what you are timing is vastly different between the two cases.

In your Solr timing, no communication happens over the wire to the solr
server until the call to server.query() inside your time stamps -- if you
were doing multiple requests using the same SolrServer object, the HTTP
connection would get re-used, but as things stand your timing includes all
of the network overhead of connecting to the server, sending the request,
and reading the response.

In your oracle method however, the timestamps you record are only around
the call to executeQuery(), rs.next(), and rs.getString() ... you are
ignoring the time necessary for the getConnection() and
prepareStatement() methods, which may be significant as they both involve
over the wire communication with the remote server (and it's not like
these are one time execute and forget about them methods ... in a real
long lived application you'd need to manage your connections, re-open them
if they get closed, recreate the prepared statement if your connection has
to be re-opened, etc...)

Your comparison is definitely apples and oranges.
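
To make the asymmetry concrete, the sketch below times the same three phases (client setup, first request, repeated request) for two simulated clients. Everything in it is hypothetical and only for illustration: EagerClient pays a fixed 50 ms "connection" cost when it is constructed, roughly how getConnection() behaves, while LazyClient pays it on its first request, roughly how an HTTP client behaves. Neither class talks to Oracle or Solr.

```java
import java.util.function.Function;
import java.util.function.Supplier;

final class PhaseTiming {

    static final long CONNECT_MILLIS = 50; // simulated connection cost (assumption)

    /** Elapsed nanoseconds for client setup, the first query, and a repeat query. */
    static final class Result {
        final long setupNanos;
        final long firstQueryNanos;
        final long laterQueryNanos;

        Result(long setupNanos, long firstQueryNanos, long laterQueryNanos) {
            this.setupNanos = setupNanos;
            this.firstQueryNanos = firstQueryNanos;
            this.laterQueryNanos = laterQueryNanos;
        }

        @Override
        public String toString() {
            return String.format("setup=%d ms, first query=%d ms, later query=%d ms",
                    setupNanos / 1_000_000, firstQueryNanos / 1_000_000,
                    laterQueryNanos / 1_000_000);
        }
    }

    /** Times the same three phases for any client, so both systems are measured alike. */
    static <C> Result time(Supplier<C> open, Function<C, String> query) {
        long t0 = System.nanoTime();
        C client = open.get();
        long t1 = System.nanoTime();
        query.apply(client); // first request on a fresh client
        long t2 = System.nanoTime();
        query.apply(client); // same client reused
        long t3 = System.nanoTime();
        return new Result(t1 - t0, t2 - t1, t3 - t2);
    }

    static void simulateConnect() {
        try {
            Thread.sleep(CONNECT_MILLIS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Hypothetical client that connects in its constructor. */
    static final class EagerClient {
        EagerClient() { simulateConnect(); }
        String fetch() { return "row"; }
    }

    /** Hypothetical client that connects lazily on its first request. */
    static final class LazyClient {
        private boolean connected;
        String fetch() {
            if (!connected) {
                simulateConnect();
                connected = true;
            }
            return "doc";
        }
    }

    public static void main(String[] args) {
        System.out.println("eager: "
                + PhaseTiming.<EagerClient>time(EagerClient::new, EagerClient::fetch));
        System.out.println("lazy:  "
                + PhaseTiming.<LazyClient>time(LazyClient::new, LazyClient::fetch));
    }
}
```

A timer that wraps only the first query would charge the connection cost to the lazy client and not to the eager one, even though both do the same total work. Comparing the "later query" column, or the sum of all phases, avoids that.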


Lastly, as others have mentioned: 150-200ms to request a single document
by uniqueKey from an index containing 800K docs seems ridiculously slow,
and suggests that something is poorly configured about your solr instance
(another apples to oranges comparison: you've got an ad-hoc solr
installation set up on your laptop and you're benchmarking it against a
remote oracle server running on dedicated remote hardware that has
probably been heavily tuned/optimized for queries).

You haven't provided us any details however about how your index is set up,
or how you have configured solr, or what JVM options you are using to run
solr, or what physical resources are available to your solr process (disk,
jvm heap ram, os file system cache ram) so there isn't much we can offer
in the way of advice on how to speed things up.


FWIW:  On my laptop, using Solr 4.4 w/ the example configs and built in
jetty (ie: "java -jar start.jar") i got a 3.4 GB max heap, and a 1.5 GB
default heap, with plenty of physical ram left over for the os file system
cache of an index i created containing 1,000,000 documents with 6 small
fields containing small amounts of random terms.  I then used curl to
execute ~4150 requests for documents by id (using simple search, not the
/get RTG handler) and return the results using JSON.

This completed in under 4.5 seconds, or ~1.1ms/request.

Using the more verbose XML response format (after restarting solr to
ensure nothing in the query result caches) only took 0.4 seconds longer on
the total time (~1.2ms/request)

$ time curl -sS 'http://localhost:8983/solr/collection1/select?q=id%3A[1-1000000:241]&wt=json&indent=true' > /dev/null

real	0m4.471s
user	0m0.412s
sys	0m0.116s
$ time curl -sS 'http://localhost:8983/solr/collection1/select?q=id%3A[1-1000000:241]&wt=xml&indent=true' > /dev/null

real	0m4.868s
user	0m0.376s
sys	0m0.136s
$ java -version
java version "1.7.0_25"
OpenJDK Runtime Environment (IcedTea 2.3.10) (7u25-2.3.10-1ubuntu0.12.04.2)
OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)
$ uname -a
Linux frisbee 3.2.0-52-generic #78-Ubuntu SMP Fri Jul 26 16:21:44 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
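
For readers who want to check the per-request figures, the arithmetic can be sketched as follows. The curl range [1-1000000:241] expands to the ids 1, 242, 483, and so on, and the totals are the "real" times from the output above; the class and method names are made up for this example.

```java
final class RequestMath {

    /** Number of values visited in the inclusive range [first, last] with the given step. */
    static int requestCount(int first, int last, int step) {
        return (last - first) / step + 1;
    }

    /** Average wall-clock milliseconds per request. */
    static double millisPerRequest(double totalSeconds, int requests) {
        return totalSeconds * 1000.0 / requests;
    }

    public static void main(String[] args) {
        int requests = requestCount(1, 1_000_000, 241);
        System.out.println("requests: " + requests);
        System.out.println("JSON ms/request: " + millisPerRequest(4.471, requests));
        System.out.println("XML  ms/request: " + millisPerRequest(4.868, requests));
    }
}
```

This gives 4150 requests, about 1.08 ms per request for JSON and about 1.17 ms for XML. Those averages include curl's own overhead, so Solr's share is somewhat less.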






-Hoss