Posted to java-user@lucene.apache.org by Vladimir Olenin <VO...@cihi.ca> on 2006/10/03 17:58:32 UTC

native Java DB (eg, Derby) to store the index: performance comparison?..

Hi,
 
I've been wondering if anyone has tried to compare the performance of
any 'native' Java DB as an index storage mechanism vs Lucene's custom
implementation? I'm assuming that DB products should provide some
functionality for 'free' right out of the box (correct me if I'm wrong):
 
- easily manageable and maintainable index (accessible through any SQL
client tool)
- efficient access to large volumes of data
  * potential support for a 'distributed' DB, which can span
multiple boxes transparently to the client app (the Lucene engine
generating the queries)
- much less hassle integrating Lucene into applications backed by
a DB (eg, many stores, 'city sites', portals which already have all
their data in relational tables and only need efficient fuzzy
searches across this data)
  * no need to keep the Lucene index in sync with the data, since Lucene
will reuse PKs and indexes from the DB
 
So, I think the main question is whether Lucene's custom way of
maintaining _and accessing_ the index is (much?) more efficient than
that of the available open source native Java DBs (Derby, etc).
 
Thanks!
 
Vladimir Olenin
Software Architect
[w]: 416-544-5598
[c]: 416-854-8384
[f]: 416-481-2950

Re: native Java DB (eg, Derby) to store the index: performance comparison?..

Posted by Aleksei Valikov <va...@gmx.net>.

Hi.


You may be interested in the Compass Framework. It is built on top of Lucene and
implements JDBC-based storage as well as synchronization with things like Hibernate.

In my apps, I have to use both Lucene and relational databases, since each has
its own querying characteristics. That is, there are requests which can be
implemented in an RDB but not in Lucene, and requests which can be implemented
in Lucene but not in an RDB. There are also queries which run on both.


Your idea of using an RDB to store Lucene indexes looks quite nice at first
glance. You probably imagine something like

select id from tbl_index where value like 'te%st' or value like 'f_ne'

for a query like

"te*st f?ne"

Yes, this looks quite nice at first glance.
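
As a rough sketch of the translation imagined above (this is not something
Lucene or Compass provides; the class and method names are hypothetical, and
tbl_index / value are simply the names from the SQL above), the wildcard
mapping might look like this in Java. It assumes every term is OR'ed together
and does not escape literal % or _ characters inside a term:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only: maps Lucene-style wildcard
// terms onto SQL LIKE patterns.
final class WildcardToLike {

    // Translate one wildcard term: '*' becomes '%', '?' becomes '_'.
    // Assumption: the term contains no literal '%' or '_' (not escaped here).
    static String toLikePattern(String term) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '*') {
                sb.append('%');
            } else if (c == '?') {
                sb.append('_');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Build a SELECT over the assumed table tbl_index(id, value),
    // OR-ing one LIKE clause per whitespace-separated term.
    static String toSql(String query) {
        List<String> clauses = new ArrayList<>();
        for (String term : query.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            // Double any single quote so the SQL literal stays well-formed.
            String pattern = toLikePattern(term).replace("'", "''");
            clauses.add("value like '" + pattern + "'");
        }
        if (clauses.isEmpty()) {
            throw new IllegalArgumentException("query has no terms");
        }
        return "select id from tbl_index where " + String.join(" or ", clauses);
    }
}
```

Calling WildcardToLike.toSql("te*st f?ne") yields the SELECT statement shown
above. Real code would use bound parameters rather than string concatenation.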

But if you take a closer look, you'll quickly find that only a subset of
Lucene queries can be converted into such SQL statements.

The next problem is the index format. Lucene indexes are (a bit ;) ) more
complex than simple index tables. So there's no "easy" index format which would
make sense for "any SQL client tool".

There will also be problems if you try to reuse PKs and DB indexes. You'll end up
with a lot of constraint exceptions and stale indexes - and someone still HAS to
sync the full-text index - even if it's in its own table.

Finally, I have no numbers, but my gut feeling is that Lucene over HSQLDB or
Derby won't perform much better than Lucene on its own. I seriously doubt it.

And still I like your idea. I work a lot with queries which currently require
evaluation in both Lucene and an RDB. I would be fine with a limited Lucene query
syntax which would allow queries to be processed homogeneously in an RDB only.

Bye.
/lexi
