
Posted to java-user@lucene.apache.org by Mag Gam <ma...@gmail.com> on 2006/10/05 06:19:23 UTC

Advantage of putting lucene index in RDBMS

I have been reading the lists for a couple of weeks now, and I noticed people
asking about placing their indexes into an RDBMS. What is the advantage of
that?

So far Lucene has been able to solve all my problems, but I am curious how else
people are using it (especially with an RDBMS).

TIA

RE: Advantage of putting lucene index in RDBMS

Posted by sachin <sa...@noemacorp.com>.

I feel that implementing Lucene inside an RDBMS comes down to nothing more than
implementing the following interfaces:

TermDocs
TermVector
TermPositions
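To make the TermDocs part concrete, here is a minimal sketch, assuming a hypothetical postings table with rows (term, doc, freq). `PostingRow` and `PostingsCursor` are made-up names, and the cursor only loosely mirrors the next()/doc()/freq()/skipTo() style of iteration that Lucene's TermDocs offers; it is not Lucene's interface. An in-memory list stands in for the table, so the rows that a query such as `SELECT doc, freq FROM postings WHERE term = ? ORDER BY doc` would return are produced by filtering and sorting instead.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical row of a "postings" table: (term, doc, freq).
final class PostingRow {
    final String term;
    final int doc;
    final int freq;

    PostingRow(String term, int doc, int freq) {
        this.term = term;
        this.doc = doc;
        this.freq = freq;
    }
}

// A TermDocs-like cursor (loosely modelled, NOT Lucene's interface) over the
// rows for one term, delivered in ascending doc order as
// "SELECT doc, freq FROM postings WHERE term = ? ORDER BY doc" would.
final class PostingsCursor {
    private final List<PostingRow> rows = new ArrayList<>();
    private int pos = -1;

    PostingsCursor(List<PostingRow> table, String term) {
        for (PostingRow r : table) {
            if (r.term.equals(term)) {
                rows.add(r);
            }
        }
        rows.sort(Comparator.comparingInt(r -> r.doc));
    }

    /** Advances to the next posting; returns false when exhausted. */
    boolean next() {
        pos++;
        return pos < rows.size();
    }

    /** Moves past the current posting to the first one with doc >= target. */
    boolean skipTo(int target) {
        while (next()) {
            if (rows.get(pos).doc >= target) {
                return true;
            }
        }
        return false;
    }

    int doc() {
        return rows.get(pos).doc;
    }

    int freq() {
        return rows.get(pos).freq;
    }
}
```

A real implementation would also need the position data behind TermPositions and the per-document term vectors, which this sketch leaves out.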


-----Original Message-----
From: Karel Tejnora [mailto:karel@tejnora.cz] 
Sent: Friday, October 06, 2006 4:11 PM
To: java-user@lucene.apache.org
Subject: Re: Advantage of putting lucene index in RDBMS

One thing: using an RDBMS for the STORED fields is generally a good idea,
because every segment merge / optimize copies that data once or twice (cfs).

I'm thinking about putting STORED fields in a separate file and keeping
pointers to them in the cfs. A delete would just mark the document as deleted,
and a new operation, optimize_full, would remove deleted entries and correct
the pointers.


Karel

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Advantage of putting lucene index in RDBMS

Posted by Karel Tejnora <ka...@tejnora.cz>.

One thing: using an RDBMS for the STORED fields is generally a good idea,
because every segment merge / optimize copies that data once or twice (cfs).

I'm thinking about putting STORED fields in a separate file and keeping
pointers to them in the cfs. A delete would just mark the document as deleted,
and a new operation, optimize_full, would remove deleted entries and correct
the pointers.
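Karel's proposal can be modelled roughly as follows. This is a toy sketch under stated assumptions, not Lucene's stored-fields format: a `ByteArrayOutputStream` stands in for the extra file, each document keeps only an (offset, length) pointer into it, `delete` merely sets a flag, and a hypothetical `optimizeFull()` (named after the proposed optimize_full operation) copies live entries into a fresh file and rewrites their pointers. `StoredFieldStore` and its methods are invented for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical names) of "STORED fields in an extra file, pointers
// in the index"; not Lucene's actual stored-fields format.
final class StoredFieldStore {
    // Stand-in for the extra data file holding the stored values.
    private ByteArrayOutputStream data = new ByteArrayOutputStream();
    // Per document: {offset, length} into the data file; null once compacted away.
    private final List<long[]> pointers = new ArrayList<>();
    private final List<Boolean> deleted = new ArrayList<>();

    /** Appends a document's stored value and returns its document number. */
    int add(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        pointers.add(new long[] {data.size(), bytes.length});
        data.write(bytes, 0, bytes.length);
        deleted.add(false);
        return pointers.size() - 1;
    }

    /** A delete only marks the document; its bytes stay in the data file. */
    void delete(int doc) {
        deleted.set(doc, true);
    }

    /** Returns the stored value, or null for a deleted document. */
    String get(int doc) {
        if (deleted.get(doc)) {
            return null;
        }
        long[] p = pointers.get(doc);
        // toByteArray() copies the whole buffer; acceptable for a toy.
        byte[] all = data.toByteArray();
        return new String(all, (int) p[0], (int) p[1], StandardCharsets.UTF_8);
    }

    /** Current size of the data file in bytes, including dead entries. */
    int dataSize() {
        return data.size();
    }

    /**
     * Hypothetical "optimize_full": copy only live entries into a new data
     * file and correct their pointers. Document numbers stay stable here;
     * a real index would also renumber documents.
     */
    void optimizeFull() {
        byte[] old = data.toByteArray();
        ByteArrayOutputStream compacted = new ByteArrayOutputStream();
        for (int doc = 0; doc < pointers.size(); doc++) {
            if (deleted.get(doc)) {
                pointers.set(doc, null); // never read again: get() checks the flag first
                continue;
            }
            long[] p = pointers.get(doc);
            long newOffset = compacted.size();
            compacted.write(old, (int) p[0], (int) p[1]);
            pointers.set(doc, new long[] {newOffset, p[1]});
        }
        data = compacted;
    }
}
```

In this model, stored bytes are copied only when `optimizeFull()` runs; ordinary deletes never touch the data file, which is the saving Karel is after compared with copying stored data on every merge.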


Karel

Re: Advantage of putting lucene index in RDBMS

Posted by Aleksei Valikov <va...@gmx.net>.

Hi.

> 
>>>I have been reading the lists for a couple of weeks now, and I noticed 
>>>people asking about placing their indexes into an RDBMS. What is the 
>>>advantage of that?
>>>
>>>So far Lucene has been able to solve all my problems, but I am curious how 
>>>else people are using it (especially with an RDBMS).
> 
> 
>> Having an index in the DB makes it possible to join full-text queries
>> against this index with other structured queries against other tables.
>> 
>> Here's a practical example. We have a data management system for managing
>> geographic metadata. Documents that we manage (geometadata) have spatial
>> extents (bounding boxes), temporal extents (time periods) and a lot of
>> textual information.
> 
>> Currently, complex queries like "contains 'water*', from 1998 till 2001, in
>> area (5, 45, 15, 55)" can't be processed by the relational DB alone or by
>> the Lucene index alone. We have to do a relatively expensive union/join
>> outside both Lucene and the RDB. Having the Lucene index within the DB, with
>> Lucene queries (re)formulatable in SQL, would allow us to perform the join
>> inside the DB, which is much more performant.

> Aleksei, can you point me to a document detailing this procedure with
> examples?  If not, would you consider creating one?  I am particularly
> interested in what prerequisite steps are needed to perform a Lucene query
> within SQL (if I understand correctly what you are doing).

I may have expressed myself badly.

The idea of rewriting Lucene indexes in SQL is only that: an idea. There is
currently no way to do it. I was just answering the theoretical question of
"what the advantage would be".

Bye.
/lexi

Re: Advantage of putting lucene index in RDBMS

Posted by Mag Gam <ma...@gmail.com>.

I appreciate everyone's responses.

I guess the main advantage of putting Lucene's index into an RDBMS is query
flexibility. Personally, I would rather use an RDBMS for results than Lucene,
because I am more experienced with SQL queries than with Java.

Does anyone have a simple example of using FileDocument
(http://lucene.apache.org/java/docs/api/org/apache/lucene/demo/FileDocument.html),
which includes the following fields: path, modified, and contents? I would
like to try this approach...
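One way to try this without Lucene in the loop is to pull the same three fields that the demo's FileDocument puts into a Lucene Document (path, modified, contents) into a plain row and insert it through JDBC. The sketch below is an assumption-laden illustration: `FileRow` and the `files(path, modified, contents)` table are hypothetical, and `modified` is formatted as a minute-resolution `yyyyMMddHHmm` string in UTC, which is intended to resemble what Lucene's DateTools produces at MINUTE resolution rather than to reproduce the demo exactly.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical row for a "files(path, modified, contents)" table holding the
// same three fields the Lucene demo's FileDocument builds.
final class FileRow {
    // Assumption: minute resolution in UTC, meant to resemble DateTools' MINUTE strings.
    private static final DateTimeFormatter MINUTE =
            DateTimeFormatter.ofPattern("yyyyMMddHHmm").withZone(ZoneOffset.UTC);

    final String path;
    final String modified;
    final String contents;

    private FileRow(String path, String modified, String contents) {
        this.path = path;
        this.modified = modified;
        this.contents = contents;
    }

    /** Reads one file into a row (assumes UTF-8 text content). */
    static FileRow of(Path file) throws IOException {
        String modified = MINUTE.format(Files.getLastModifiedTime(file).toInstant());
        String contents = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return new FileRow(file.toString(), modified, contents);
    }

    /** Binds to e.g. "INSERT INTO files (path, modified, contents) VALUES (?, ?, ?)". */
    void bind(PreparedStatement ps) throws SQLException {
        ps.setString(1, path);
        ps.setString(2, modified);
        ps.setString(3, contents);
    }
}
```

`FileRow.of(path)` reads a file, and `bind(ps)` fills a statement prepared from the INSERT shown in the comment; the actual SQL and column types depend on your schema.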

TIA!



On 10/5/06, Aleksei Valikov <va...@gmx.net> wrote:
>
> Hi.
>
> > As one of the people who asked about placing indexes into an RDBMS, I was
> > primarily interested in just storing the index in the RDBMS (basically,
> > storing the structures described on this page
> > http://lucene.apache.org/java/docs/fileformats.html in the relational
> > DB). The main reason is NOT to be able to perform some magic with
> > joining Lucene and 'pure DB query' results (which actually IS useful
> > in some circumstances, but I don't really see a problem doing it in
> > Java after querying the DB and Lucene), but rather to avoid the cost of
> > reindexing and the associated problems in complex enterprise environments.
>
> There is no problem joining/intersecting Lucene/DB results in the Java layer,
> apart from performance. Imagine you have 10k results from Lucene and 10k
> results from the RDB, and you only need results 20...40 ordered by the 'name'
> field, ascending (which is the usual case). An SQL query with a join and
> limit/offset would be much faster than joining 20k entries in Java.
>
> > Yet another advantage of storing the index in the DB is its 'manageability'
> > and 'debuggability' (nice word!). Though there is Luke, etc.,
> > administrators in big companies do not want to learn many new tools, and
> > having something already familiar to deal with can sometimes be a good
> > argument in favor of product adoption. (BTW, Compass, as Aleksei
> > mentioned, could be the answer to this prayer. I meant to check it out a
> > long time ago, but haven't gotten around to it yet. Also, the project
> > seems to be half-dead. I wonder if that's true...)
>
> Compass is a lively and active project; we use it successfully in
> production.
>
> Bye.
> /lexi
>

Re: Advantage of putting lucene index in RDBMS

Posted by Aleksei Valikov <va...@gmx.net>.

Hi.

> As one of the people who asked about placing indexes into an RDBMS, I was
> primarily interested in just storing the index in the RDBMS (basically,
> storing the structures described on this page
> http://lucene.apache.org/java/docs/fileformats.html in the relational
> DB). The main reason is NOT to be able to perform some magic with
> joining Lucene and 'pure DB query' results (which actually IS useful
> in some circumstances, but I don't really see a problem doing it in
> Java after querying the DB and Lucene), but rather to avoid the cost of
> reindexing and the associated problems in complex enterprise environments.

There is no problem joining/intersecting Lucene/DB results in the Java layer,
apart from performance. Imagine you have 10k results from Lucene and 10k
results from the RDB, and you only need results 20...40 ordered by the 'name'
field, ascending (which is the usual case). An SQL query with a join and
limit/offset would be much faster than joining 20k entries in Java.
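For comparison, the Java-layer join can be sketched like this, assuming the Lucene side yields a set of matching IDs and the DB side yields ID-to-name rows (both simulated with in-memory collections; `JavaSideJoin.page` is a made-up helper). It keeps only the best `offset + limit` names in a bounded max-heap instead of sorting every joined row, yet it still has to receive and scan all rows from both sides, which is the work an SQL join with LIMIT/OFFSET would keep inside the database.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

final class JavaSideJoin {
    /**
     * Intersects Lucene hits with DB rows (id -> name) and returns the page
     * [offset, offset + limit) of names in ascending order.
     */
    static List<String> page(Set<Integer> luceneHits, Map<Integer, String> dbNamesById,
                             int offset, int limit) {
        int keep = offset + limit;
        // Max-heap on name: the root is the largest of the best `keep` names so far.
        PriorityQueue<String> heap = new PriorityQueue<>(Comparator.reverseOrder());
        for (Map.Entry<Integer, String> row : dbNamesById.entrySet()) {
            if (!luceneHits.contains(row.getKey())) {
                continue; // not a full-text hit: drop it
            }
            heap.offer(row.getValue());
            if (heap.size() > keep) {
                heap.poll(); // evict the largest name
            }
        }
        List<String> best = new ArrayList<>(heap);
        Collections.sort(best);
        if (offset >= best.size()) {
            return Collections.emptyList();
        }
        return best.subList(offset, Math.min(keep, best.size()));
    }
}
```

With 10k IDs on each side, a call like `page(luceneIds, dbNames, 20, 20)` never holds more than 40 names in the heap, but every DB row is still iterated in the application.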

> Yet another advantage of storing the index in the DB is its 'manageability'
> and 'debuggability' (nice word!). Though there is Luke, etc.,
> administrators in big companies do not want to learn many new tools, and
> having something already familiar to deal with can sometimes be a good
> argument in favor of product adoption. (BTW, Compass, as Aleksei
> mentioned, could be the answer to this prayer. I meant to check it out a
> long time ago, but haven't gotten around to it yet. Also, the project
> seems to be half-dead. I wonder if that's true...)

Compass is a lively and active project; we use it successfully in production.

Bye.
/lexi

RE: Advantage of putting lucene index in RDBMS

Posted by Paul Snyder <ps...@postbulletin.com>.

Re-reading Aleksei's post, I have to ask: is it really not
possible/practical to index the database metadata (such as date, area and
schema/table/primary-key info) as Lucene document fields?  I am having
difficulty conceiving of a scenario in which this would not be a practical option.
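Paul's suggestion can be illustrated with a toy, in-memory model (not Lucene's API) in which each document carries its text terms together with the structured metadata from Aleksei's example. `GeoDoc` and `GeoQuery` are invented names, and the query semantics are assumptions: 'water*' is a prefix match on any term, "from 1998 till 2001" means the document's time period overlaps those years, and the area (5, 45, 15, 55) is read as (minX, minY, maxX, maxY) with boxes required to intersect. With everything attached to the same document, one pass answers the whole query with no external join.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy document (hypothetical): text terms plus structured metadata side by side.
final class GeoDoc {
    final String id;
    final Set<String> terms;
    final int fromYear;   // temporal extent, inclusive
    final int toYear;
    final double minX;    // bounding box; which axis is lon/lat is an assumption
    final double minY;
    final double maxX;
    final double maxY;

    GeoDoc(String id, Set<String> terms, int fromYear, int toYear,
           double minX, double minY, double maxX, double maxY) {
        this.id = id;
        this.terms = terms;
        this.fromYear = fromYear;
        this.toYear = toYear;
        this.minX = minX;
        this.minY = minY;
        this.maxX = maxX;
        this.maxY = maxY;
    }
}

final class GeoQuery {
    /**
     * Assumed semantics: prefix match on any term, time periods overlap,
     * bounding boxes intersect. Returns matching ids in input order.
     */
    static List<String> search(List<GeoDoc> docs, String prefix, int fromYear, int toYear,
                               double minX, double minY, double maxX, double maxY) {
        List<String> ids = new ArrayList<>();
        for (GeoDoc d : docs) {
            boolean text = d.terms.stream().anyMatch(t -> t.startsWith(prefix));
            boolean time = d.fromYear <= toYear && d.toYear >= fromYear;
            boolean area = d.minX <= maxX && d.maxX >= minX
                    && d.minY <= maxY && d.maxY >= minY;
            if (text && time && area) {
                ids.add(d.id);
            }
        }
        return ids;
    }
}
```

In an actual Lucene index the years and box coordinates would become separate fields matched with range-style clauses; whether that performs well enough for spatial conditions is the practical question Paul is asking.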

-----Original Message-----
From: Vladimir Olenin [mailto:VOlenin@cihi.ca] 
Sent: Thursday, October 05, 2006 8:41 AM
To: java-user@lucene.apache.org
Subject: RE: Advantage of putting lucene index in RDBMS

As one of the people who asked about placing indexes into an RDBMS, I was
primarily interested in just storing the index in the RDBMS (basically, storing
the structures described on this page
http://lucene.apache.org/java/docs/fileformats.html in the relational DB).
The main reason is NOT to be able to perform some magic with joining Lucene
and 'pure DB query' results (which actually IS useful in some
circumstances, but I don't really see a problem doing it in Java after
querying the DB and Lucene), but rather to avoid the cost of reindexing and
the associated problems in complex enterprise environments.

There is a wide range of applications like YellowPages, e-shops, etc., which
already store the indexed data from which their sites are built. Each DB update
can be a very complex process (pulling data from different sources, building
views, restructuring the DB for faster read access, etc.).
If all that is required is to provide 'smarter' and more efficient search
within an already existing DB, it only makes sense to try to 'plug' the Lucene
engine into _already available_ structures.

Yet another advantage of storing the index in the DB is its 'manageability'
and 'debuggability' (nice word!). Though there is Luke, etc., administrators
in big companies do not want to learn many new tools, and having something
already familiar to deal with can sometimes be a good argument in favor of
product adoption. (BTW, Compass, as Aleksei mentioned, could be the answer to
this prayer. I meant to check it out a long time ago, but haven't gotten
around to it yet. Also, the project seems to be half-dead. I wonder if that's true...)

Again, I'm not really sure whether that's feasible, because Lucene does store
quite a bit of other info together with the index (and inverted index), but
sometimes that's an option.

Vlad

PS: I'm not really sure what Aleksei had in mind. From my point of view, the only
way to 'perform both DB and Lucene query' on the DB side is to either
'reimplement' the Lucene engine in the DB (e.g., rewrite it in PL/SQL,
etc.) OR perform Java calls from the DB (e.g., through Java Stored Procedures in
the case of Oracle).

-----Original Message-----
From: Paul Snyder [mailto:psnyder@postbulletin.com]
Sent: Thursday, October 05, 2006 9:19 AM
To: java-user@lucene.apache.org
Subject: RE: Advantage of putting lucene index in RDBMS

Aleksei, can you point me to a document detailing this procedure with
examples?  If not, would you consider creating one?  I am particularly
interested in what prerequisite steps are needed to perform a Lucene query
within SQL (if I understand correctly what you are doing).

-----Original Message-----
From: Aleksei Valikov [mailto:valikov@gmx.net]
Sent: Thursday, October 05, 2006 3:39 AM
To: java-user@lucene.apache.org
Subject: Re: Advantage of putting lucene index in RDBMS

Hi.

> I have been reading the lists for a couple of weeks now, and I noticed 
> people asking about placing their indexes into an RDBMS. What is the 
> advantage of that?
> 
> So far Lucene has been able to solve all my problems, but I am curious how 
> else people are using it (especially with an RDBMS).

Having an index in the DB makes it possible to join full-text queries
against this index with other structured queries against other tables.

Here's a practical example. We have a data management system for managing
geographic metadata. Documents that we manage (geometadata) have spatial
extents (bounding boxes), temporal extents (time periods) and a lot of
textual information.

Currently, complex queries like "contains 'water*', from 1998 till 2001, in
area (5, 45, 15, 55)" can't be processed by the relational DB alone or by
the Lucene index alone. We have to do a relatively expensive union/join
outside both Lucene and the RDB. Having the Lucene index within the DB, with
Lucene queries (re)formulatable in SQL, would allow us to perform the join
inside the DB, which is much more performant.

Bye.
/lexi

RE: Advantage of putting lucene index in RDBMS

Posted by Vladimir Olenin <VO...@cihi.ca>.

As one of the people who asked about placing indexes into an RDBMS, I was
primarily interested in just storing the index in the RDBMS (basically,
storing the structures described on this page
http://lucene.apache.org/java/docs/fileformats.html in the relational
DB). The main reason is NOT to be able to perform some magic with
joining Lucene and 'pure DB query' results (which actually IS useful
in some circumstances, but I don't really see a problem doing it in
Java after querying the DB and Lucene), but rather to avoid the cost of
reindexing and the associated problems in complex enterprise environments.
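Storing the index files themselves in the database could look roughly like the following, assuming one row per index file in a hypothetical `lucene_files(name, data)` table. The sketch keeps the "table" in a `TreeMap` so it runs without a database, and exposes the kinds of operations an index needs from its storage (list files, file length, random-access reads, whole-file writes, deletes). `TableBackedFiles` and its methods are illustrative and are not Lucene's `Directory` API; the SQL in the comments is only indicative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy "index files stored as rows" layer. The TreeMap stands in for a
// hypothetical table lucene_files(name VARCHAR PRIMARY KEY, data BLOB).
final class TableBackedFiles {
    private final Map<String, byte[]> rows = new TreeMap<>();

    // ~ SELECT name FROM lucene_files ORDER BY name
    List<String> list() {
        return new ArrayList<>(rows.keySet());
    }

    // ~ INSERT or UPDATE the row for `name` with the given bytes
    void write(String name, byte[] data) {
        rows.put(name, data.clone());
    }

    // ~ SELECT LENGTH(data) FROM lucene_files WHERE name = ?
    long length(String name) {
        return file(name).length;
    }

    // Random-access read of len bytes starting at offset.
    byte[] read(String name, int offset, int len) {
        byte[] data = file(name);
        if (offset < 0 || len < 0 || offset + len > data.length) {
            throw new IndexOutOfBoundsException(name + ": " + offset + "+" + len);
        }
        return Arrays.copyOfRange(data, offset, offset + len);
    }

    // ~ DELETE FROM lucene_files WHERE name = ?
    void delete(String name) {
        rows.remove(name);
    }

    private byte[] file(String name) {
        byte[] data = rows.get(name);
        if (data == null) {
            throw new IllegalArgumentException("no such file: " + name);
        }
        return data;
    }
}
```

Because Lucene writes a segment's files once and afterwards only reads them, whole-file writes are a reasonable fit here; in a real database, the many small random reads would likely be the main cost.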

There is a wide range of applications like YellowPages, e-shops, etc.,
which already store the indexed data from which their sites are built. Each DB
update can be a very complex process (pulling data from different
sources, building views, restructuring the DB for faster read access, etc.).
If all that is required is to provide 'smarter' and more efficient search
within an already existing DB, it only makes sense to try to 'plug' the Lucene
engine into _already available_ structures.

Yet another advantage of storing the index in the DB is its 'manageability'
and 'debuggability' (nice word!). Though there is Luke, etc.,
administrators in big companies do not want to learn many new tools, and
having something already familiar to deal with can sometimes be a good
argument in favor of product adoption. (BTW, Compass, as Aleksei
mentioned, could be the answer to this prayer. I meant to check it out a
long time ago, but haven't gotten around to it yet. Also, the project
seems to be half-dead. I wonder if that's true...)

Again, I'm not really sure whether that's feasible, because Lucene does
store quite a bit of other info together with the index (and inverted
index), but sometimes that's an option.

Vlad

PS: I'm not really sure what Aleksei had in mind. From my point of view, the
only way to 'perform both DB and Lucene query' on the DB side is to
either 'reimplement' the Lucene engine in the DB (e.g., rewrite it in PL/SQL,
etc.) OR perform Java calls from the DB (e.g., through Java Stored Procedures
in the case of Oracle).

-----Original Message-----
From: Paul Snyder [mailto:psnyder@postbulletin.com] 
Sent: Thursday, October 05, 2006 9:19 AM
To: java-user@lucene.apache.org
Subject: RE: Advantage of putting lucene index in RDBMS

Aleksei, can you point me to a document detailing this procedure with
examples?  If not, would you consider creating one?  I am particularly
interested in what prerequisite steps are needed to perform a Lucene
query within SQL (if I understand correctly what you are doing).

-----Original Message-----
From: Aleksei Valikov [mailto:valikov@gmx.net]
Sent: Thursday, October 05, 2006 3:39 AM
To: java-user@lucene.apache.org
Subject: Re: Advantage of putting lucene index in RDBMS

Hi.

> I have been reading the lists for a couple of weeks now, and I noticed 
> people asking about placing their indexes into an RDBMS. What is the 
> advantage of that?
> 
> So far Lucene has been able to solve all my problems, but I am curious how 
> else people are using it (especially with an RDBMS).

Having an index in the DB makes it possible to join full-text queries
against this index with other structured queries against other
tables.

Here's a practical example. We have a data management system for
managing geographic metadata. Documents that we manage (geometadata)
have spatial extents (bounding boxes), temporal extents (time periods)
and a lot of textual information.

Currently, complex queries like "contains 'water*', from 1998 till 2001,
in area (5, 45, 15, 55)" can't be processed by the relational DB alone or
by the Lucene index alone. We have to do a relatively expensive union/join
outside both Lucene and the RDB. Having the Lucene index within the DB, with
Lucene queries (re)formulatable in SQL, would allow us to perform the join
inside the DB, which is much more performant.

Bye.
/lexi


RE: Advantage of putting lucene index in RDBMS

Posted by Paul Snyder <ps...@postbulletin.com>.

Aleksei, can you point me to a document detailing this procedure with
examples?  If not, would you consider creating one?  I am particularly
interested in what prerequisite steps are needed to perform a Lucene query
within SQL (if I understand correctly what you are doing).

-----Original Message-----
From: Aleksei Valikov [mailto:valikov@gmx.net] 
Sent: Thursday, October 05, 2006 3:39 AM
To: java-user@lucene.apache.org
Subject: Re: Advantage of putting lucene index in RDBMS

Hi.

> I have been reading the lists for a couple of weeks now, and I noticed 
> people asking about placing their indexes into an RDBMS. What is the 
> advantage of that?
> 
> So far Lucene has been able to solve all my problems, but I am curious how 
> else people are using it (especially with an RDBMS).

Having an index in the DB makes it possible to join full-text queries
against this index with other structured queries against other tables.

Here's a practical example. We have a data management system for managing
geographic metadata. Documents that we manage (geometadata) have spatial
extents (bounding boxes), temporal extents (time periods) and a lot of
textual information.

Currently, complex queries like "contains 'water*', from 1998 till 2001, in
area (5, 45, 15, 55)" can't be processed by the relational DB alone or by
the Lucene index alone. We have to do a relatively expensive union/join
outside both Lucene and the RDB. Having the Lucene index within the DB, with
Lucene queries (re)formulatable in SQL, would allow us to perform the join
inside the DB, which is much more performant.

Bye.
/lexi

Re: Advantage of putting lucene index in RDBMS

Posted by Aleksei Valikov <va...@gmx.net>.

Hi.

> I have been reading the lists for a couple of weeks now, and I noticed people
> asking about placing their indexes into an RDBMS. What is the advantage of
> that?
> 
> So far Lucene has been able to solve all my problems, but I am curious how else
> people are using it (especially with an RDBMS).

Having an index in the DB makes it possible to join full-text queries against 
this index with other structured queries against other tables.

Here's a practical example. We have a data management system for managing 
geographic metadata. Documents that we manage (geometadata) have spatial extents 
(bounding boxes), temporal extents (time periods) and a lot of textual information.

Currently, complex queries like "contains 'water*', from 1998 till 2001, in area 
(5, 45, 15, 55)" can't be processed by the relational DB alone or by the Lucene 
index alone. We have to do a relatively expensive union/join outside both Lucene 
and the RDB. Having the Lucene index within the DB, with Lucene queries 
(re)formulatable in SQL, would allow us to perform the join inside the DB, which 
is much more performant.
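The "union/join outside the Lucene and the RDB" can be as simple as intersecting two ascending ID lists, one from the full-text hits for 'water*' and one from an SQL query for the time and area conditions. This is a minimal sketch with a made-up `SortedIdJoin` class and plain arrays standing in for both result sets. The merge itself is linear in the two list sizes, but both complete lists must first be fetched into the application, which is presumably a large part of the expense described above.

```java
import java.util.Arrays;

final class SortedIdJoin {
    /** Intersects two ascending, duplicate-free ID arrays with a linear merge. */
    static long[] intersect(long[] a, long[] b) {
        long[] out = new long[Math.min(a.length, b.length)];
        int i = 0;
        int j = 0;
        int n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;            // a is behind: advance it
            } else if (a[i] > b[j]) {
                j++;            // b is behind: advance it
            } else {
                out[n++] = a[i]; // present on both sides
                i++;
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }
}
```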

Bye.
/lexi

