Posted to java-user@lucene.apache.org by Marcelo Ochoa <ma...@gmail.com> on 2006/11/22 14:09:48 UTC

Oracle and Lucene Integration

Hi all:
  I have read many threads on this list about integrating the Lucene
indexing framework with Oracle, for example:
http://www.gossamer-threads.com/lists/lucene/java-user/41104?search_string=oracle%20jvm%20BLOB;#41104
  That pushed me to work on running Lucene inside the Oracle JVM (a Java
virtual machine running inside the Oracle database).
  The reasons for doing this are:
   - Storing the inverted index on a traditional file system is not a
good option for some users.
   - Storing the inverted index in a BLOB while running Lucene outside
the Oracle database performs badly because of the many network round
trips and the data marshalling involved.
   - Indexing relational data, such as tables with VARCHAR2, CLOB or
XMLType columns, with Lucene running outside the database has the same
problem as the previous point.
   - The JVM included in the Oracle database can scale up to 10,000+
concurrent threads without memory leaks or deadlocks, and all the
operations on tables happen in the same memory space!!
   With these points in mind, I loaded the complete Lucene framework
into the Oracle JVM and ran the complete JUnit test suite
successfully, except for a few tests such as the RMI test, which
requires special grants to open ports inside the database.
   The Lucene test cases run faster inside the Oracle database (11g)
than on the Sun JDK 1.5, because the classes are automatically JITed
after some executions.
   I have implemented an OJVMDirectory Lucene store which replaces the
file system storage with BLOB-based storage. Compared with a
RAMDirectory it is a bit slower, but we get all the benefits of BLOB
storage (backup, concurrency control, and so on).
  The OJVMDirectory is derived from the source at
http://issues.apache.org/jira/browse/LUCENE-150 (DBDirectory), with
some changes to make it run faster inside the Oracle JVM.
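
  As a rough illustration of the idea of keeping index files as BLOBs
keyed by file name, here is a minimal sketch in plain Java. It is not
the OJVMDirectory code and it does not use the Lucene Directory API or
JDBC: the class ToyBlobStore and all of its methods are hypothetical,
and an in-memory map stands in for a table of (file name, BLOB) rows.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a table with one row per index file.
class ToyBlobStore {
    // File name -> BLOB content, kept sorted by name.
    private final Map<String, byte[]> blobs = new TreeMap<>();

    // Creates or overwrites a file; the content is copied defensively.
    void writeFile(String name, byte[] content) {
        blobs.put(name, content.clone());
    }

    // Positioned read: returns `length` bytes starting at `offset`.
    byte[] read(String name, int offset, int length) {
        byte[] data = require(name);
        if (offset < 0 || length < 0 || length > data.length - offset) {
            throw new IndexOutOfBoundsException("bad range for " + name);
        }
        return Arrays.copyOfRange(data, offset, offset + length);
    }

    long fileLength(String name) {
        return require(name).length;
    }

    // Returns true if the file existed and was removed.
    boolean deleteFile(String name) {
        return blobs.remove(name) != null;
    }

    // Lists file names in sorted order.
    String[] list() {
        return blobs.keySet().toArray(new String[0]);
    }

    private byte[] require(String name) {
        byte[] data = blobs.get(name);
        if (data == null) {
            throw new IllegalArgumentException("no such file: " + name);
        }
        return data;
    }
}
```

  A store of this shape only needs create, read, length, delete and
list operations, which is why a table of BLOB rows can take the place
of a directory on disk.
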
  At the moment I am working on full integration with the SQL engine
using the Data Cartridge API, which means using Lucene as a new
Oracle domain index.
  With this extension we can create a Lucene inverted index on a table using:

create index it1 on t1(f2) indextype is LuceneIndex parameters('test');

  assuming that the table t1 has a column f2 of type VARCHAR2, CLOB or
XMLType. After this, the Lucene inverted index can be queried using a
new Oracle operator:

select * from t1 where contains(f2, 'Marcelo') = 1;

  The important point here is that this query is integrated with the
execution plan of the Oracle database. In this simple example the
Oracle optimizer sees that the column "f2" is indexed with the Lucene
domain index; then, through the Data Cartridge API, Java code running
inside the Oracle JVM is executed to open the search, fetch all the
ROWIDs that match "Marcelo", and get the rows using those pointers.
Here is the plan output:

SELECT STATEMENT                         ALL_ROWS   3   1   115
  TABLE ACCESS(BY INDEX ROWID) LUCENE.T1            3   1   115
    DOMAIN INDEX LUCENE.IT1
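
  The two steps in this plan (an index scan that yields ROWIDs, then a
table access by ROWID) can be sketched with a toy in-memory model. This
is only an illustration under stated assumptions: ToyDomainIndex, its
maps and its whitespace tokenizer are hypothetical, and nothing here
uses the real Lucene or Data Cartridge APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Toy model of a domain index lookup; all names are hypothetical.
class ToyDomainIndex {
    // "Table": ROWID -> column value (stands in for t1.f2).
    private final Map<String, String> table = new LinkedHashMap<>();
    // Inverted index: lower-cased term -> sorted ROWIDs containing it.
    private final Map<String, TreeSet<String>> postings = new HashMap<>();

    // Stores the row and indexes each whitespace-separated term.
    void insert(String rowid, String text) {
        table.put(rowid, text);
        for (String term : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(rowid);
            }
        }
    }

    // Step 1 (DOMAIN INDEX): the index scan returns ROWIDs only.
    List<String> matchingRowids(String term) {
        TreeSet<String> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? new ArrayList<>() : new ArrayList<>(hits);
    }

    // Step 2 (TABLE ACCESS BY INDEX ROWID): fetch rows via the ROWIDs.
    List<String> contains(String term) {
        List<String> rows = new ArrayList<>();
        for (String rowid : matchingRowids(term)) {
            rows.add(table.get(rowid));
        }
        return rows;
    }
}
```

  The point of the split is that the index never returns row data, only
pointers, so the database engine stays in charge of fetching the rows.
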

  Another benefit of using the Data Cartridge API is that when rows
in table T1 are inserted, updated or deleted, a corresponding Java
method is called to update the Lucene index automatically.
  Well, maybe this email is too long already. If anybody is interested
in this implementation, I can put it on a public web site.
   Best regards, Marcelo.

PS: For Oracle users the big question is: why use Lucene instead of
Oracle Text, which is implemented in C?
I think the answer is quite simple: Lucene is open source, and
anybody can extend it and add the functionality they need :)
-- 
Marcelo F. Ochoa
http://marcelo.ochoa.googlepages.com/home
______________
Do you Know DBPrism? Look @ DB Prism's Web Site
http://www.dbprism.com.ar/index.html
More info?
Chapter 17 of the book "Programming the Oracle Database using Java &
Web Services"
http://www.amazon.com/gp/product/1555583296/
Chapter 21 of the book "Professional XML Databases" - Wrox Press
http://www.amazon.com/gp/product/1861003587/
Chapter 8 of the book "Oracle & Open Source" - O'Reilly
http://www.oreilly.com/catalog/oracleopen/

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Oracle and Lucene Integration

Posted by Marcelo Ochoa <ma...@gmail.com>.
Hi Doug:
  I'll create a Jira account today and extract my code from the
lucene-2.0.0 tree.
  Then I'll upload it to Jira to get more feedback.
  Maybe the first optimization on the TODO list will be to replace the
insert/delete/update Data Cartridge entry points with new code that
puts each modification in a queue, to be processed regularly by a new
Sync method called from Oracle's DBMS_SCHEDULER or from the user's
application code.
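
  A minimal sketch of that queue-then-sync idea, in plain Java. The
class QueuedIndexUpdater and its methods are hypothetical names, a map
stands in for the Lucene index, and the real entry points and the
scheduler call are not shown.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch: DML entry points only enqueue, sync() applies.
class QueuedIndexUpdater {
    enum Op { INSERT, UPDATE, DELETE }

    // One queued modification: operation, ROWID and (for DELETE, unused) text.
    static final class Change {
        final Op op;
        final String rowid;
        final String text;

        Change(Op op, String rowid, String text) {
            this.op = op;
            this.rowid = rowid;
            this.text = text;
        }
    }

    private final Queue<Change> pending = new ArrayDeque<>();
    // Stand-in for the real index: ROWID -> indexed text.
    private final Map<String, String> indexed = new HashMap<>();

    // Cheap call made on every insert/update/delete: just record it.
    synchronized void enqueue(Op op, String rowid, String text) {
        pending.add(new Change(op, rowid, text));
    }

    // Applies all queued changes in arrival order; returns how many ran.
    synchronized int sync() {
        int applied = 0;
        Change c;
        while ((c = pending.poll()) != null) {
            if (c.op == Op.DELETE) {
                indexed.remove(c.rowid);
            } else {
                indexed.put(c.rowid, c.text);
            }
            applied++;
        }
        return applied;
    }

    synchronized int pendingCount() {
        return pending.size();
    }

    synchronized String indexedText(String rowid) {
        return indexed.get(rowid);
    }
}
```

  The trade-off is that the index lags behind the table until the next
sync() call, in exchange for cheaper DML and batched index writes.
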
  Best regards, Marcelo.

On 11/22/06, Doug Cutting <cu...@apache.org> wrote:
> Marcelo Ochoa wrote:
> >  Then I'll move the code outside the lucene-2.0 code tree to be
> > packaged as a subdirectory of the contrib area, for example.
> >  Another alternative is to make a small zip file and send it to the
> > list as an attachment, as a preliminary (alpha-alpha) version ;)
>
> This sounds like a great potential addition to Lucene!
>
> I encourage you to create an issue in Jira and attach your code there
> now.  (Anyone can create themselves a Jira account.)
>
> There's no need to wait until it is cleaned up: attach an initial
> version to an issue and describe what you intend to do next, then attach
> improved versions later.  That way other folks can provide input, and,
> heaven forbid, should you for some reason lose interest before it is
> polished, someone else can complete the work.
>
> Thanks!
>
> Doug

Re: Oracle and Lucene Integration

Posted by Doug Cutting <cu...@apache.org>.
Marcelo Ochoa wrote:
>  Then I'll move the code outside the lucene-2.0 code tree to be
> packaged as a subdirectory of the contrib area, for example.
>  Another alternative is to make a small zip file and send it to the
> list as an attachment, as a preliminary (alpha-alpha) version ;)

This sounds like a great potential addition to Lucene!

I encourage you to create an issue in Jira and attach your code there 
now.  (Anyone can create themselves a Jira account.)

There's no need to wait until it is cleaned up: attach an initial 
version to an issue and describe what you intend to do next, then attach 
improved versions later.  That way other folks can provide input, and, 
heaven forbid, should you for some reason lose interest before it is 
polished, someone else can complete the work.

Thanks!

Doug



Re: Oracle and Lucene Integration

Posted by Marcelo Ochoa <ma...@gmail.com>.
Hi Vladimir:
  Well, I am finishing the implementation of the ancillary operator
score() and making the contains function usable outside the SQL where
clause, for example:
  select score(1),colx,coly from t1 where contains(f2,'test',1)=1
  select contains(f2,'test') from t1
  Then I'll move the code outside the lucene-2.0 code tree to be
packaged as a subdirectory of the contrib area, for example.
  Another alternative is to make a small zip file and send it to the
list as an attachment, as a preliminary (alpha-alpha) version ;)
  Best regards, Marcelo.
On 11/22/06, Vladimir Olenin <VO...@cihi.ca> wrote:
>
> Hi, Marcelo,
>
> Yes, putting it in the public space would be great. I personally would
> be very interested to have a look. Can it be posted on the 'lucene'
> website?
>
> Vlad


RE: Oracle and Lucene Integration

Posted by Vladimir Olenin <VO...@cihi.ca>.
 
Hi, Marcelo,

Yes, putting it in the public space would be great. I personally would
be very interested to have a look. Can it be posted on the 'lucene'
website?

Vlad
