Posted to dev@lucene.apache.org by "Marcelo F. Ochoa (JIRA)" <ji...@apache.org> on 2006/11/23 00:45:02 UTC

[jira] Created: (LUCENE-724) Oracle JVM implementation for Lucene DataStore also a preliminary implementation for an Oracle Domain index using Lucene

Oracle JVM implementation for Lucene DataStore also a preliminary implementation for an Oracle Domain index using Lucene
------------------------------------------------------------------------------------------------------------------------

                 Key: LUCENE-724
                 URL: http://issues.apache.org/jira/browse/LUCENE-724
             Project: Lucene - Java
          Issue Type: New Feature
          Components: Store
    Affects Versions: 2.0.0
         Environment: Oracle 10g R2 with the latest patchset. A txt file in the lib directory lists the libraries required to compile this extension, which I can't redistribute for legal reasons. All of these libraries are included in the Oracle home directory.
            Reporter: Marcelo F. Ochoa
            Priority: Minor


Here is a preliminary implementation of an Oracle JVM Directory data store that replaces the file system with BLOB storage.
The reasons for doing this are:
  - Using a traditional file system to store the inverted index is not a good option for some users.
  - Storing the inverted index in a BLOB while running Lucene outside the Oracle database performs badly, because of the many network round trips and the data marshalling involved.
  - Indexing relational data, such as tables with VARCHAR2, CLOB or XMLType columns, with Lucene running outside the database has the same problem.
  - The JVM embedded in the Oracle database can scale to 10,000+ concurrent threads without memory leaks or deadlocks, and all operations on tables run in the same memory space.
  With these points in mind, I uploaded the complete Lucene framework into the Oracle JVM and ran the complete JUnit test suite successfully, except for some tests, such as the RMI test, which require special grants to open ports inside the database.
  Lucene's test cases run faster inside the Oracle database (11g) than on the Sun JDK 1.5, because the classes are automatically JIT-compiled after a few executions.
  I have implemented an OJVMDirectory Lucene store that replaces file system storage with BLOB-based storage. Compared with RAMDirectory it is a bit slower, but we get all the benefits of BLOB storage (backup, concurrency control, and so on).
 The OJVMDirectory is derived from the DBDirectory source at
http://issues.apache.org/jira/browse/LUCENE-150, with some changes to make it run faster inside the Oracle JVM.
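
As a rough illustration of the storage model described above, the following hypothetical Java sketch treats each index file as one named byte array, which is what a BLOB row would hold. A HashMap stands in for the BLOB table, and the names (BlobFileStore, createOutput, readBytes, and so on) are assumptions for illustration only, not the OJVMDirectory or Lucene Directory API. Writes are buffered and become visible only when the output is closed; reads are random access, as an index reader needs.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch: index "files" kept as named BLOB-like byte arrays.
 * The map stands in for a table such as (name VARCHAR2 PRIMARY KEY, data BLOB).
 */
class BlobFileStore {
    private final Map<String, byte[]> blobs = new HashMap<>();

    /** Buffers writes and stores the whole file as one BLOB when closed. */
    final class BlobOutput implements AutoCloseable {
        private final String name;
        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

        private BlobOutput(String name) { this.name = name; }

        void writeBytes(byte[] b) { buffer.write(b, 0, b.length); }

        @Override
        public void close() { blobs.put(name, buffer.toByteArray()); }
    }

    BlobOutput createOutput(String name) { return new BlobOutput(name); }

    boolean fileExists(String name) { return blobs.containsKey(name); }

    long fileLength(String name) { return blob(name).length; }

    /** Random-access read of len bytes starting at offset (a seek followed by a read). */
    byte[] readBytes(String name, int offset, int len) {
        byte[] data = blob(name);
        if (offset < 0 || len < 0 || offset > data.length - len) {
            throw new IndexOutOfBoundsException("read past end of " + name);
        }
        return Arrays.copyOfRange(data, offset, offset + len);
    }

    void deleteFile(String name) { blobs.remove(name); }

    /** File names in sorted order, like listing a directory. */
    Set<String> list() { return new TreeSet<>(blobs.keySet()); }

    private byte[] blob(String name) {
        byte[] data = blobs.get(name);
        if (data == null) throw new IllegalArgumentException("no such file: " + name);
        return data;
    }
}
```

Buffering the whole file until close keeps the sketch simple; a real BLOB-backed store could instead write through to the BLOB incrementally.
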
 I am currently working on full integration with the SQL engine using the Data Cartridge API, which means using Lucene as a new Oracle Domain Index.
 With this extension we can create a Lucene inverted index on a table column using:

create index it1 on t1(f2) indextype is LuceneIndex parameters('test');

 Assuming that table t1 has a column f2 of type VARCHAR2, CLOB or XMLType, the Lucene inverted index can then be queried using a new Oracle operator:

select * from t1 where contains(f2, 'Marcelo') = 1;

 The important point here is that this query is integrated with the Oracle execution plan. In this simple example, the Oracle optimizer sees that column "f2" is indexed with the Lucene domain index; then, through the Data Cartridge API, Java code running inside the Oracle JVM opens the search, fetches all the ROWIDs that match "Marcelo", and retrieves the rows by ROWID.
Here is the output:

SELECT STATEMENT                                      ALL_ROWS      3       1       115
       TABLE ACCESS(BY INDEX ROWID) LUCENE.T1          3       1       115
            DOMAIN INDEX LUCENE.IT1
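
The query flow described above (open the search, fetch the matching ROWIDs, then fetch the rows) follows the start/fetch/close pattern of the Data Cartridge index-scan routines ODCIIndexStart, ODCIIndexFetch and ODCIIndexClose. The following is a plain-Java sketch of that pattern only, under stated assumptions: a TreeMap of terms to ROWIDs stands in for the Lucene index, a lowercase split on non-word characters stands in for a Lucene Analyzer, and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical sketch of a domain-index scan: start a search, fetch matching
 * ROWIDs in batches, then close. Plain collections stand in for Lucene and ODCI types.
 */
class DomainIndexScan {
    // term -> ROWIDs of rows whose indexed column contains the term
    private final Map<String, TreeSet<String>> postings = new TreeMap<>();

    void index(String rowid, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(rowid);
            }
        }
    }

    /** State kept between fetch calls, like the scan context a start routine returns. */
    static final class ScanContext {
        private final Iterator<String> hits;
        private boolean closed;
        private ScanContext(Iterator<String> hits) { this.hits = hits; }
    }

    ScanContext start(String term) {
        TreeSet<String> rowids = postings.get(term.toLowerCase(Locale.ROOT));
        Iterator<String> it = rowids == null
                ? Collections.<String>emptyIterator()
                : new ArrayList<>(rowids).iterator(); // snapshot of the matches
        return new ScanContext(it);
    }

    /** Returns up to nrows ROWIDs; an empty list means the scan is exhausted. */
    List<String> fetch(ScanContext ctx, int nrows) {
        if (ctx.closed) throw new IllegalStateException("scan already closed");
        List<String> batch = new ArrayList<>();
        while (batch.size() < nrows && ctx.hits.hasNext()) {
            batch.add(ctx.hits.next());
        }
        return batch;
    }

    void close(ScanContext ctx) { ctx.closed = true; }
}
```

Returning ROWIDs in batches lets the caller (here, the SQL engine) pull rows by ROWID as it needs them, which matches the "TABLE ACCESS (BY INDEX ROWID)" step over the "DOMAIN INDEX" step in the plan.
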

 Another benefit of using the Data Cartridge API is that when rows in table T1 are inserted, updated or deleted, a corresponding Java method is called to update the Lucene index automatically.
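
A minimal sketch of such DML hooks, assuming each callback receives the affected row's ROWID. The Data Cartridge API defines ODCIIndexInsert, ODCIIndexUpdate and ODCIIndexDelete for this purpose; the Java names below (RowidSyncedIndex, onInsert, onUpdate, onDelete) are hypothetical, and plain collections stand in for the Lucene index. A second map from ROWID to terms lets a delete remove exactly the postings that row contributed, and an update is treated as a delete followed by an insert.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch: keeping an inverted index in step with table DML.
 * Each callback is keyed by ROWID, the way a domain index learns about row changes.
 */
class RowidSyncedIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();   // term -> ROWIDs
    private final Map<String, Set<String>> termsByRow = new HashMap<>(); // ROWID -> terms

    private static Set<String> tokenize(String text) {
        Set<String> terms = new HashSet<>();
        if (text == null) return terms; // a NULL column value indexes nothing
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) terms.add(token);
        }
        return terms;
    }

    void onInsert(String rowid, String newValue) {
        Set<String> terms = tokenize(newValue);
        termsByRow.put(rowid, terms);
        for (String term : terms) {
            postings.computeIfAbsent(term, t -> new HashSet<>()).add(rowid);
        }
    }

    void onDelete(String rowid) {
        Set<String> terms = termsByRow.remove(rowid);
        if (terms == null) return; // row was never indexed
        for (String term : terms) {
            Set<String> rows = postings.get(term);
            rows.remove(rowid);
            if (rows.isEmpty()) postings.remove(term);
        }
    }

    /** An update is handled as delete-then-insert of the same ROWID. */
    void onUpdate(String rowid, String newValue) {
        onDelete(rowid);
        onInsert(rowid, newValue);
    }

    Set<String> search(String term) {
        Set<String> rows = postings.get(term.toLowerCase(Locale.ROOT));
        return rows == null ? Collections.<String>emptySet() : new TreeSet<String>(rows);
    }
}
```
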
  A simple HTML file explains some of the code.
  The install.sql script is not fully tested and must be run on the Oracle database server itself, not remotely.
  Best regards, Marcelo.

- For Oracle users, the big question is: why use Lucene instead of Oracle Text, which is implemented in C?
  I think the answer is simple: Lucene is open source, and anybody can extend it and add the functionality they need.
- For Lucene users who want to use Lucene as an enterprise search engine, the Oracle JVM provides a highly scalable container that can scale to 10,000+ concurrent sessions, with the ability to query tables in the same memory space.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: [jira] Commented: (LUCENE-724) Oracle JVM implementation for Lucene DataStore also a preliminary implementation for an Oracle Domain index using Lucene

Posted by "J. Delgado" <jd...@lendingclub.com>.
Michael, are you still working on this replacement of the BLOB I/O?

I'm looking into parameterizing the choice between lazy syncs of DML
operations (via calls to LuceneDomainIndex.sync, potentially queued
using dbms_aq), which are convenient for bulk inserts, and real-time
syncs for non-bulk operations that need transactional data retrieval.
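
The trade-off described above can be sketched as a mode switch: in real-time mode every DML change is applied to the index as soon as it is submitted; in deferred mode changes are queued and applied in one batch by sync(). This is a hypothetical plain-Java sketch, not the LuceneDomainIndex.sync implementation: an ArrayDeque stands in for a persistent queue (such as one managed with dbms_aq), and a map from ROWID to text stands in for the Lucene index.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of deferred ("lazy") vs. real-time index maintenance.
 * The deque stands in for a persistent queue; the map stands in for the index.
 */
class DeferredSyncIndex {
    enum Mode { REALTIME, DEFERRED }

    private static final class PendingOp {
        final String rowid;
        final String text; // null means "remove this row from the index"
        PendingOp(String rowid, String text) { this.rowid = rowid; this.text = text; }
    }

    private final Mode mode;
    private final Deque<PendingOp> queue = new ArrayDeque<>();
    private final Map<String, String> indexed = new HashMap<>();

    DeferredSyncIndex(Mode mode) { this.mode = mode; }

    void upsert(String rowid, String text) { submit(new PendingOp(rowid, text)); }

    void delete(String rowid) { submit(new PendingOp(rowid, null)); }

    private void submit(PendingOp op) {
        if (mode == Mode.REALTIME) {
            apply(op);          // index is current as soon as the change is submitted
        } else {
            queue.addLast(op);  // cheap per-row cost; applied later in one batch
        }
    }

    /** Drains the queue in FIFO order so later changes win; returns ops applied. */
    int sync() {
        int applied = 0;
        while (!queue.isEmpty()) {
            apply(queue.pollFirst());
            applied++;
        }
        return applied;
    }

    private void apply(PendingOp op) {
        if (op.text == null) indexed.remove(op.rowid);
        else indexed.put(op.rowid, op.text);
    }

    int pending() { return queue.size(); }

    boolean isIndexed(String rowid) { return indexed.containsKey(rowid); }
}
```

In deferred mode, searches do not see queued changes until sync() runs; that lag is the price paid for cheaper bulk loads.
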

-- Joaquin

2007/7/12, Michael Goddard (JIRA) <ji...@apache.org>:
>
>     [ https://issues.apache.org/jira/browse/LUCENE-724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512169 ]
>
> Michael Goddard commented on LUCENE-724:
> ----------------------------------------
>
> Marcelo,
>
> Are you still working on this?  I have been experimenting with it recently -- thank you for creating it.  Do you think that the I/O might be faster if the Vector was replaced with BLOB I/O via InputStream, OutputStream directly?  That is what I am working with right now, and I did observe my indexing time for a sample data set go from 22 seconds to 13 seconds.  I do currently have the problem that the resulting index is not behaving correctly and am working on that.
>
>
> > Oracle JVM implementation for Lucene DataStore also a preliminary implementation for an Oracle Domain index using Lucene
> > ------------------------------------------------------------------------------------------------------------------------
> >
> >                 Key: LUCENE-724
> >                 URL: https://issues.apache.org/jira/browse/LUCENE-724
> >             Project: Lucene - Java
> >          Issue Type: New Feature
> >          Components: Store
> >    Affects Versions: 2.0.0
> >         Environment: Oracle 10g R2 with the latest patchset. A txt file in the lib directory lists the libraries required to compile this extension, which I can't redistribute for legal reasons. All of these libraries are included in the Oracle home directory.
> >            Reporter: Marcelo F. Ochoa
> >            Priority: Minor
> >         Attachments: ojvm-01-09-07.tar.gz, ojvm-11-28-06.tar.gz, ojvm-12-20-06.tar.gz, ojvm.tar.gz
>
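
Michael's suggestion above can be illustrated with a hypothetical sketch of two copy strategies, assuming the Vector in OJVMDirectory acts as an in-memory list of byte chunks (the thread does not say exactly how it is used). In a real implementation the streams would come from java.sql.Blob#getBinaryStream() and Blob#setBinaryStream(long); here any InputStream/OutputStream pair works, so the sketch runs without a database. It shows the difference in how bytes are held in memory, not a measured speedup.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Vector;

/** Hypothetical sketch contrasting chunk accumulation with direct streaming. */
final class BlobStreaming {
    private BlobStreaming() {}

    /** Collects every chunk in a Vector first, then writes them all out. */
    static long copyViaVector(InputStream in, OutputStream out, int chunkSize) throws IOException {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        Vector<byte[]> chunks = new Vector<>();
        byte[] buf = new byte[chunkSize];
        int n;
        while ((n = in.read(buf)) != -1) {
            byte[] chunk = new byte[n];
            System.arraycopy(buf, 0, chunk, 0, n);
            chunks.add(chunk); // the whole file is held in memory before writing
        }
        long total = 0;
        for (byte[] chunk : chunks) {
            out.write(chunk);
            total += chunk.length;
        }
        return total;
    }

    /** Streams bytes straight through one reusable buffer. */
    static long copyStreaming(InputStream in, OutputStream out, int bufferSize) throws IOException {
        if (bufferSize <= 0) throw new IllegalArgumentException("bufferSize must be positive");
        byte[] buf = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }
}
```
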



[jira] Commented: (LUCENE-724) Oracle JVM implementation for Lucene DataStore also a preliminary implementation for an Oracle Domain index using Lucene

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12530709 ] 

Grant Ingersoll commented on LUCENE-724:
----------------------------------------

Is the intent of this to be committed as a contrib module (I notice you granted the ASF license)?  This seems like really useful stuff; I'm just not sure how it should be incorporated into Lucene so that we can maintain it.  Presumably it needs an Oracle DB to run, right?  I also notice CVS directories, etc.


> Oracle JVM implementation for Lucene DataStore also a preliminary implementation for an Oracle Domain index using Lucene
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-724
>                 URL: https://issues.apache.org/jira/browse/LUCENE-724
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Store
>    Affects Versions: 2.0.0, 2.2
>         Environment: Oracle 10g R2 with the latest patchset. A txt file in the lib directory lists the libraries required to compile this extension, which I can't redistribute for legal reasons. All of these libraries are included in the Oracle home directory.
>            Reporter: Marcelo F. Ochoa
>            Priority: Minor
>         Attachments: ojvm-01-09-07.tar.gz, ojvm-09-27-07.tar.gz, ojvm-11-28-06.tar.gz, ojvm-12-20-06.tar.gz, ojvm.tar.gz

[jira] Updated: (LUCENE-724) Oracle JVM implementation for Lucene DataStore also a preliminary implementation for an Oracle Domain index using Lucene

Posted by "Marcelo F. Ochoa (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcelo F. Ochoa updated LUCENE-724:
------------------------------------

    Attachment: ojvm-01-09-07.tar.gz


[jira] Updated: (LUCENE-724) Oracle JVM implementation for Lucene DataStore also a preliminary implementation for an Oracle Domain index using Lucene

Posted by "Marcelo F. Ochoa (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcelo F. Ochoa updated LUCENE-724:
------------------------------------

        Lucene Fields: [New, Patch Available]  (was: [Patch Available, New])
    Affects Version/s: 2.2


[jira] Commented: (LUCENE-724) Oracle JVM implementation for Lucene DataStore also a preliminary implementation for an Oracle Domain index using Lucene

Posted by "Marcelo F. Ochoa (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12530976 ] 

Marcelo F. Ochoa commented on LUCENE-724:
-----------------------------------------

Hi Grant:
 I would like to share this code with all Lucene users.
 Yes, it depends on Oracle libraries to compile (see the
required-libs.txt file in the lib directory).
 The code is designed to be extracted into the contrib directory of the
Lucene 2.2.0 layout and requires only a minor change to Lucene's main
build.xml file:
 <target name="jar-test" depends="compile-test">
   <jar
     destfile="${build.dir}/${final.name}-test.jar"
     basedir="${build.dir}/classes/test"
     excludes="**/*.java"
     />
 </target>
 This target packages Lucene's test suites as a jar for uploading into the Oracle JVM.
 As part of the contract with LendingClub.com, I proposed that the code
remain under the Apache 2.0 license, and they agreed.
 I uploaded the code to SourceForge to provide daily changes to the
LendingClub team, but we can move the code to Apache CVS if you want.
 Best regards, Marcelo.
--
Marcelo F. Ochoa
http://marceloochoa.blogspot.com/

> Oracle JVM implementation for Lucene DataStore also a preliminary implementation for an Oracle Domain index using Lucene
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-724
>                 URL: https://issues.apache.org/jira/browse/LUCENE-724
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Store
>    Affects Versions: 2.0.0, 2.2
>         Environment: Oracle 10g R2 with latest patchset, there is a txt file into the lib directory with the required libraries to compile this extension, which for legal issues I can't redistribute. All these libraries are include into the Oracle home directory,
>            Reporter: Marcelo F. Ochoa
>            Priority: Minor
>         Attachments: ojvm-01-09-07.tar.gz, ojvm-09-27-07.tar.gz, ojvm-11-28-06.tar.gz, ojvm-12-20-06.tar.gz, ojvm.tar.gz
>
>
> Here is a preliminary implementation of an Oracle JVM Directory data store, which replaces file-system storage with BLOB storage.
> The reasons to do this are:
>   - Storing the inverted index in a traditional file system is not a good option for some users.
>   - Storing the inverted index in BLOBs while running Lucene outside the Oracle database performs poorly, because of the many network round trips and the data marshalling involved.
>   - Indexing relational data, such as tables with VARCHAR2, CLOB or XMLType columns, with Lucene running outside the database has the same problem as the previous point.
>   - The JVM included inside the Oracle database can scale up to 10,000+ concurrent threads without memory leaks or deadlocks, and all table operations happen in the same memory space!!
>   With these points in mind, I uploaded the complete Lucene framework into the Oracle JVM and ran the complete JUnit test suite successfully, except for some tests, such as the RMI test, which require special grants to open ports inside the database.
>   Lucene's test cases run faster inside the Oracle database (11g) than on Sun JDK 1.5, because the classes are automatically JIT-compiled after some executions.
>   I have implemented an OJVMDirectory Lucene store, which replaces file-system storage with BLOB-based storage. It is a bit slower than a RAMDirectory, but we get all the benefits of BLOB storage (backup, concurrency control, and so on).
>  OJVMDirectory is based on the source at
> http://issues.apache.org/jira/browse/LUCENE-150 (DBDirectory), with some changes to run faster inside the Oracle JVM.
>  At the moment, I am working on full integration with the SQL engine using the Data Cartridge API, that is, using Lucene as a new Oracle Domain Index.
>  With this extension we can create a Lucene inverted index on a table using:
> create index it1 on t1(f2) indextype is LuceneIndex parameters('test');
>  Assuming that table t1 has a column f2 of type VARCHAR2, CLOB or XMLType, the Lucene inverted index can then be queried using a new Oracle operator:
> select * from t1 where contains(f2, 'Marcelo') = 1;
>  The important point here is that this query is integrated with the execution plan of the Oracle database. In this simple example, the Oracle optimizer sees that column "f2" is indexed with the Lucene Domain Index; then, through the Data Cartridge API, Java code running inside the Oracle JVM is executed to open the search and fetch all the ROWIDs matching "Marcelo", and the rows are retrieved using those ROWIDs.
> Here is the output:
> SELECT STATEMENT                                      ALL_ROWS      3       1       115
>        TABLE ACCESS(BY INDEX ROWID) LUCENE.T1          3       1       115
>             DOMAIN INDEX LUCENE.IT1
>  Another benefit of using the Data Cartridge API is that when rows of table T1 are inserted, updated or deleted, a corresponding Java method is called to update the Lucene index automatically.
>   There is a simple HTML file with some explanation of the code.
>    The install.sql script is not fully tested and must be run locally on the database server, not remotely.
>   Best regards, Marcelo.
> - For Oracle users the big question is: why use Lucene instead of Oracle Text, which is implemented in C?
>   I think the answer is simple: Lucene is open source, and anybody can extend it and add the functionality they need.
> - For Lucene users who want to use Lucene as an enterprise search engine, the Oracle JVM provides a highly scalable container, which can scale up to 10,000+ concurrent sessions, with the facility of querying tables in the same memory space.
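The query path described above (open the search, fetch the matching ROWIDs, then read the rows by ROWID) follows the Data Cartridge scan protocol, in which Oracle calls ODCIIndexStart once, ODCIIndexFetch repeatedly for batches of ROWIDs, and ODCIIndexClose at the end. Below is a minimal stand-alone Java sketch of that cursor shape. It assumes the Lucene search has already produced the matching ROWIDs as strings; RowidCursor, fetch and drain are hypothetical names, not classes from the patch or from Oracle's ODCI API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a domain-index scan cursor shaped like the
// ODCIIndexStart / ODCIIndexFetch / ODCIIndexClose call sequence.
// It does not use Oracle's ODCI types and is not the patch's code.
class RowidCursorDemo {

    static class RowidCursor {
        private final List<String> rowids; // ROWIDs produced by the Lucene search
        private int position = 0;
        private boolean open = true;

        // "Start": the search has already run; keep its ROWIDs for fetching.
        RowidCursor(List<String> matchingRowids) {
            this.rowids = new ArrayList<>(matchingRowids);
        }

        // "Fetch": return up to nrows ROWIDs; an empty batch ends the scan.
        List<String> fetch(int nrows) {
            if (!open) {
                throw new IllegalStateException("cursor is closed");
            }
            if (nrows <= 0) {
                throw new IllegalArgumentException("nrows must be positive");
            }
            int end = Math.min(position + nrows, rowids.size());
            List<String> batch = new ArrayList<>(rowids.subList(position, end));
            position = end;
            return batch;
        }

        // "Close": release resources (here, just mark the cursor closed).
        void close() {
            open = false;
        }
    }

    // Pulls batches until the cursor is exhausted, as the SQL engine would.
    static List<List<String>> drain(RowidCursor cursor, int nrows) {
        List<List<String>> batches = new ArrayList<>();
        List<String> batch = cursor.fetch(nrows);
        while (!batch.isEmpty()) {
            batches.add(batch);
            batch = cursor.fetch(nrows);
        }
        cursor.close();
        return batches;
    }

    public static void main(String[] args) {
        RowidCursor cursor = new RowidCursor(Arrays.asList("AAA1", "AAA2", "AAA3"));
        System.out.println(drain(cursor, 2)); // [[AAA1, AAA2], [AAA3]]
    }
}
```

Fetching in batches rather than one ROWID per call is what keeps the round trips between the SQL engine and the Java code low.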

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-724) Oracle JVM implementation for Lucene DataStore also a preliminary implementation for an Oracle Domain index using Lucene

Posted by "Marcelo F. Ochoa (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463377 ] 

Marcelo F. Ochoa commented on LUCENE-724:
-----------------------------------------

Latest code includes:
- The Data Cartridge API is used without column data, to reduce the data stored in the change queue and speed up the synchronize method.
- Query Hits are cached, associated with the index searcher and the string returned by QueryParser.toString().
- If no ancillary operator is used in the SELECT, the score list is not stored.
- The "Stemmer" parameter is recognized and its value is passed as the argument to the Snowball analyzer, for example: create index it1 on t1(f2) indextype is lucene.LuceneIndex parameters('Stemmer:English');
- Before installing the ojvm extension, it is necessary to run "ant jar-core" in the snowball directory.
- IndexWriter.setUseCompoundFile(false) is called to use multi-file storage (faster than the compound file), since there is no file-descriptor limit inside the OJVM: BLOBs are used instead of files.
- Files are marked for deletion and purged when the Sync or Optimize method is called.
- BLOBs are created and populated in one call using Oracle SQL's RETURNING clause.
- A test script using the OE sample schema, with query comparisons against an Oracle Text ctxsys.context index.
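The hits cache mentioned in the list above can be pictured as a map keyed by the searcher in use plus the normalized query text from QueryParser.toString(), so that textually different but equivalent queries share one entry. The sketch below assumes an LRU eviction policy and a string id per searcher; both are illustration choices, not details taken from the patch, and the cached value type is left generic in place of Lucene's Hits.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: cache search results keyed by (searcher id, normalized query).
class HitsCacheDemo {

    static class HitsCache<V> {
        private final Map<List<String>, V> map;

        HitsCache(final int capacity) {
            // accessOrder=true makes iteration order least-recently-used first,
            // so removeEldestEntry evicts the LRU entry once capacity is exceeded
            this.map = new LinkedHashMap<List<String>, V>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<List<String>, V> eldest) {
                    return size() > capacity;
                }
            };
        }

        // Returns the cached result, or runs the search and caches its result.
        V get(String searcherId, String normalizedQuery, Supplier<V> search) {
            List<String> key = Arrays.asList(searcherId, normalizedQuery);
            V hits = map.get(key);
            if (hits == null) {
                hits = search.get();
                map.put(key, hits);
            }
            return hits;
        }

        int size() {
            return map.size();
        }
    }

    public static void main(String[] args) {
        HitsCache<long[]> cache = new HitsCache<>(2);
        int[] searches = {0};
        Supplier<long[]> search = () -> { searches[0]++; return new long[] {1L, 2L}; };
        cache.get("searcher-1", "f2:test", search);
        cache.get("searcher-1", "f2:test", search); // served from the cache
        System.out.println(searches[0]); // 1
    }
}
```

Because the searcher id is part of the key, opening a new searcher after the index changes bypasses stale entries; the old ones simply age out of the LRU.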

TODO:
- An ODCIStats interface implementation, to give the optimizer cost information for using the Domain Index.
- Support for the FIRST_ROWS(n) optimizer hint.
- A Digester class for loading the DBLP database, to test very large indexes.
- Support for columns with XDBUriType values.
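The first item of the "Latest code includes" list, queuing changes without column data, can be sketched as a queue that records only the ROWID and the DML operation and re-reads the row when the synchronize step runs. This is a hypothetical stand-alone illustration: ChangeQueue, Op and the rowLookup function are invented names, and a Map stands in for both the base table and the Lucene index.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a change queue that records only ROWID + operation.
// Column values are read back from the base table at sync time, so the
// queue stays small. Names are illustrative, not taken from the patch.
class ChangeQueueDemo {

    enum Op { INSERT, UPDATE, DELETE }

    static class ChangeQueue {
        // Insertion-ordered; a later change to the same ROWID overwrites the
        // earlier operation but keeps the ROWID's original queue position
        private final Map<String, Op> pending = new LinkedHashMap<>();

        void enqueue(String rowid, Op op) {
            pending.put(rowid, op);
        }

        // Applies pending changes to a toy "index" (rowid -> indexed text).
        // rowLookup stands in for re-reading the current row by ROWID.
        void sync(Map<String, String> index, Function<String, String> rowLookup) {
            for (Map.Entry<String, Op> change : pending.entrySet()) {
                String rowid = change.getKey();
                index.remove(rowid); // drop any stale document first
                if (change.getValue() != Op.DELETE) {
                    index.put(rowid, rowLookup.apply(rowid));
                }
            }
            pending.clear();
        }

        int size() {
            return pending.size();
        }
    }

    public static void main(String[] args) {
        Map<String, String> table = new LinkedHashMap<>();
        table.put("R1", "Marcelo");
        Map<String, String> index = new LinkedHashMap<>();
        ChangeQueue queue = new ChangeQueue();
        queue.enqueue("R1", Op.INSERT);
        queue.enqueue("R2", Op.INSERT);
        queue.enqueue("R2", Op.DELETE); // R2 was deleted again before the sync
        queue.sync(index, table::get);
        System.out.println(index); // {R1=Marcelo}
    }
}
```

In this sketch, however many times a row changes before the next sync, it is re-indexed once from its current values, which is one way a ROWID-only queue keeps synchronization cheap.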



[jira] Updated: (LUCENE-724) Oracle JVM implementation for Lucene DataStore also a preliminary implementation for an Oracle Domain index using Lucene

Posted by "Marcelo F. Ochoa (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcelo F. Ochoa updated LUCENE-724:
------------------------------------

    Attachment: ojvm-09-27-07.tar.gz

This new release includes:
* Synchronized with the latest Lucene 2.2.0 production release.
* Replaced the Vector-based in-memory storage with direct BLOB I/O, reducing memory usage for large indexes.
* Support for user data stores: instead of indexing only one column at a time (a limitation of the Data Cartridge API on 10g), you can now index multiple columns of the base table plus columns of related tables joined to it.
* User data stores can be customized: by writing a simple Java class, users can control which columns are indexed, the padding used, or any other processing before a document is added.
* There is a DefaultUserDataStore, which takes all columns of the query and builds a Lucene Document with one Field per database column; these fields are automatically padded for NUMBER data or rounded for DATE data, for example.
* The lcontains() SQL operator supports the full Lucene QueryParser syntax, giving access to all indexed columns.
* Support for the DOMAIN_INDEX_SORT hint: if you want rows ordered by the lscore() operator (ascending or descending), the hint lets the optimizer assume that the Lucene Domain Index returns rowids in the proper order, avoiding an inline view to sort them.
* Automatic index synchronization using AQ callbacks.
* The Lucene Domain Index creates extra tables named IndexName$T and an Oracle AQ named IndexName$Q, with its storage table IndexName$QT, in the user's schema, so you can alter their storage preferences if you want.
* The ojvm project is in SourceForge.net CVS, so anybody can get it and collaborate ;)
* Tested against 10gR2 and 11g databases.
* A LuceneDomainIndex.countHits() function to replace the select count(*) from .. where lcontains(..)>0 syntax.
* Support for inline pagination in the lcontains(col,'rownum:[n TO m] AND ...') function.
* See Readme.txt for details of usage and installation.
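The inline pagination item above embeds a row window in the query text. Here is a hypothetical sketch of how such a clause could be peeled off and applied to ranked hits, assuming the clause comes first, is followed by AND, and uses 1-based inclusive bounds; the actual parsing rules in ojvm may differ (see its Readme.txt).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of inline pagination: a leading "rownum:[n TO m] AND"
// clause is stripped from the query text and used to slice the ranked hits.
// Assumes 1-based, inclusive bounds; the real syntax handling may differ.
class RownumPaginationDemo {

    private static final Pattern ROWNUM =
        Pattern.compile("^\\s*rownum:\\[(\\d+) TO (\\d+)\\]\\s+AND\\s+(.*)$", Pattern.DOTALL);

    // The parsed row window plus the remaining Lucene query text.
    static class Page {
        final int from, to; // 1-based, inclusive
        final String query;

        Page(int from, int to, String query) {
            this.from = from;
            this.to = to;
            this.query = query;
        }
    }

    static Page parse(String text) {
        Matcher m = ROWNUM.matcher(text);
        if (!m.matches()) {
            return new Page(1, Integer.MAX_VALUE, text); // no pagination clause
        }
        return new Page(Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)), m.group(3));
    }

    // Returns hits from..to (1-based, inclusive), clipped to the list size.
    static <T> List<T> slice(List<T> rankedHits, Page p) {
        int start = Math.max(p.from - 1, 0);
        int end = Math.min(p.to, rankedHits.size());
        if (start >= end) {
            return new ArrayList<>();
        }
        return new ArrayList<>(rankedHits.subList(start, end));
    }

    public static void main(String[] args) {
        Page p = parse("rownum:[2 TO 3] AND f2:test");
        System.out.println(p.query);                                         // f2:test
        System.out.println(slice(Arrays.asList("r1", "r2", "r3", "r4"), p)); // [r2, r3]
    }
}
```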
-------
Thanks to LendingClub.com for supporting this contribution.



[jira] Commented: (LUCENE-724) Oracle JVM implementation for Lucene DataStore also a preliminary implementation for an Oracle Domain index using Lucene

Posted by "Marcelo F. Ochoa (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527500 ] 

Marcelo F. Ochoa commented on LUCENE-724:
-----------------------------------------

Joaquin at lucene-java-dev wrote:
I'm very happy to announce the partial rework and extension of LUCENE-724 (Oracle-Lucene Integration), primarily based on new requirements from LendingClub.com, who commissioned the work from Marcelo Ochoa, the contributor of the original patch (great job Marcelo!). As LendingClub.com's contribution to the Lucene community, we have posted the code on a public CVS repository (SourceForge), as explained below.

Here at Lending Club (www.lendingclub.com) we have very specific needs regarding the indexing of both structured and unstructured data, most of it transactional in nature and sitting in our Oracle 10gR2 DB, with a highly complex schema. Our "ranking" of loans in the inventory combines exact, textual and hardcore mathematical components, including time, amount and spatial constraints. This integration of Lucene into Oracle as a Domain Index will now allow us to query this inventory in real time. Querying the Lucene index, which is built on "synthetic documents" whose fields are populated from diverse tables (user data store), eliminates the need for very complex joins to link data from different tables at query time. This, along with support for the full Lucene query language, makes it a great alternative to:

   1. Using Lucene outside the database, which requires "crawling" the data and storing the index outside the database, losing all the benefits of a fully transactional system and a secure environment.
   2. Using Oracle Text, which is very powerful but lacks the extensibility and flexibility that Lucene offers (for example, being able to query the index directly from the Java layer, or to implement our own ranking algorithm), though to be completely fair, some of this is addressed in the new Oracle DB 11g version.

If anyone is interested in learning more about how we are going to use this within Lending Club, please drop me a line. BTW, please make sure you check us out: "Lending Club (http://www.lendingclub.com/), the rapidly growing people-to-people (P2P) lending service that launched as a Facebook application in May 2007, today announced the public availability of its services with the launch of LendingClub.com. Lending Club connects lenders and borrowers based upon shared affinities, enabling them to bypass banks to secure better interest rates on loans"... more about the announcement here: http://www.sys-con.com/read/428678.htm. We have seen many entrepreneurs apply for loans and be helped by regular people to build their businesses with money obtained at very low interest.

OK, without further marketing stuff (sorry for that), here is the original note Marcelo sent me, summarizing all the cool new functionality:

OJVMDirectory, a Lucene integration running inside the Oracle JVM, is going one step further.

This new release includes:

    * Synchronized with the latest Lucene 2.2.0 production release.
    * Replaced the Vector-based in-memory storage with direct BLOB I/O, reducing memory usage for large indexes.
    * Support for user data stores: instead of indexing only one column at a time (a limitation of the Data Cartridge API on 10g), you can now index multiple columns of the base table plus columns of related tables joined to it.
    * User data stores can be customized: by writing a simple Java class, users can control which columns are indexed, the padding used, or any other processing before a document is added.
    * There is a DefaultUserDataStore, which takes all columns of the query and builds a Lucene Document with one Field per database column; these fields are automatically padded for NUMBER data or rounded for DATE data, for example.
    * The lcontains() SQL operator supports the full Lucene QueryParser syntax, giving access to all indexed columns; see the examples below.
    * Support for the DOMAIN_INDEX_SORT and FIRST_ROWS hints: if you want rows ordered by the lscore() operator (ascending or descending), the hint lets the optimizer assume that the Lucene Domain Index returns rowids in the proper order, avoiding an inline view to sort them.
    * Automatic index synchronization using AQ callbacks.
    * The Lucene Domain Index creates extra tables named IndexName$T and an Oracle AQ named IndexName$Q, with its storage table IndexName$QT, in the user's schema, so you can alter their storage preferences if you want.
    * The ojvm project is in SourceForge.net CVS, so anybody can get it and collaborate ;)
    * Tested against 10gR2 and 11g databases.
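The DefaultUserDataStore behaviour described above (one field per column, NUMBER values padded, DATE values rounded) can be approximated in plain Java. In this sketch a Map stands in for the Lucene Document, LocalDateTime stands in for an Oracle DATE, and "rounding" is taken to mean truncation to day resolution in yyyyMMdd form; all three are assumptions for illustration, not the class's actual code.

```java
import java.text.DecimalFormat;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of what a default user data store might do per row:
// one field per column, NUMBER values zero-padded and DATE values reduced
// to day resolution so they index and compare as plain strings.
// A Map stands in for the Lucene Document the real class builds.
class UserDataStoreDemo {

    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    static Map<String, String> toFields(Map<String, Object> row, String decimalPattern) {
        DecimalFormat numberFormat = new DecimalFormat(decimalPattern); // not thread-safe, so one per call
        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<String, Object> column : row.entrySet()) {
            Object value = column.getValue();
            if (value == null) {
                continue; // nothing to index for SQL NULL
            } else if (value instanceof Number) {
                fields.put(column.getKey(), numberFormat.format(value));
            } else if (value instanceof LocalDateTime) {
                fields.put(column.getKey(), ((LocalDateTime) value).format(DAY));
            } else {
                fields.put(column.getKey(), value.toString());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("f2", "test document");
        row.put("f3", 7);
        row.put("t2.f5", LocalDateTime.of(2007, 9, 27, 14, 30));
        System.out.println(toFields(row, "000")); // {f2=test document, f3=007, t2.f5=20070927}
    }
}
```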


Some sample usages:

create table t2 (
 f4 number primary key,
 f5 VARCHAR2(200));
create table t1 (
 f1 number,
 f2 CLOB,
 f3 number,
 CONSTRAINT t1_t2_fk FOREIGN KEY (f3)
     REFERENCES t2(f4) ON DELETE cascade);
create index it1 on t1(f3) indextype is lucene.LuceneIndex
 parameters('Analyzer:org.apache.lucene.analysis.SimpleAnalyzer;ExtraCols:f2');

alter index it1
parameters('ExtraCols:f2,t2.f5;ExtraTabs:t2;WhereCondition:t1.f3=t2.f4;DecimalFormat:000');

The Lucene domain index will store columns f2 and f3 of table t1 plus column f5 of table t2.
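The parameters string in the examples above packs several settings into one literal. A minimal parser, assuming that ';' separates entries and that the first ':' in each entry separates the name from the value (so a value such as t1.f3=t2.f4 survives intact), could look like the following. It is a hypothetical illustration, not the parser used by LuceneIndex.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of parsing a domain-index parameters string such as
// 'ExtraCols:f2,t2.f5;ExtraTabs:t2;WhereCondition:t1.f3=t2.f4;DecimalFormat:000'.
// Assumes ';' separates entries and the first ':' separates name from value.
class IndexParametersDemo {

    static Map<String, String> parse(String parameters) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String entry : parameters.split(";")) {
            if (entry.trim().isEmpty()) {
                continue; // tolerate empty entries, e.g. a doubled ';'
            }
            int colon = entry.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("missing ':' in " + entry);
            }
            result.put(entry.substring(0, colon).trim(), entry.substring(colon + 1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> p =
            parse("ExtraCols:f2,t2.f5;ExtraTabs:t2;WhereCondition:t1.f3=t2.f4;DecimalFormat:000");
        System.out.println(p.get("ExtraCols").split(",").length); // 2
        System.out.println(p.get("WhereCondition"));              // t1.f3=t2.f4
    }
}
```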

You can then query it with:

 select lscore(1),f2 from t1 where lcontains(f3, 'f2:test',1) > 0;
or
 select lscore(1),f2 from t1 where lcontains(f3, 'f2:test AND f3:[001 TO 200]',1) > 0;

 select /*+ DOMAIN_INDEX_SORT */ lscore(1),f2,t2.f5
 from t1,t2
 where lcontains(f3, 'f2:test1 AND f3:[001 TO 200] AND t2.f5:test2',1) > 0
 and t1.f3=t2.f4
 order by lscore(1) asc;

In the last example, Oracle's optimizer will assume that the Lucene Domain Index first resolves the set of rowids matching "f2:test1 AND f3:[001 TO 200] AND t2.f5:test2", then accesses table t1 directly by index rowid and performs the join with t2.
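The DecimalFormat:000 setting matters because a Lucene term range such as f3:[001 TO 200] compares terms as strings, not as numbers. This stand-alone sketch mimics that inclusive string comparison (inRange is a hypothetical helper, not Lucene's RangeQuery) to show which values match with and without zero padding.

```java
import java.text.DecimalFormat;
import java.util.ArrayList;
import java.util.List;

// Illustrates why numeric columns are zero-padded (DecimalFormat:000) before
// indexing: a term range such as f3:[001 TO 200] compares terms as strings.
class PaddedRangeDemo {

    // Inclusive string range test, in the spirit of a Lucene term range query.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    // Returns the values whose (optionally padded) term falls in [lower, upper].
    static List<Integer> matches(int[] values, boolean pad, String lower, String upper) {
        DecimalFormat fmt = new DecimalFormat("000");
        List<Integer> hits = new ArrayList<>();
        for (int v : values) {
            String term = pad ? fmt.format(v) : Integer.toString(v);
            if (inRange(term, lower, upper)) {
                hits.add(v);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] f3 = {5, 21, 150, 300};
        System.out.println(matches(f3, true, "001", "200"));  // [5, 21, 150]
        System.out.println(matches(f3, false, "001", "200")); // [150]
    }
}
```

With padding, 5 and 21 match as expected; without it, "5" and "21" sort after "200" and are missed. Padding to three digits only works while values stay between 0 and 999: wider or negative numbers would need a different pattern or encoding.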

More examples and information can be found at:
http://dbprism.cvs.sourceforge.net/dbprism/ojvm/Readme.txt?revision=1.10&view=markup



[jira] Updated: (LUCENE-724) Oracle JVM implementation for Lucene DataStore also a preliminary implementation for an Oracle Domain index using Lucene

Posted by "Marcelo F. Ochoa (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/LUCENE-724?page=all ]

Marcelo F. Ochoa updated LUCENE-724:
------------------------------------

    Attachment: ojvm.tar.gz

see patch description

> Oracle JVM implementation for Lucene DataStore also a preliminary implementation for an Oracle Domain index using Lucene
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-724
>                 URL: http://issues.apache.org/jira/browse/LUCENE-724
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Store
>    Affects Versions: 2.0.0
>         Environment: Oracle 10g R2 with latest patchset, there is a txt file into the lib directory with the required libraries to compile this extension, which for legal issues I can't redistribute. All these libraries are include into the Oracle home directory,
>            Reporter: Marcelo F. Ochoa
>            Priority: Minor
>         Attachments: ojvm.tar.gz
>
>
> Here a preliminary implementation of the Oracle JVM Directory data store which replace a file system by BLOB data storage.
> The reason to do this is:
>   - Using traditional File System for storing the inverted index is not a good option for some users.
>   - Using BLOB for storing the inverted index running Lucene outside the Oracle database has a bad performance because there are a lot of network round trips and data marshalling.
>   - Indexing relational data stores such as tables with VARCHAR2, CLOB or XMLType with Lucene running outside the database has the same problem as the previous point.
>   - The JVM included inside the Oracle database can scale up to 10.000+ concurrent threads without memory leaks or deadlock and all the operation on tables are in the same memory space!!
>   With these points in mind, I uploaded the complete Lucene framework inside the Oracle JVM and I runned the complete JUnit test case successful, except for some test such as the RMI test which requires special grants to open ports inside the database.
>   The Lucene's test cases run faster inside the Oracle database (11g) than the Sun JDK 1.5, because the classes are automatically JITed after some executions.
>   I had implemented and OJVMDirectory Lucene Store which replaces the file system storage with a BLOB based storage, compared with a RAMDirectory implementation is a bit slower but we gets all the benefits of the BLOB storage (backup, concurrence control, and so on).
>  The OJVMDirectory is cloned from the source at
> http://issues.apache.org/jira/browse/LUCENE-150 (DBDirectory) but with some changes to run faster inside the Oracle JVM.
>  At the moment, I am working on a full integration with the SQL engine using the Data Cartridge API, which means using Lucene as a new Oracle Domain Index.
>  With this extension we can create a Lucene inverted index on a table using:
> create index it1 on t1(f2) indextype is LuceneIndex parameters('test');
>  assuming that table t1 has a column f2 of type VARCHAR2, CLOB or XMLType. After this, the Lucene inverted index can be queried using a new Oracle operator:
> select * from t1 where contains(f2, 'Marcelo') = 1;
>  The important point here is that this query is integrated with the execution plan of the Oracle database. In this simple example, the Oracle optimizer sees that column "f2" is indexed with the Lucene Domain Index, so, through the Data Cartridge API, Java code running inside the Oracle JVM is executed to open the search, fetch all the ROWIDs that match "Marcelo", and get the rows using those ROWIDs.
> Here is the resulting execution plan:
> SELECT STATEMENT                                      ALL_ROWS      3       1       115
>        TABLE ACCESS(BY INDEX ROWID) LUCENE.T1          3       1       115
>             DOMAIN INDEX LUCENE.IT1
>  Another benefit of using the Data Cartridge API is that when rows of table T1 are inserted, updated or deleted, a corresponding Java method is called to update the Lucene index automatically.
>   There is a simple HTML file with some explanation of the code.
>    The install.sql script is not fully tested and must be launched inside the Oracle database, not remotely.
>   Best regards, Marcelo.
> - For Oracle users the big question is: why use Lucene instead of Oracle Text, which is implemented in C?
>   I think the answer is simple: Lucene is open source, and anybody can extend it and add the functionality they need.
> - For Lucene users who want to use Lucene as an enterprise search engine, the Oracle JVM provides a highly scalable container that can scale up to 10,000+ concurrent sessions, with the facility of querying tables in the same memory space.
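The start/fetch/close flow described above, where the optimizer hands the contains() predicate to Java code that returns matching ROWIDs, can be sketched in plain Java. This is a minimal sketch under stated assumptions, not the extension's code: `LuceneDomainScan` and its methods are hypothetical names that only mirror the ODCIIndexStart, ODCIIndexFetch and ODCIIndexClose routines of the Data Cartridge interface, a `Map` from ROWID to text stands in for column f2, and a naive case-insensitive token match replaces the real Lucene search.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a domain index scan context. The three methods
// mirror the ODCIIndexStart / ODCIIndexFetch / ODCIIndexClose phases.
class LuceneDomainScan {
    private final Map<String, String> column; // ROWID -> value of the indexed column (t1.f2)
    private Iterator<String> hits;            // matching ROWIDs, filled by start()

    LuceneDomainScan(Map<String, String> column) {
        this.column = column;
    }

    // "Start": run the search once and remember the matching ROWIDs.
    // A real implementation would query a Lucene index; this sketch does a
    // naive case-insensitive whole-token match instead.
    void start(String term) {
        List<String> matches = new ArrayList<>();
        String needle = term.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> e : column.entrySet()) {
            for (String token : e.getValue().toLowerCase(Locale.ROOT).split("\\W+")) {
                if (token.equals(needle)) {
                    matches.add(e.getKey());
                    break;
                }
            }
        }
        hits = matches.iterator();
    }

    // "Fetch": return up to nrows ROWIDs per call; an empty batch means no more rows.
    List<String> fetch(int nrows) {
        List<String> batch = new ArrayList<>();
        while (hits != null && hits.hasNext() && batch.size() < nrows) {
            batch.add(hits.next());
        }
        return batch;
    }

    // "Close": release the scan state.
    void close() {
        hits = null;
    }
}
```

Returning ROWIDs in batches from repeated fetch calls mirrors how the server keeps asking the index for the next set of ROWIDs until none are left, then looks up each row by ROWID.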

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-724) Oracle JVM implementation for Lucene DataStore also a preliminary implementation for an Oracle Domain index using Lucene

Posted by "Marcelo F. Ochoa (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/LUCENE-724?page=all ]

Marcelo F. Ochoa updated LUCENE-724:
------------------------------------

    Attachment: ojvm-12-20-06.tar.gz

This new release of the OJVMDirectory Lucene Store includes a fully functional Oracle Domain Index with a queue for massive update/insert operations, and a lot of performance improvements.
See the db/readmeOJVM.html file for more details.
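A change queue like the one mentioned above is commonly built by recording each DML event and applying the pending changes to the index in one batch, so that many inserts and updates share a single index pass. The sketch below illustrates that idea under stated assumptions and is not the code shipped in ojvm-12-20-06.tar.gz: `IndexChangeQueue`, `Op` and the map used as the "index" are hypothetical, and only the latest operation per ROWID is kept, so an update followed by a delete of the same row collapses into a single delete.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical deferred-change queue for a domain index. DML callbacks only
// enqueue work; sync() applies everything to the index in one pass.
class IndexChangeQueue {
    enum Op { INSERT, UPDATE, DELETE }

    private static final class Change {
        final Op op;
        final String text; // new column value, null for DELETE
        Change(Op op, String text) { this.op = op; this.text = text; }
    }

    // Insertion-ordered; a later change for the same ROWID replaces the earlier one.
    private final Map<String, Change> pending = new LinkedHashMap<>();

    void enqueue(String rowid, Op op, String text) {
        pending.remove(rowid); // re-insert so the row moves to the end of the queue
        pending.put(rowid, new Change(op, op == Op.DELETE ? null : text));
    }

    int pendingCount() {
        return pending.size();
    }

    // Applies all pending changes to 'index' (ROWID -> indexed text), then clears the queue.
    // Insert and update are both handled as "remove any old entry, then add the new one".
    void sync(Map<String, String> index) {
        for (Map.Entry<String, Change> e : pending.entrySet()) {
            index.remove(e.getKey());
            if (e.getValue().op != Op.DELETE) {
                index.put(e.getKey(), e.getValue().text);
            }
        }
        pending.clear();
    }
}
```

Coalescing by ROWID keeps the queue bounded by the number of distinct rows touched, which is what makes bulk loads cheaper than updating the index once per statement.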


> Oracle JVM implementation for Lucene DataStore also a preliminary implementation for an Oracle Domain index using Lucene
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-724
>                 URL: http://issues.apache.org/jira/browse/LUCENE-724
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Store
>    Affects Versions: 2.0.0
>         Environment: Oracle 10g R2 with latest patchset, there is a txt file into the lib directory with the required libraries to compile this extension, which for legal issues I can't redistribute. All these libraries are include into the Oracle home directory,
>            Reporter: Marcelo F. Ochoa
>            Priority: Minor
>         Attachments: ojvm-11-28-06.tar.gz, ojvm-12-20-06.tar.gz, ojvm.tar.gz



[jira] Commented: (LUCENE-724) Oracle JVM implementation for Lucene DataStore also a preliminary implementation for an Oracle Domain index using Lucene

Posted by "Marcelo F. Ochoa (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512212 ] 

Marcelo F. Ochoa commented on LUCENE-724:
-----------------------------------------

Michael:
  I have not tested replacing the vector-based storage with direct BLOB IO.
  Right now I am too busy with a project; maybe I'll have some time in a few weeks.
  If you are replacing the vector-based access with BLOB IO, I would certainly
like to test it.
  I have some open issues, especially with the integration of the
data cartridge API and the optimizer.
  Do you have access to an open CVS server to share the code?
  If not, we can use the DBPrism CVS repository at SourceForge.
  Also, in a few weeks Oracle 11g will be ready for download on the OTN
website, so you can get a lot of performance improvement by using
SECURE LOB (faster than NFS storage) and the JDK 1.5 JIT included in
the latest Oracle JVM.
  Best regards, Marcelo.



-- 
Marcelo F. Ochoa
http://marcelo.ochoa.googlepages.com/home
______________
Do you Know DBPrism? Look @ DB Prism's Web Site
http://www.dbprism.com.ar/index.html
More info?
Chapter 17 of the book "Programming the Oracle Database using Java &
Web Services"
http://www.amazon.com/gp/product/1555583296/
Chapter 21 of the book "Professional XML Databases" - Wrox Press
http://www.amazon.com/gp/product/1861003587/
Chapter 8 of the book "Oracle & Open Source" - O'Reilly
http://www.oreilly.com/catalog/oracleopen/


> Oracle JVM implementation for Lucene DataStore also a preliminary implementation for an Oracle Domain index using Lucene
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-724
>                 URL: https://issues.apache.org/jira/browse/LUCENE-724
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Store
>    Affects Versions: 2.0.0
>         Environment: Oracle 10g R2 with latest patchset, there is a txt file into the lib directory with the required libraries to compile this extension, which for legal issues I can't redistribute. All these libraries are include into the Oracle home directory,
>            Reporter: Marcelo F. Ochoa
>            Priority: Minor
>         Attachments: ojvm-01-09-07.tar.gz, ojvm-11-28-06.tar.gz, ojvm-12-20-06.tar.gz, ojvm.tar.gz



[jira] Commented: (LUCENE-724) Oracle JVM implementation for Lucene DataStore also a preliminary implementation for an Oracle Domain index using Lucene

Posted by "Michael Goddard (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512169 ] 

Michael Goddard commented on LUCENE-724:
----------------------------------------

Marcelo,

Are you still working on this?  I have been experimenting with it recently -- thank you for creating it.  Do you think that the I/O might be faster if the Vector was replaced with BLOB I/O via InputStream, OutputStream directly?  That is what I am working with right now, and I did observe my indexing time for a sample data set go from 22 seconds to 13 seconds.  I do currently have the problem that the resulting index is not behaving correctly and am working on that.
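To make the Vector-versus-stream idea above concrete, here is a minimal sketch of copying index file bytes straight through a stream with a fixed-size buffer, instead of first collecting them in an in-memory Vector. In the database, the streams could come from the standard JDBC `java.sql.Blob` methods `getBinaryStream()` and `setBinaryStream(long pos)`; here `ByteArrayInputStream` and `ByteArrayOutputStream` stand in for them so the code runs anywhere. `BlobStreams.copy` and the 32 KB buffer size are illustrative assumptions, and no particular speedup is implied.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Illustrative helper: stream bytes from a source to a sink without holding
// the whole file in memory. In the database, 'in' could be Blob.getBinaryStream()
// and 'out' could be Blob.setBinaryStream(1) (JDBC BLOB positions are 1-based).
final class BlobStreams {
    private static final int BUFFER_SIZE = 32 * 1024; // assumed size, tune as needed

    private BlobStreams() {}

    // Copies everything from 'in' to 'out' and returns the number of bytes copied.
    // The caller owns both streams and is responsible for closing them.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }
}
```

Because memory use is bounded by the buffer rather than the file size, this approach also avoids the extra copy from the Vector into the BLOB at close time.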


> Oracle JVM implementation for Lucene DataStore also a preliminary implementation for an Oracle Domain index using Lucene
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-724
>                 URL: https://issues.apache.org/jira/browse/LUCENE-724
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Store
>    Affects Versions: 2.0.0
>         Environment: Oracle 10g R2 with latest patchset, there is a txt file into the lib directory with the required libraries to compile this extension, which for legal issues I can't redistribute. All these libraries are include into the Oracle home directory,
>            Reporter: Marcelo F. Ochoa
>            Priority: Minor
>         Attachments: ojvm-01-09-07.tar.gz, ojvm-11-28-06.tar.gz, ojvm-12-20-06.tar.gz, ojvm.tar.gz



[jira] Updated: (LUCENE-724) Oracle JVM implementation for Lucene DataStore also a preliminary implementation for an Oracle Domain index using Lucene

Posted by "Marcelo F. Ochoa (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/LUCENE-724?page=all ]

Marcelo F. Ochoa updated LUCENE-724:
------------------------------------

    Attachment: ojvm-11-28-06.tar.gz

This new version of the OracleJVM extension for Lucene has these changes:
- A new build.xml file can be used to compile and install the OJVM extension remotely; this distribution is designed to be uncompressed into the contrib directory of lucene-2.0.0.
- The Lucene build.xml file needs an addition to pack the test suites as jars, with something like this:
  <target name="jar-test" depends="compile-test">
    <jar
      destfile="${build.dir}/${final.name}-test.jar"
      basedir="${build.dir}/classes/test"
      excludes="**/*.java"
      />
  </target>
- The OracleJVM extension uses these entries in the build.properties file:
db.str=orcl
db.usr=lucene
db.pwd=lucene
dba.usr=sys
dba.pwd=change_on_install
These tell the build which database users and passwords to use when installing into the target DB.
- If you want to run the OJVMDirectory test remotely to compare it against the database version, add these lines to the "test" target of the common-build.xml file:
      <!-- Oracle JVM Directory implementation -->
      <sysproperty key="db.str" value="${db.str}"/>
      <sysproperty key="db.usr" value="${db.usr}"/>
      <sysproperty key="db.pwd" value="${db.pwd}"/>
These lines pass the user name, password and SQL*Net connect string to the tests as Java system properties.
- The complete API for the Oracle Domain Index has been implemented, but the solution for using the contains operator outside the WHERE clause is not a good one yet.
- I will implement a singleton for the OJVMDirectory object when it is used in read-only mode, typically when users perform select operations against tables that have columns indexed with Lucene. This will greatly increase overall performance, because the index reader will already be open for each select operation. Of course, I will check whether another user or thread has written to the index, and reload the read-only singleton when that happens.
- The queue for storing changes to the index is not implemented yet; I'll add it shortly.
- I am looking for a big set of XML documents. I downloaded the DBLP database, which is over 300 MB of text documents, but I need to upload it into the Oracle database, which is not a simple operation :(
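On the test side, the values passed by the `<sysproperty>` entries above are read with `System.getProperty`. A minimal sketch follows, assuming only the property names db.str, db.usr and db.pwd from the build.properties entries; the `DbTestConfig` class and its fail-fast behaviour are hypothetical, and turning the connect string into a JDBC URL is deliberately left out because it depends on the driver in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical holder for the connection settings passed to the remote
// OJVMDirectory tests as Java system properties.
final class DbTestConfig {
    final String connectString; // db.str, e.g. an SQL*Net alias such as "orcl"
    final String user;          // db.usr
    final String password;      // db.pwd

    private DbTestConfig(String connectString, String user, String password) {
        this.connectString = connectString;
        this.user = user;
        this.password = password;
    }

    // Reads the three properties and fails fast, naming every missing one,
    // so a misconfigured build does not surface later as a confusing login error.
    static DbTestConfig from(Properties props) {
        List<String> missing = new ArrayList<>();
        String str = value(props, "db.str", missing);
        String usr = value(props, "db.usr", missing);
        String pwd = value(props, "db.pwd", missing);
        if (!missing.isEmpty()) {
            throw new IllegalStateException("Missing system properties: " + missing);
        }
        return new DbTestConfig(str, usr, pwd);
    }

    // Convenience overload for the real JVM system properties.
    static DbTestConfig fromSystem() {
        return from(System.getProperties());
    }

    private static String value(Properties props, String key, List<String> missing) {
        String v = props.getProperty(key);
        // Ant passes an undefined ${...} reference through unexpanded, so treat it as missing too.
        if (v == null || v.trim().isEmpty() || v.trim().startsWith("${")) {
            missing.add(key);
            return null;
        }
        return v.trim(); // also drops stray trailing blanks copied from build.properties
    }
}
```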
Best regards, Marcelo.
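The read-only singleton planned in the list above amounts to caching one open reader and reopening it only when the index has changed. The sketch below assumes the index exposes a version number that changes on every write (Lucene's `IndexReader` has a static `getCurrentVersion(Directory)` method for this kind of check); to keep it runnable without Lucene or Oracle classes, the version source and the reader factory are passed in as hypothetical `LongSupplier` and `Function` parameters, and `SharedReader` is an illustrative name.

```java
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical shared read-only reader: one instance is reused by all select
// operations and is reopened only when a writer has changed the index version.
final class SharedReader<R> {
    private final LongSupplier currentVersion; // reads the index version from storage
    private final Function<Long, R> opener;     // opens a reader for a given version
    private R reader;
    private long readerVersion = Long.MIN_VALUE;

    SharedReader(LongSupplier currentVersion, Function<Long, R> opener) {
        this.currentVersion = currentVersion;
        this.opener = opener;
    }

    // Returns the cached reader, reopening it first if the index has moved on.
    // Synchronized so concurrent sessions never see a half-replaced reader.
    synchronized R get() {
        long version = currentVersion.getAsLong();
        if (reader == null || version != readerVersion) {
            reader = opener.apply(version);
            readerVersion = version;
        }
        return reader;
    }
}
```

A fuller version would also close the previous reader once no running query still uses it; that bookkeeping is omitted here to keep the caching logic visible.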

> Oracle JVM implementation for Lucene DataStore also a preliminary implementation for an Oracle Domain index using Lucene
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-724
>                 URL: http://issues.apache.org/jira/browse/LUCENE-724
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Store
>    Affects Versions: 2.0.0
>         Environment: Oracle 10g R2 with latest patchset, there is a txt file into the lib directory with the required libraries to compile this extension, which for legal issues I can't redistribute. All these libraries are include into the Oracle home directory,
>            Reporter: Marcelo F. Ochoa
>            Priority: Minor
>         Attachments: ojvm-11-28-06.tar.gz, ojvm.tar.gz
