Posted to java-user@lucene.apache.org by Anthony Vito <vi...@mnis.com> on 2004/04/17 00:22:36 UTC

SQLDirectory implementation

  I noticed some talk on SQLDirectory a month or so ago. (I just joined
the list :) ) I have a JDBC implementation that stores the "files" in a
couple of tables and stores the data for each file as fixed-size blocks
(BLOBs), 16 KB by default. It also has an LRU cache for the blocks, which
makes the performance quite acceptable.
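
For readers unfamiliar with the idea, here is a minimal sketch of an LRU
block cache in Java. It is an illustration only, not the code from the
implementation described here: the class name BlockCache, the string key
built from a file ID and a block number, and the capacity are all
hypothetical. It relies on java.util.LinkedHashMap in access order, which
can drop the least recently used entry once the cache is over capacity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache for file blocks, keyed by (file ID, block number). */
final class BlockCache {
    private final Map<String, byte[]> blocks;

    BlockCache(final int maxBlocks) {
        // accessOrder = true: iteration order runs from least to most recently used.
        this.blocks = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                // Called after each put; evict the least recently used block when full.
                return size() > maxBlocks;
            }
        };
    }

    /** Returns the cached block, or null if it must be fetched from the database. */
    synchronized byte[] get(long fileId, int blockNumber) {
        return blocks.get(key(fileId, blockNumber));
    }

    synchronized void put(long fileId, int blockNumber, byte[] data) {
        blocks.put(key(fileId, blockNumber), data);
    }

    synchronized int size() {
        return blocks.size();
    }

    private static String key(long fileId, int blockNumber) {
        return fileId + ":" + blockNumber;
    }
}
```

A caller would try get() first and go to the database only on a null
result, then put() the fetched block so later reads hit the cache.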

Quoting Doug:
--------------------------
The way I would try to implement Directory with SQL is to have a single
table of buffers per index, e.g., with columns ID, BLOCK_NUMBER and
DATA.  The contents of a file are the appended DATA columns with the
same ID, ordered by the BLOCK_NUMBER field.  This would be indexed by ID
and BLOCK_NUMBER, together a unique key.

The BLOCK_NUMBER field indicates which part of the file the row
concerns.  Thus the DATA of BLOCK_NUMBER=0 might hold the first 1024
bytes, the DATA of BLOCK_NUMBER=1 might hold the next 1024 bytes, and so
on.  This would permit efficient random access.

You'll need another table with NAME, ID, and MODIFIED_DATE, with a
single entry per file.  The length of a file can be computed with a
query that finds the length of DATA in the last BLOCK_NUMBER with an ID.

I would initially cache a single connection to the database and
serialize requests over it.  A pool of connections might be more
efficient when multiple threads are searching, but I would benchmark
that before investing much in such an implementation.

Has anyone yet implemented an SQL Directory this way?
--------------------------------

So to answer the question: pretty much, with just a few minor
differences.

I have one table that stores each file as a row, with a name, a
directory name, and a length. The directory name lets me store more
than one index in the same two tables. The other table stores an ID from
the first table, a sequence number (BLOCK_NUMBER), and the DATA for that
block. My current code creates a new prepared statement for each DB
access, so a statement-pooling connection is a must (this could probably
be worked around).
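
With fixed-size blocks, finding the row that holds a given byte is plain
integer arithmetic. The following is a minimal sketch, assuming every
block except the last is full. The class BlockLayout and its method names
are hypothetical and are not taken from the actual code.

```java
/** Hypothetical helper mapping file positions to fixed-size blocks. */
final class BlockLayout {
    static final int DEFAULT_BLOCK_SIZE = 16 * 1024;

    private final int blockSize;

    BlockLayout(int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.blockSize = blockSize;
    }

    /** BLOCK_NUMBER of the row that holds the byte at this file position. */
    int blockNumber(long position) {
        return (int) (position / blockSize);
    }

    /** Offset of that byte inside the block's DATA. */
    int offsetInBlock(long position) {
        return (int) (position % blockSize);
    }

    /** File length derived from the last block, as in Doug's suggestion. */
    long fileLength(int lastBlockNumber, int lastBlockDataLength) {
        return (long) lastBlockNumber * blockSize + lastBlockDataLength;
    }

    /** Number of block rows needed to hold a file of the given length. */
    int blockCount(long fileLength) {
        return (int) ((fileLength + blockSize - 1) / blockSize);
    }
}
```

The fileLength method follows the quoted suggestion of deriving the
length from the last block. Storing the length in the file row, as
described above, avoids that extra query.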

I actually prefixed all the file names with MySQL, even though it's pure
JDBC and should work with any driver or database. I'll go clean that up
this weekend and put up a site with the code and the API docs. I'd be
interested to hear what people have to say and to see the results of
any better tests they have cooked up.

-vito


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: SQLDirectory implementation

Posted by Anthony Vito <vi...@mnis.com>.
> 
> >Has anyone had a chance to play with this?  How did it work?
> 
> Still working on it.
> 
> My database is FrontBase, which is heavily SQL92-compliant. I'm 
> hacking my way through the (my)SQL that SQLDirectory uses, and am 
> trying to bring it into compliance so that I can run the unit tests.
> 
> Will report later.

If I can be of any assistance, please let me know. I am working on a
version that factors out the SQL, with the ability to provide the field
types for some of the entries. It will still rely on the JDBC driver
supporting the getBytes methods for whatever field is chosen, and on the
underlying DB not altering those bytes. All vaporware right now though.
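
One way such factoring could look, purely as a sketch: a small class that
builds the SQL strings and takes the column types as parameters. Only the
idea of passing in the field types comes from the message above. The
class SchemaSql, the table layout, and the column names are hypothetical,
and the type names in the usage note would need checking against each
database.

```java
/** Hypothetical builder that keeps all SQL text, and the column types, in one place. */
final class SchemaSql {
    private final String nameType; // type for file and directory names
    private final String dataType; // binary type readable through ResultSet.getBytes

    SchemaSql(String nameType, String dataType) {
        this.nameType = nameType;
        this.dataType = dataType;
    }

    /** One row per file: its name, the index (directory) it belongs to, and its length. */
    String createFileTable(String table) {
        // LENGTH is INTEGER here; a wider integer type may be needed for large files.
        return "CREATE TABLE " + table + " ("
            + "ID INTEGER NOT NULL PRIMARY KEY, "
            + "DIRECTORY " + nameType + " NOT NULL, "
            + "NAME " + nameType + " NOT NULL, "
            + "LENGTH INTEGER NOT NULL)";
    }

    /** One row per block, unique on (ID, BLOCK_NUMBER). */
    String createBlockTable(String table) {
        return "CREATE TABLE " + table + " ("
            + "ID INTEGER NOT NULL, "
            + "BLOCK_NUMBER INTEGER NOT NULL, "
            + "DATA " + dataType + ", "
            + "PRIMARY KEY (ID, BLOCK_NUMBER))";
    }

    /** Parameterized lookup of a single block, suitable for a PreparedStatement. */
    String selectBlock(String table) {
        return "SELECT DATA FROM " + table + " WHERE ID = ? AND BLOCK_NUMBER = ?";
    }
}
```

For MySQL one might construct it as new SchemaSql("VARCHAR(255)", "BLOB").
Another database would pass in its own binary type, provided its driver
returns the stored bytes unchanged from getBytes.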

-vito





Re: SQLDirectory implementation

Posted by Avi Drissman <av...@baseview.com>.
At 12:50 PM -0700 5/11/04, you wrote:

>Has anyone had a chance to play with this?  How did it work?

Still working on it.

My database is FrontBase, which is heavily SQL92-compliant. I'm 
hacking my way through the (my)SQL that SQLDirectory uses, and am 
trying to bring it into compliance so that I can run the unit tests.

Will report later.

Avi
-- 
Avi 'rlwimi' Drissman
avi@baseview.com
Argh! This darn mail server is trunca



Re: SQLDirectory implementation

Posted by Doug Cutting <cu...@apache.org>.
Has anyone had a chance to play with this?  How did it work?

Anthony Vito wrote:
> Hello Lucene Users,
>   I have released JDBCDirectory version 0.05. I have only ever used it
> with MySQL and open source drivers. I would be interested in any
> comments or suggestions. I will try to be as diligent as possible in my
> responses. The documentation ( very little ), jars and source code can
> be found at http://ppinew.mnis.com/jdbcdirectory . 
> 
> Some issues that I just thought of that aren't mentioned...
> 1.) Pooling prepared statements on the connection is a must for good
> performance under the current code. ( see test code )
> 2.) The first search is always really slow as everything initializes and
> the cache fills ;) so don't let that discourage you.
> 
> -vito
> 
> On Mon, 2004-04-26 at 14:59, Doug Cutting wrote:
> 
>>Anthony Vito wrote:
>>
>>>  I noticed some talk on SQLDirectory a month or so ago. .....
> 
> 
>>Did you ever post this code?  It would be a great contribution to Lucene.
>>
>>Thanks,
>>
>>Doug
>>
> 
> 
> 
> 
> 



Re: SQLDirectory implementation

Posted by Stephane James Vaucher <va...@cirano.qc.ca>.
I've added a reference to it on the wiki:
http://wiki.apache.org/jakarta-lucene/LatestNews?action=show

If you want to modify the info there, go for it.

sv

On 7 May 2004, Anthony Vito wrote:

> Hello Lucene Users,
>   I have released JDBCDirectory version 0.05. I have only ever used it
> with MySQL and open source drivers. I would be interested in any
> comments or suggestions. I will try to be as diligent as possible in my
> responses. The documentation ( very little ), jars and source code can
> be found at http://ppinew.mnis.com/jdbcdirectory . 
> 
> Some issues that I just thought of that aren't mentioned...
> 1.) Pooling prepared statements on the connection is a must for good
> performance under the current code. ( see test code )
> 2.) The first search is always really slow as everything initializes and
> the cache fills ;) so don't let that discourage you.
> 
> -vito
> 
> On Mon, 2004-04-26 at 14:59, Doug Cutting wrote:
> > Anthony Vito wrote:
> > >   I noticed some talk on SQLDirectory a month or so ago. .....
> 
> > Did you ever post this code?  It would be a great contribution to Lucene.
> > 
> > Thanks,
> > 
> > Doug
> > 
> 
> 
> 
> 




Re: SQLDirectory implementation

Posted by Anthony Vito <vi...@mnis.com>.
Hello Lucene Users,
  I have released JDBCDirectory version 0.05. I have only ever used it
with MySQL and open source drivers. I would be interested in any
comments or suggestions. I will try to be as diligent as possible in my
responses. The documentation ( very little ), jars and source code can
be found at http://ppinew.mnis.com/jdbcdirectory . 

Some issues that I just thought of that aren't mentioned...
1.) Pooling prepared statements on the connection is a must for good
performance under the current code. ( see test code )
2.) The first search is always really slow as everything initializes and
the cache fills ;) so don't let that discourage you.

-vito

On Mon, 2004-04-26 at 14:59, Doug Cutting wrote:
> Anthony Vito wrote:
> >   I noticed some talk on SQLDirectory a month or so ago. .....

> Did you ever post this code?  It would be a great contribution to Lucene.
> 
> Thanks,
> 
> Doug
> 





Re: SQLDirectory implementation

Posted by Doug Cutting <cu...@apache.org>.
Anthony Vito wrote:
>   I noticed some talk on SQLDirectory a month or so ago. ( I just joined
> the list :) ) I have a JDBC implementation that stores the "files" in a
> couple of tables and stores the data for the files as blocks (BLOBs) of
> a certain size ( 16k by default ). It also has an LRU cache for the
> blocks which makes the performance quite acceptable. 
> 
> I actually prefixed all the file names with MySQL, even though it's pure
> JDBC and should work with any driver or database. I'll go clean that up
> this weekend and put up a site with the code and the API docs. I'd be
> interested to hear what people have to say and to see the results of
> any better tests they have cooked up.

Did you ever post this code?  It would be a great contribution to Lucene.

Thanks,

Doug
