You are viewing a plain text version of this content. The canonical link for it is here.
Posted to slide-dev@jakarta.apache.org by "Kumar, Ashok" <ak...@metatomix.com> on 2002/04/12 22:14:31 UTC

Proposal for a more efficient Slide Database Design

Hi,

New Slide Database Store Proposal
---------------------------------

Here is a proposal for a more optimized database schema for the Slide
J2EE Database Store. The major changes to the current schema involve
adding numeric identifiers for URI's and other strings that were
previously used as keys. This allows for much faster lookups and the use
of foreign keys for referential integrity. Other changes include merging
and dropping some of the existing tables. All the existing slide
functionality was maintained.


Justification for the new schema
--------------------------------

Using the existing Slide Database store for storing and retrieving a
large number of documents is extremely inefficient. The primary reason
for this is the lack of indexing. Applying indexing to the existing
database schema definitely helps. However, due to the fact that the URI
field (being a large VARCHAR(800) datatype) is the main key linking most
of the tables, it makes a very inefficient index. 

In addition, most databases have a limit on index size. In tables like
CHILDREN, where the unique index would be a combination of the URI and
CHILD_URI fields, the index size would exceed most database limits.

Finally, duplicating the data in large fields like URI in almost all of
the database tables is an inefficient waist of space and simple
normalization of this field provides a dramatic improvement in this
area.


The New Schema
--------------

Here is an overview of the schema changes:

- Three new tables were created to hold numeric key mappings for URI,
BRANCH and LABEL 
- WORKING_REVISION ->   removed
- LATEST_REVISIONS and REVISION -> VERSION_HISTORY
- REVISION_CONTENT ->   VERSION_CONTENT
- REVISIONS        ->   VERSION
- BRANCHES 	       ->   VERSION_PREDS (predecessors)
- LABEL            ->   VERSION_LABELS
- OBJECT           ->  (same + URI key mapping)
- LOCKS            -> (same)
- LINKS            -> (same)
- CHILDREN         -> (same)
- PERMISSIONS      -> (same)
- PROPERTY 	       -> (same)

Plus the addition of indexing and foreign key references.

The following schema will work with MS SQLServer and Sybase.
(although with Sybase the user defined datatypes need to be changed to
map to Sybase datatype limits)
Other schema's for Oracle, DB2 and MySQL will follow.

<<see attached file: new_slide_schema.zip>>


New Schema Implementation
------------------------

In order not to modify the existing database stores we have created
J2EEStore2. This store can be configured to point to a different
database instance that reflects the new schema.

Source files:
J2EEStore2.java( same as the J2EEStore.java with the addition of
hashtables for URI_ID lookup)
J2EEDescriptorsStore2.java
J2EEContentStore2.java

<<see attached java source files : J2EEStore.zip>>


Tests Conducted
---------------

Configuration:

Latest version of slide (from CVS as of March 25) with J2EEStore2
Tomcat 4.04 
MS SQLServer as the database store.

For the test we created a collection structure /rdf/testcase/test1 and
in this collection stored 1800 files each with a size of 313 KB. 

>From the Browser (IE6) a request to view the contents of this collection
(/rdf/testcase/test1) triggers a GET to this collection, 


    All Test Results are in Milliseconds. (by tracking Database Logs)
    =================================================================
    Operation                  OLD Schema     New Schema
    -----------------------------------------------------------------
    Starting                         1933           1559

    Put(testcase/test1/xxfile)       2544           1578

    Get(testcase)                    2805           2103	

    Get(testcase/test1)            174616          98477


Conclusion
----------

We believe the this is a good start for a new database schema. The test
results already show a large improvement and we believe addition code
modifications will result in a much greater improvement. 

In the process of redesigning the database we uncovered a number of
referential errors that were fixed by the new schema. For example,
problems we previously had with deadlocks when storing / updating very
large files are no longer an issue. 

We are looking for feedback from this proposal.

Thank you...

Ashok


RE: Proposal for a more efficient Slide Database Design

Posted by Jacob Lund <ja...@qualiware.com>.
Hi,

I was testing you proposal and I am having some problems. My test is on
a ms sql 2000 server and a tomcat 4.0.3. My slide installation is 1_0
from the cvs.

I can see that you test system is tomcat 4.0.4 and newest slide from
cvs.

I am using newest version of jsqlconnect for jdbc connection.

The message I get is:
23 Apr 2002 11:04:23 - slidestore.j2ee.J2EEDescriptorsStore2 - ERROR -
com.jnetd
irect.jsql.u: INSERT statement conflicted with COLUMN FOREIGN KEY
constraint 'FK
__PERMISSIO__OBJEC__0D44F85C'. The conflict occurred in database
'slide', table
'URI', column 'URI_ID'.
com.jnetdirect.jsql.u: INSERT statement conflicted with COLUMN FOREIGN
KEY const
raint 'FK__PERMISSIO__OBJEC__0D44F85C'. The conflict occurred in
database 'slide
', table 'URI', column 'URI_ID'.

What kind of constraints do you assume on the relationships (cascading
for update, Insert, delete)?

/Jacob


-----Original Message-----
From: Kumar, Ashok [mailto:akumar@metatomix.com] 
Sent: 12. april 2002 22:15
To: slide-dev@jakarta.apache.org
Cc: slide-user@jakarta.apache.org
Subject: Proposal for a more efficient Slide Database Design


Hi,

New Slide Database Store Proposal
---------------------------------

Here is a proposal for a more optimized database schema for the Slide
J2EE Database Store. The major changes to the current schema involve
adding numeric identifiers for URI's and other strings that were
previously used as keys. This allows for much faster lookups and the use
of foreign keys for referential integrity. Other changes include merging
and dropping some of the existing tables. All the existing slide
functionality was maintained.


Justification for the new schema
--------------------------------

Using the existing Slide Database store for storing and retrieving a
large number of documents is extremely inefficient. The primary reason
for this is the lack of indexing. Applying indexing to the existing
database schema definitely helps. However, due to the fact that the URI
field (being a large VARCHAR(800) datatype) is the main key linking most
of the tables, it makes a very inefficient index. 

In addition, most databases have a limit on index size. In tables like
CHILDREN, where the unique index would be a combination of the URI and
CHILD_URI fields, the index size would exceed most database limits.

Finally, duplicating the data in large fields like URI in almost all of
the database tables is an inefficient waist of space and simple
normalization of this field provides a dramatic improvement in this
area.


The New Schema
--------------

Here is an overview of the schema changes:

- Three new tables were created to hold numeric key mappings for URI,
BRANCH and LABEL 
- WORKING_REVISION ->   removed
- LATEST_REVISIONS and REVISION -> VERSION_HISTORY
- REVISION_CONTENT ->   VERSION_CONTENT
- REVISIONS        ->   VERSION
- BRANCHES 	       ->   VERSION_PREDS (predecessors)
- LABEL            ->   VERSION_LABELS
- OBJECT           ->  (same + URI key mapping)
- LOCKS            -> (same)
- LINKS            -> (same)
- CHILDREN         -> (same)
- PERMISSIONS      -> (same)
- PROPERTY 	       -> (same)

Plus the addition of indexing and foreign key references.

The following schema will work with MS SQLServer and Sybase. (although
with Sybase the user defined datatypes need to be changed to map to
Sybase datatype limits) Other schema's for Oracle, DB2 and MySQL will
follow.

<<see attached file: new_slide_schema.zip>>


New Schema Implementation
------------------------

In order not to modify the existing database stores we have created
J2EEStore2. This store can be configured to point to a different
database instance that reflects the new schema.

Source files:
J2EEStore2.java( same as the J2EEStore.java with the addition of
hashtables for URI_ID lookup) J2EEDescriptorsStore2.java
J2EEContentStore2.java

<<see attached java source files : J2EEStore.zip>>


Tests Conducted
---------------

Configuration:

Latest version of slide (from CVS as of March 25) with J2EEStore2 Tomcat
4.04 
MS SQLServer as the database store.

For the test we created a collection structure /rdf/testcase/test1 and
in this collection stored 1800 files each with a size of 313 KB. 

>From the Browser (IE6) a request to view the contents of this collection
(/rdf/testcase/test1) triggers a GET to this collection, 


    All Test Results are in Milliseconds. (by tracking Database Logs)
    =================================================================
    Operation                  OLD Schema     New Schema
    -----------------------------------------------------------------
    Starting                         1933           1559

    Put(testcase/test1/xxfile)       2544           1578

    Get(testcase)                    2805           2103	

    Get(testcase/test1)            174616          98477


Conclusion
----------

We believe the this is a good start for a new database schema. The test
results already show a large improvement and we believe addition code
modifications will result in a much greater improvement. 

In the process of redesigning the database we uncovered a number of
referential errors that were fixed by the new schema. For example,
problems we previously had with deadlocks when storing / updating very
large files are no longer an issue. 

We are looking for feedback from this proposal.

Thank you...

Ashok



--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>