Posted to derby-dev@db.apache.org by Daniel John Debrunner <dj...@debrunners.com> on 2004/08/13 22:17:42 UTC

Derby Engine Architecture Quick Overview

A quick introduction to Derby's embedded engine architecture.

Dan.

Module View
=========
A running system consists of a monitor and a collection of modules.
A module is a discrete set of functionality, such as a lock manager,
JDBC driver, indexing method etc. A module's interface is typically
defined by a set of Java interfaces, e.g. the java.sql interfaces define
an interface for a JDBC driver. Callers use a module purely through its
interface, which separates API from implementation. A module's
implementation is a set of classes that implement the required behavior
and interfaces. Thus a module implementation can change or be replaced
with a different implementation without affecting the callers' code.

The monitor is code that maps module requests to implementations based
upon the request and the environment. E.g. for the internal request for
a JDBC driver, the monitor selects Derby's JDBC 2.0 implementation under
JDK 1.3 and the JDBC 3.0 implementation under JDK 1.4. This allows Derby
to present a single JDBC driver to the application regardless of JDK,
while internally the correct driver is loaded.
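
The selection described above can be pictured with a minimal sketch. This
is not Derby's monitor code: DriverModule, MonitorSketch, register and
find are hypothetical names invented for illustration, and the JDK is
reduced to a single minor version number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical module interface: callers only ever see this type. */
interface DriverModule {
    String describe();
}

/** Hypothetical monitor: maps a module request to an implementation that fits the environment. */
final class MonitorSketch {
    /** One registered candidate: an environment test plus the implementation it guards. */
    private static final class Candidate {
        final IntPredicate supportsJdk;
        final DriverModule implementation;

        Candidate(IntPredicate supportsJdk, DriverModule implementation) {
            this.supportsJdk = supportsJdk;
            this.implementation = implementation;
        }
    }

    private final List<Candidate> candidates = new ArrayList<>();

    /** Registers an implementation; later registrations take priority over earlier ones. */
    void register(IntPredicate supportsJdk, DriverModule implementation) {
        candidates.add(new Candidate(supportsJdk, implementation));
    }

    /** Returns the most recently registered implementation usable on JDK 1.x, where x is jdkMinorVersion. */
    DriverModule find(int jdkMinorVersion) {
        for (int i = candidates.size() - 1; i >= 0; i--) {
            Candidate c = candidates.get(i);
            if (c.supportsJdk.test(jdkMinorVersion)) {
                return c.implementation;
            }
        }
        throw new IllegalStateException("no driver implementation for JDK 1." + jdkMinorVersion);
    }

    /** Example wiring that mirrors the JDK 1.3 / JDK 1.4 case in the text (assumed thresholds). */
    static MonitorSketch defaultMonitor() {
        MonitorSketch monitor = new MonitorSketch();
        monitor.register(v -> v >= 2, () -> "JDBC 2.0 driver");
        monitor.register(v -> v >= 4, () -> "JDBC 3.0 driver");
        return monitor;
    }

    public static void main(String[] args) {
        MonitorSketch monitor = defaultMonitor();
        System.out.println(monitor.find(3).describe());
        System.out.println(monitor.find(4).describe());
    }
}
```

The caller asks only for "a JDBC driver" and never names an
implementation class, which is the property the module design relies on.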

Modules are either system wide (shared), e.g. error logging, or
per-service, with a service corresponding to a database, e.g. a lock
manager would be a module in a service (database).

This architecture allows different modules to be loaded depending on the
environment and in the past also supported different product
configurations out of the same code base.

Layer/Box View
===========

There are four main code areas: JDBC, SQL, Store and Services.

JDBC is the only API that Derby presents to applications. It consists of
implementations of the java.sql and javax.sql interfaces for JDBC 2.0 and
3.0. Applications use Derby solely through its implementations of the
top-level JDBC interfaces (Driver, DataSource, ConnectionPoolDataSource
and XADataSource) and the remaining JDBC interfaces. E.g. applications
can only use a Derby prepared statement through
java.sql.PreparedStatement and not through some Derby-specific class
with additional methods.
The JDBC layer sits on top of the SQL layer.

The SQL layer is split into two main logical areas, compilation and
execution.
SQL compilation is a five-step process:
  1  parse using a parser generated by Javacc, resulting in a tree of
     query nodes
  2  bind to resolve all objects (e.g. table names)
  3  optimize to determine the best access path
  4  generate a Java class (directly to byte code) to represent the
     statement plan
  5  load the class and create an instance to represent that
     connection's state of the query
The generated statement plan is cached and can be shared by multiple
connections. DDL statements (e.g. CREATE TABLE) use a common statement
plan to avoid generation of a Java class file.
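
The split between a shared, cached plan and a per-connection instance
can be sketched as follows. All names here (PlanSketch, Activation,
StatementCacheSketch) are hypothetical, and steps 1 to 4 are collapsed
into a placeholder rather than real parsing or byte code generation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the generated class: one shared plan per statement text. */
final class PlanSketch {
    final String sql;

    PlanSketch(String sql) {
        this.sql = sql;
    }

    /** Step 5: a fresh instance that holds one connection's state for this query. */
    Activation newActivation() {
        return new Activation(this);
    }

    /** Per-connection execution state; never shared between connections. */
    static final class Activation {
        final PlanSketch plan;
        int rowsReturned;

        Activation(PlanSketch plan) {
            this.plan = plan;
        }
    }
}

/** Hypothetical statement cache keyed by SQL text (an assumption of this sketch). */
final class StatementCacheSketch {
    private final Map<String, PlanSketch> cache = new ConcurrentHashMap<>();
    final AtomicInteger compilations = new AtomicInteger();

    /** Placeholder for steps 1 to 4: parse, bind, optimize, generate. */
    private PlanSketch compile(String sql) {
        compilations.incrementAndGet();
        return new PlanSketch(sql);
    }

    /** Reuses the cached plan if present, then creates this connection's own instance. */
    PlanSketch.Activation prepare(String sql) {
        return cache.computeIfAbsent(sql, this::compile).newActivation();
    }

    public static void main(String[] args) {
        StatementCacheSketch statements = new StatementCacheSketch();
        PlanSketch.Activation first = statements.prepare("SELECT * FROM T1");
        PlanSketch.Activation second = statements.prepare("SELECT * FROM T1");
        System.out.println("same plan: " + (first.plan == second.plan));
        System.out.println("compilations: " + statements.compilations.get());
    }
}
```

Two connections preparing the same statement pay the compilation cost
once but keep separate execution state.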

SQL execution consists of calling execute methods on the instance of the
generated class, which return a result set object. This result set is a
Derby ResultSet class, not a JDBC one. The JDBC layer presents the Derby
ResultSet as a JDBC one to the application. For a simple table scan the
query would consist of a single result set object representing the table
scan. For a more complex query the top-level result set "hides" a tree
of result sets that corresponds to the query, e.g. a project-restrict
result set on top of a join result set that joins a table scan result
set on T1 with an index scan on table T2.
DML statements (INSERT/UPDATE/DELETE) are handled the same way, with a
ResultSet that performs all of its work in its open method and returns
an update count.
These result set objects interface with the Store layer to fetch rows
from tables or indexes, or to perform sorts.
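
A tree of result sets like the one in the example can be sketched with a
small pull-style interface. These are not Derby's classes: RowSource and
the *Sketch types are hypothetical, rows are plain int arrays, both
scans read in-memory lists, and the join is a naive nested loop on the
first column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical internal result set: open it, then pull rows until null. */
interface RowSource {
    void open();
    int[] next();
}

/** Leaf node: scans an in-memory "table". */
final class ScanSketch implements RowSource {
    private final List<int[]> rows;
    private Iterator<int[]> it;
    ScanSketch(List<int[]> rows) { this.rows = rows; }
    @Override public void open() { it = rows.iterator(); }
    @Override public int[] next() { return it.hasNext() ? it.next() : null; }
}

/** Nested-loop join on column 0; output row is {key, left column 1, right column 1}. */
final class JoinSketch implements RowSource {
    private final RowSource left;
    private final RowSource right;
    private int[] currentLeft;
    JoinSketch(RowSource left, RowSource right) { this.left = left; this.right = right; }
    @Override public void open() {
        left.open();
        currentLeft = left.next();
        right.open();
    }
    @Override public int[] next() {
        while (currentLeft != null) {
            int[] r;
            while ((r = right.next()) != null) {
                if (r[0] == currentLeft[0]) {
                    return new int[] {currentLeft[0], currentLeft[1], r[1]};
                }
            }
            currentLeft = left.next();
            right.open();   // rescan the inner side for the next outer row
        }
        return null;
    }
}

/** Filters rows with a restriction, then keeps only the projected columns. */
final class ProjectRestrictSketch implements RowSource {
    private final RowSource child;
    private final Predicate<int[]> restriction;
    private final int[] columns;
    ProjectRestrictSketch(RowSource child, Predicate<int[]> restriction, int[] columns) {
        this.child = child;
        this.restriction = restriction;
        this.columns = columns;
    }
    @Override public void open() { child.open(); }
    @Override public int[] next() {
        int[] row;
        while ((row = child.next()) != null) {
            if (restriction.test(row)) {
                int[] out = new int[columns.length];
                for (int i = 0; i < columns.length; i++) {
                    out[i] = row[columns[i]];
                }
                return out;
            }
        }
        return null;
    }
}

final class ResultSetTreeSketch {
    /** Builds project-restrict over join over two scans and drains the top-level node. */
    static List<int[]> run() {
        RowSource t1 = new ScanSketch(Arrays.asList(new int[] {1, 10}, new int[] {2, 20}, new int[] {3, 30}));
        RowSource t2 = new ScanSketch(Arrays.asList(new int[] {2, 200}, new int[] {3, 300}, new int[] {4, 400}));
        RowSource top = new ProjectRestrictSketch(new JoinSketch(t1, t2), row -> row[1] > 20, new int[] {0, 2});
        top.open();
        List<int[]> result = new ArrayList<>();
        for (int[] row = top.next(); row != null; row = top.next()) {
            result.add(row);
        }
        return result;
    }

    public static void main(String[] args) {
        for (int[] row : run()) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

The caller only touches the top-level node; the tree beneath it stays
hidden, as the text describes.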

The Store layer is split into two main areas, access and raw.
The access layer presents a conglomerate (table or index)/row based
interface to the SQL layer. It handles table scans, index scans, index
lookups, indexing, sorting, locking policies, transactions and isolation
levels.
The access layer sits on top of the raw store, which provides the raw
storage of rows in pages in files, transaction logging and transaction
management. JCE encryption is plugged in here at the page level. The raw
store works with a pluggable file system API that allows the data files
to be stored in the Java filesystem, jar files, jar files in the
classpath, or any other mechanism.
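
One way to picture a pluggable storage API with a page-level
transformation is the sketch below. It is an illustration under stated
assumptions, not Derby's storage API: PageStorageSketch, InMemoryPages
and XorPages are hypothetical names, the wrapper design is assumed, and
the XOR step is a toy stand-in for a JCE cipher, not real encryption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical page-level storage API; the raw store would depend only on this. */
interface PageStorageSketch {
    void writePage(long pageNumber, byte[] data);
    byte[] readPage(long pageNumber);   // null if the page was never written
}

/** One possible backend: pages kept in memory instead of in files. */
final class InMemoryPages implements PageStorageSketch {
    private final Map<Long, byte[]> pages = new HashMap<>();

    @Override public void writePage(long pageNumber, byte[] data) {
        pages.put(pageNumber, data.clone());
    }

    @Override public byte[] readPage(long pageNumber) {
        byte[] page = pages.get(pageNumber);
        return page == null ? null : page.clone();
    }
}

/** Page-level transformation layered over any backend. XOR is a toy placeholder, NOT encryption. */
final class XorPages implements PageStorageSketch {
    private final PageStorageSketch backend;
    private final byte key;

    XorPages(PageStorageSketch backend, byte key) {
        this.backend = backend;
        this.key = key;
    }

    /** XOR is its own inverse, so the same method encodes and decodes. */
    private byte[] transform(byte[] data) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key);
        }
        return out;
    }

    @Override public void writePage(long pageNumber, byte[] data) {
        backend.writePage(pageNumber, transform(data));
    }

    @Override public byte[] readPage(long pageNumber) {
        byte[] stored = backend.readPage(pageNumber);
        return stored == null ? null : transform(stored);
    }
}

final class StorageSketchDemo {
    public static void main(String[] args) {
        PageStorageSketch backend = new InMemoryPages();
        PageStorageSketch store = new XorPages(backend, (byte) 0x5A);
        store.writePage(1L, new byte[] {1, 2, 3});
        System.out.println("read back: " + Arrays.toString(store.readPage(1L)));
        System.out.println("stored:    " + Arrays.toString(backend.readPage(1L)));
    }
}
```

Because the code above the interface sees only pages, the backend can be
swapped (files, jar files, memory) without touching it.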

Services are utility modules such as lock management, cache management
(a single cache module is used to cache many different types, from pages
to string translations), error logging etc.



---------------------------------------------------------------------
Derby is a project of the Apache Incubator (http://incubator.apache.org)

To unsubscribe, e-mail: derby-dev-unsubscribe@db.apache.org
For additional commands, e-mail: derby-dev-help@db.apache.org


Re: Derby Engine Architecture Quick Overview

Posted by Daniel John Debrunner <dj...@debrunners.com>.

Steen Jansdal wrote:
| Daniel John Debrunner wrote:
|
|>   1  parse using a parser generated by Javacc, results in a tree of
|> query nodes
|
|
| How does Javacc compare to ANTLR? What does the output from
| Javacc look like? The output from ANTLR is a nice looking
| readable java file that you can debug/single step through.
| Note: I have absolutely no experience with Javacc.

Cloudscape and now Derby have always used Javacc (from 1996); I have no
experience with ANTLR. I believe the code generated by Javacc could
be better; a simple sed script could clean up some items. It would
be interesting to try ANTLR, though I am not sure how much work it
would be.


|
|
|>   4  generation of a Java class (directly to byte code)  to represent
|> the statement plan
|>   5  loading of the class and creation of an instance to represent that
|> connection's state of the query
|> The generated statement plan is cached and can be shared by multiple
|> connections. DDL statements (e.g. CREATE TABLE use a common statement
|> plan to avoid generation of a Java class file)
|
|
|
| Interesting concept! How is the speed compared to a "normal"
| plan executor? The first couple of times the statement plan
| is executed, it would be interpreted. Only after a number
| of executions will the HotSpot compiler decide to compile
| it into native code.

It has always been done this way, so there are no direct comparison
numbers. This concept arose from the original goal of a small
footprint: using the JVM's interpreter was thought to require less code
than having an internal one. It does benefit from the JIT as you say;
performance tests show a boost after a number of iterations. In
addition, calling into user-supplied Java methods (functions &
procedures) is direct, rather than through reflection.


|
| Is this concept also supported in J2ME?

Yes, we have run in J2ME environments modified to include the
additional classes Derby requires (see the to-do list). None of those
classes are related to class generation.

Dan


Re: Derby Engine Architecture Quick Overview

Posted by Steen Jansdal <st...@jansdal.dk>.
Daniel John Debrunner wrote:

>   1  parse using a parser generated by Javacc, results in a tree of
> query nodes

How does Javacc compare to ANTLR? What does the output from
Javacc look like? The output from ANTLR is a nice looking
readable java file that you can debug/single step through.
Note: I have absolutely no experience with Javacc.


>   4  generation of a Java class (directly to byte code)  to represent
> the statement plan
>   5  loading of the class and creation of an instance to represent that
> connection's state of the query
> The generated statement plan is cached and can be shared by multiple
> connections. DDL statements (e.g. CREATE TABLE use a common statement
> plan to avoid generation of a Java class file)


Interesting concept! How is the speed compared to a "normal"
plan executor? The first couple of times the statement plan
is executed, it would be interpreted. Only after a number
of executions will the HotSpot compiler decide to compile
it into native code.

Is this concept also supported in J2ME?

regards

Steen Jansdal
