Posted to derby-dev@db.apache.org by Ян Программист <we...@gmail.com> on 2010/04/05 16:47:15 UTC

SQLXML support for Derby

I have submitted my proposal to Google. Here is the latest version of it.
Please leave your comments, everyone in the Derby community.

For those who prefer not to follow external links, I am posting the proposal
here:

*Abstract*

JavaDB (Apache Derby) is a 100% pure Java RDBMS. It was designed to require
little or no administration. An especially interesting idea is improving an
existing system in production: architecture improvements are needed to meet
customer needs, but any movement of stored data has to be painless, because
the production cycle in which Derby is involved cannot be interrupted.

*Description*

One interesting situation is when Derby is running in a production
configuration and Web services need to be provided (as part of further
development):

1. Existing SQL functionality must not break. Besides, SQL grammar changes
   require restarting the storage(s), so a good solution is to make the JDBC
   client XML compatible instead of adding special XML formatting functions
   to SQL. SQLXML, as part of JDBC 4, is a good fit here. To achieve this, I
   will:
   1.1. Propose, design, and implement a SQLXML implementation for Derby
        which adheres to the JDBC and ISO standards.
   1.2. Verify Derby's implementation by comparing its behavior against
        existing SQLXML implementations in Postgres and HyperSQL.
   1.3. Contribute a regression test suite to the Derby test suites which
        includes XMLUnit test cases for Derby's SQLXML implementation. These
        demonstrate the completeness of the implementation and also verify
        the compatibility behaviors found in the previous cross-database
        testing.

2. A disadvantage such as repeated tag names in XML responses could be
   countered via deduplication.

3. Derby storage would see itself executing normal SQL requests. It would be
   the JDBC connector's responsibility to convert XML to SQL on requests and
   result set(s) to XML on responses.

4. This is a reasonable way to get rid of tree indexes in the RDBMS. But
   because translating columns to XML nodes is the recommended approach
   here, developers should be aware of any cyclic dependencies
   (primary-to-foreign key relations in both directions, for different
   columns, between any two tables covered by the query), as well as
   self-referencing tables.
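
To make the column-to-node translation from item 3 and item 4 concrete, here
is a minimal sketch that turns already-fetched rows into an XML string using
only the JDK's DOM and transformer APIs. The class `RowsToXml`, the method
`toXml`, and the `row` wrapper element are hypothetical names chosen for
illustration; they are not part of Derby or JDBC. A real JDBC 4 driver would
more likely hand such a value to the client as a `java.sql.SQLXML` object
created with `Connection.createSQLXML()`.

```java
import java.io.StringWriter;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper for illustration only; not Derby or JDBC API.
class RowsToXml {

    // Maps each row to a <row> element and each column to a child element
    // named after the column. Column names are assumed to be valid XML names;
    // createElement throws a DOMException otherwise.
    static String toXml(String tableName, List<Map<String, Object>> rows) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement(tableName);
            doc.appendChild(root);

            for (Map<String, Object> row : rows) {
                Element rowElement = doc.createElement("row");
                root.appendChild(rowElement);
                for (Map.Entry<String, Object> cell : row.entrySet()) {
                    if (cell.getValue() == null) {
                        continue; // assumption: SQL NULL becomes an absent tag
                    }
                    Element column = doc.createElement(cell.getKey());
                    column.setTextContent(String.valueOf(cell.getValue()));
                    rowElement.appendChild(column);
                }
            }

            // Serialize the DOM tree without an XML declaration.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("could not build XML", e);
        }
    }
}
```

Passing rows as ordered maps (for example `LinkedHashMap`) keeps the child
elements in column order. Treating NULL as a missing tag is only one possible
convention and would need to be settled in the design phase.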

Because database compatibility is a reasonable feature to ask for, and
because XML frequently needs to be mapped to RDBMSs, it would be good for
developers to have a flexible abstraction level. So a few considerations
appear here:

1. Give the possibility of rejecting requests to tables mapped to
   non-critical XML values; only the critical part of the XML structure
   would be added to the response. This minimizes storage load and gives a
   significant administration benefit by minimizing the required network
   traffic.

2. Because data consistency is a MUST feature for modeling entities through
   any binding tools, columns in certain tables that fall under a
   transaction at the moment of an SQLXML related request would be
   interpreted as mapped to optional XML tags in the XML tree structure, and
   as a result would not be disturbed by SELECT operations. Cells in
   non-transactional columns would be requested in any case.

3. Publishing XML services from an RDBMS is not the only (and sometimes not
   the best) architecture solution. An XML centric architecture, where
   objects are persisted through XML structures in an XML database populated
   from the RDBMS under XMLSchema driven data consistency rules, requires
   replication of XML database indexes and RDBMS indexes. So using indexes
   in Derby to serve SQLXML requests is a MUST in such a case. Besides
   speeding up SELECT operations, it would help improve data driven
   administration of the resulting software.

4. Situations where cyclic dependencies appear should force an exception. A
   good idea for the design phase is to generate different exception
   classes, whose resolution could be adapted through an IDE if needed.
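
The cyclic dependency check mentioned above can be sketched as a depth-first
search over the foreign key relations of the tables in a query. Everything
named here (`ForeignKeyGraph`, `addForeignKey`, `assertAcyclic`,
`CyclicDependencyException`) is hypothetical; the proposal only says that
distinct exception classes should exist and does not name them. The sketch
assumes the foreign key relations have already been collected, for example
from database metadata.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical exception class, named here only for illustration.
class CyclicDependencyException extends RuntimeException {
    CyclicDependencyException(String message) {
        super(message);
    }
}

// Hypothetical helper: a directed graph "table -> tables it references".
class ForeignKeyGraph {
    private final Map<String, List<String>> references = new HashMap<>();

    void addForeignKey(String fromTable, String toTable) {
        references.computeIfAbsent(fromTable, k -> new ArrayList<>()).add(toTable);
        // Register the target too, so every table has an adjacency list.
        references.computeIfAbsent(toTable, k -> new ArrayList<>());
    }

    // Throws CyclicDependencyException if any chain of foreign keys leads
    // back to a table already on the current path. A self-referencing table
    // is the shortest such cycle.
    void assertAcyclic() {
        Set<String> finished = new HashSet<>();
        for (String table : references.keySet()) {
            visit(table, finished, new ArrayDeque<>());
        }
    }

    private void visit(String table, Set<String> finished, Deque<String> path) {
        if (path.contains(table)) {
            throw new CyclicDependencyException(
                    "cyclic dependency through table " + table + ", path: " + path);
        }
        if (finished.contains(table)) {
            return; // already explored, no cycle reachable from here
        }
        path.push(table);
        for (String next : references.get(table)) {
            visit(next, finished, path);
        }
        path.pop();
        finished.add(table);
    }
}
```

An unchecked exception keeps the sketch short; whether the real classes should
extend `java.sql.SQLException` instead is a design question this example does
not try to answer.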

Such things as XPath requests from the client side would affect the JDBC
result set cursor class. There are some research results, which I
investigated in the past, about parallelization of XPath on multi-core
servers. That could help improve the SQLXML client and speed up implementing
JDBC 4 support in the future. Besides, databases such as PostgreSQL and
HyperSQL already have implementations of JDBC 4 SQLXML and can be a place
for experiments.
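
As a small illustration of a client-side XPath request, the sketch below
evaluates an expression against an XML string with the JDK's `javax.xml.xpath`
API. The class `XPathOnResult` and its `evaluate` method are hypothetical. In
a JDBC 4 client the XML string would typically come from `SQLXML.getString()`
on a value obtained with `ResultSet.getSQLXML(...)`; here a literal string is
assumed so that the example runs without a database.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
class XPathOnResult {

    // Evaluates the XPath expression against the XML text and returns the
    // result converted to a string, as XPath.evaluate(String, InputSource) does.
    static String evaluate(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath or XML: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<books><row><id>1</id><title>Derby</title></row></books>";
        System.out.println(evaluate(xml, "/books/row[id=1]/title")); // prints "Derby"
    }
}
```

This sketch parses the whole document for every call. A driver that supports
XPath on a result set cursor would need a cheaper strategy, which is where the
parallelization research mentioned above could come in.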

*Activity provided*

I did a small study of implementations of JDBC 4 SQLXML in open source, Java
based RDBMSs: HyperSQL, PostgreSQL, H2 (in general). There is a ticket
created (?) with the research results attached. I also produced some UML of
the Derby JDBC classes and of the PostgreSQL and HyperSQL SQLXML classes. I
plan to use a hybrid UML & code generation IDE to speed up development.
Also, two existing tickets
(https://issues.apache.org/jira/browse/DERBY-334,
https://issues.apache.org/jira/browse/DERBY-1655), targeted at the existing
implementations of XML formatting functions for SQL in Derby, will help in
understanding the differences between the approach in this proposal and the
approach used in the existing
(https://issues.apache.org/jira/browse/DERBY-334,
https://issues.apache.org/jira/browse/DERBY-688) implementations.

*Quality considerations*

To minimize possible risks in this project, I plan to apply unit/acceptance
testing while coding, using the XMLUnit framework. Here is a list of the
areas that need quality control:

1. Derby specific implementation testing for the JDBC connector code. This
   is an internal implementation quality issue.

2. Acceptance tests for certain databases with JDBC 4 SQLXML support:
   HyperSQL, PostgreSQL. These can be taken from existing sources. They
   would be used to validate inter-database compatibility, using the
   specifics of certain existing implementations, and will show the quality
   level against the JDBC 4 standard.

3. Some advanced tests (benchmarks), suitable for demonstrating how
   efficiently the optional XML tags strategy decreases network traffic.

4. Benchmarks showing the effectiveness of various (2-3) strategies for
   handling transactional isolation for XML tag customizations. These make
   sense if, and only if, JDBC requests are initiated at the time of
   commits/rollbacks.
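
The kind of comparison that the planned test cases need can be sketched as
follows. The proposal intends to use XMLUnit; this example deliberately swaps
in the JDK's own DOM `Node.isEqualNode` so that it runs without extra
libraries, and it is stricter than XMLUnit would be (whitespace between
elements counts as a difference). The class `XmlEquality` and its methods are
hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper; a stand-in for an XMLUnit assertion, not XMLUnit itself.
class XmlEquality {

    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // True when both documents have equal root elements: same names, same
    // attributes (in any order), same children in the same order.
    static boolean sameXml(String expected, String actual) {
        return parse(expected).getDocumentElement()
                .isEqualNode(parse(actual).getDocumentElement());
    }
}
```

A cross-database test could then fetch the same value from Derby, PostgreSQL,
and HyperSQL and check each pair with `sameXml`, or with the corresponding
XMLUnit assertion once that library is on the test classpath.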

*Timeline*

*12.04*. Start collecting and analyzing the necessary JDBC specifications,
verifying reverse engineering results (UML), discussing architecture &
necessary design patterns/approach(es)

*15.05*. Begin summarizing necessary QA considerations & start coding unit
tests, using XMLUnit

*24.05*. Start coding the main code. Carry out some activities to ensure an
acceptable quality level of the tests

*30.05*. Active coding, with a development boost from the unit testing
framework

*12.05*. Start preparing acceptance tests for the Derby, PostgreSQL, and
HyperSQL implementations of SQLXML, to ensure the necessary database
compatibility against the JDBC standard

*20.06*. Start analyzing necessary areas of improvement, with respect to the
time left and priority. Mostly, improving support for transactions and
indexes would take place, plus some code improvements to bring zero
administration to SQLXML in Derby

*About me*

I am John Titareno, a 22 year old software developer from Kiev, Ukraine. I
have 5 years of Web development experience, mostly in PHP and Java, plus a
bit of Django & TurboGears. I like MVC and ORM a lot. I dislike the Struts
framework because its implementations do not strictly follow the MVC
paradigm. I also do not like EJBs, but I like ColdFusion a lot. I am
interested in XMLBeans and have experience in designing MySQL database
schemas. I am also interested in developing non-JDBC adapters (JNI, RESTful)
for accessing Java RDBMSs. My university (“Kiev Polytechnic Institute”)
education is related to developing software for embedded devices, targeted
at high-precision measurements, in a precision-driven way; I have some
experience in mixed (analog + digital), as well as analog only, circuit
simulation in the appropriate engineering software. I have been practicing
PCB design as part of my personal research in semiconductor based
electronics (I am a fan of servo motors and remote control devices, and
also have some interest in more serious things, like advanced power
adapters), with the help of my university tutor.
-----------------------------------------------------------------------
John