You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-commits@db.apache.org by Apache Wiki <wi...@apache.org> on 2006/06/22 04:49:12 UTC
[Db-derby Wiki] Update of "MysqlDerbyMigration/DesignDocument" by RaminMoazeni

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Db-derby Wiki" for change notification.

The following page has been changed by RaminMoazeni:
http://wiki.apache.org/db-derby/MysqlDerbyMigration/DesignDocument

New page:
= MySQL to Apache Derby Migration Tool =

== Ramin Moazeni ==

== Version 1.1 ==

  
= Introduction =

When you work with databases, you will often need to copy data from one database to another. The Apache Derby database currently is lacking a migration tool that that enables you to quickly migrate popular databases and data to Apache Derby database. Using a Graphical User Interface (GUI), the Derby migration tool will assist in successfully migrating databases. The use of Derby migration tool will assist in migration of databases including schema objects such as tables, indexes, views, and stored procedures to Apache Derby database. The advantage of using the migration tool will be Cost Saving, improved productivity, and reduced risks. 

I propose to implement the migration tool that will support migration of tables, views, indexes, and stored procedures. The steps required to migrate from MySQL to Apache Derby generally involves: 

 * Migration of the schema and data 
 * Migration of database objects such as tables, views, indexes, and stored procedures
 
This paper serves as a proposal for implementing the MySQL to Derby migration tool. The implementation language for this project is Java since it can be run on different platforms.  

= The Migration Process =
The MySQL to Apache Derby Migration tool is a framework that enables migration and conversion of data from MySQL database to Apache Derby database. The system consists of several components responsible for capturing source database schema, database object mapping, datatype mapping and capturing source data to migrate a MySQL database to Apache Derby. 

This following diagram describes processes used by the system to implement its capabilities. It explains how the system interacts with its actors (i.e users and databases) and the specific steps and interactions will the users have with the system. 

http://www-scf.usc.edu/~moazeni/GSoC/mysql_7.jpg

The above processes, tasks and interactions are summarized as the following use cases:

 * UC-1: Select Source DBMS    
   The user will select the source DBMS (i.e mysql, oracle, etc)

 * UC-2: Enter Source DBMS Connection Parameters 
   To enter information of the source Database Management System such as database IP, Port
   information, username and password. 

 * UC-3: Enter Target DBMS Connection Parameters 
   To enter information of the target Database Management System such as database IP, 
   Port information, username and password. 

 * UC-4: Select DB to be Migrated 
   The migration tool will capture the databases of the source DBMS. The user can select
   the database that needs to be migrated. 

 * UC-5: Choose DB Objects to be Migrated 
   The migration tool will capture the objects (tables, views, stored procedures, etc)
   from the selected database. The user can then select the objects that need to be
   migrated. 

 * UC-6: Edit mapping errors 
   The user will be displayed a list of object mapping errors. The user will get the
   option of manually editing the sql statements to resolve the issue.

 * UC-7: Database Object Mapping 
   Upon capturing the database schema, the system performs object mapping based on the
   target database. 

 * UC-8: Capture Source Database Data 
   The system will capture data from source database. 

 * UC-9: Data Conversion 
   The system will convert and map data from the source database to the target database by
   transforming SQL statements. 

 * UC-10: Data Migration 
   The system will capture and migrate data from the source database to the target
   database. Based on the list of tables captured from the source database (UC-14), the
   migration tool captures the data in the tables, and re-creates them in the target 
   database. 

 * UC-11: Error Handling 
   To handle any errors during the migration process such as errors in connecting to the
   source or target DBMS, errors in connecting to the database account and  mapping errors
   which includes, object mapping errors, lack of support for specific object and data
   type mapping errors. 

 * UC-12: Summary Report 
   The system will create a summary report after migration is complete. The summary report
   will include details about the objects that were migrated, and successful or
   unsuccessful migration results. 

 * UC-13: Establish Connection 
   The system will establish connection with the Source or Target DBMS. 

 * UC-14: Capture Source Database Schema 
   The system will capture schema from the source database. This will include listing of
   tables, views, indexes and stored procedures. 

 * UC-15: Schema Migration 
   Having the list of database objects in hand (UC-14), the migration tool will then  
   captures the structure of each object, provide necessary mappings(UC-7) based on the
   target database and then re-create the database schema on the target database.  

The Migration tool will be designed to migrate the following objects:

 * Tables 
 * Views 
 * Indexes 
 * Stored Procedures 

The Migration tool will not handle migration of the following objects:

 * Functions 
 * Triggers 

= High Level Design =
The system will be implemented in Java language using Java classes and interfaces. The following class diagram exhibits the relationship among classes and interfaces of the MySQL to Derby Migration tool: 

http://www-scf.usc.edu/~moazeni/GSoC/mysql_2.gif

== Approach 1: Using DatabaseMetaData ==

=== Schema Migration ===

A schema is a collection of components and database objects under the control of a given database user. The task for schema migration is to capture the schema and its components and database objects and then proceed with migration. 

To migrate a database you must obtain the catalog information about the database as well as list of the tables in the database and a list of all columns within each table. That information lets you create the SQL statements necessary to migrate the data. To do this, Java Database Connectivity (JDBC) can be used. Most databases provide a system table that will give you a list of the tables. However because the name of the system table varies from one database to another, the most compatible way to list and capture information about database objects is to use the Java class DatabaseMetaData. This class provides information about the database, such as the names of the tables and column information, names of views, procedure information, etc. 

The DatabaseMetaData class has many properties and methods. For example, for obtaining the table information, the getTables() method can be used to return the list of tables in ResultSet format. After obtaining the list of tables, a list of the columns in that table should be obtained. Doing that lets you create a proper CREATE TABLE statement, as well as INSERT and SELECT statements to match the current table. The process of obtaining a list of columns is through the use of getColumns() method.  After capturing the database catalog information (tables and columns), the required SQL statements can be generated. The same process can be used for capturing and recreating views, indexes, and procedures except that a different method of DatabaseMetaData may be used to capture the information. 

Any data type mappings needs to be done when re-creating the SQL statements as per 
table 1. For example the following SQL statement from MySQL:
  
    || CREATE TABLE EXAMPLE ( id       TINYINT); ||
    
    Needs to be changed to
    
    || CREATE TABLE EXAMPLE ( id       SMALLINT); ||

Considering the syntax of creating a view, we can see that having a query is necessary to create a view. Therefore, in addition to data types that needs to be mapped, join statements also need to be considered when creating views. The problem is that some MySQL join syntax are not supported by Apache Derby. The following are examples of join operations and their compatible syntax in Apache Derby: 
 
CROSS JOIN 
|| Derby || SELECT * FROM tab1 CROSS JOIN tab2 ||
|| MySQL || SELECT * FROM tab1, tab2 ||

INNER JOIN
|| Derby || SELECT * FROM tab1 INNER JOIN tab2 USING(id) ||
|| MySQL || SELECT * FROM tab1 INNER JOIN tab2 ON tab1.id=tab2.id ||

LEFT OUTER JOIN 
|| Derby || SELECT * FROM tab1 LEFT OUTER JOIN tab2 USING (id) ||
|| MySQL || SELECT * FROM tab1 LEFT OUTER JOIN tab2 ON tab1.id = tab2.id ||
 
RIGHT OUTER JOIN 
|| Derby || SELECT * FROM tab1 RIGHT OUTER JOIN tab2 USING (id) ||
|| MySQL || SELECT * FROM tab1 RIGHT OUTER JOIN tab2 ON tab1.id = tab2.id ||
 
In addition to join syntax, the reference manuals of both MySQL and Derby needs to be consulted for proper syntax during migration to Apache Derby. 

=== Data Migration ===

In this section, the process of data migration from MySQL to Apache Derby database will be discussed. However, before doing this, there is a need to find out the differences between the MySQL and Apache Derby structure. The table below shows the compatibility between built in data types. Some data types have a one to one correspondence while others require more consideration before migrating. The tables include information on the following: 

 * Numeric Types 
 * Date and Time Types 
 * String Types 

===== Numeric Types: =====

http://www-scf.usc.edu/~moazeni/GSoC/numeric.jpg

===== Date and Time Types: =====

http://www-scf.usc.edu/~moazeni/GSoC/datetime.jpg

===== String Types: =====

http://www-scf.usc.edu/~moazeni/GSoC/string.jpg

Table 1: Data type Mapping from MySQL to Apache Derby


To manipulate and migrate the data, Apache Derby supports INSERT, UPDATE and DELETE statements with similar syntax and behavior as MySQL. These statements can be used programmatically to migrate the data  

This step is usually done after schema migration most importantly when tables are migrated. To do that, both SELECT and INSERT statements must be created. The SELECT statement will read the table data from the source database. The INSERT statement will write the data to the target database. 

The task involves retrieving columns information from source database and re-creates the SELECT and INSERT statements. The SELECT statement can then be executed and a list of records will be returned. Using the returned list of records, the INSERT statement can be created by inserting the individual column data and then executing the INSERT statement. Any data type mappings needs to be done when re-creating the INSERT statement as per the above tables. 

== Approach 2: Using DdlUtils ==
 
Another approach would be to use DdlUtils. DdlUtils is a component for working with Database Definition (DDL) files. DDL files are XML files that contain the definition of a database schema, e.g. tables and columns. These files can also be fed into DdlUtils in order to create the corresponding database. An example of such a DDL file is: 

{{{ <?xml version="1.0"?>
<!DOCTYPE database SYSTEM "http://db.apache.org/torque/dtd/database.dtd">
  <table name="book">
    <column name="book_id"
            type="INTEGER"
            required="true"
            primaryKey="true"
            autoIncrement="true"/>
    <column name="isbn"
            type="VARCHAR"
            size="15"
            required="true"/>
    <column name="author_id"
            type="INTEGER"
            required="true"/>
    <column name="title"
            type="VARCHAR"
            size="255"
            defaultValue="N/A"
            required="true"/>
    <foreign-key foreignTable="author">
      <reference local="author_id" foreign="author_id"/>
    </foreign-key>  
    <index name="book_isbn">
      <index-column name="isbn"/>
    </index>
  </table>
</database> }}}


In order to use DdlUtils component to migrate the source database to the target database, following steps are suggested: 

 * Capture DDL file from DdlUtils. 
 * Modify the source DDL to the target DDL using Serialization and Deserialization of XML. 
 * Fed in the new DDL file to the DdlUtils. 

In order to easily work with XML files, the use of Serialization and Deserialization of XML is suggested. XML files can be deserialized to Java objects. Therefore, we are dealing with objects afterwards. After modifying the value of object attributes based on datatype mappings of table 1, objects can be serialized to XML files and be used by the DdlUtils to create the corresponding database.

Please note that the DdlUtils doesn’t support non-table entities, constraints, triggers, stored procedures, etc. A possible solution would be to execute database-specific commands to capture required information.

= Initial Prototypes =

The migration tool will provide both graphical user interface as well as command line interface. I propose the following way of performing activities using the Graphical User Interface that the system provides:

===== Welcome Screen: =====
http://www-scf.usc.edu/~moazeni/GSoC/mysql_1.jpg

===== Selecting Source and Target Database: =====
http://www-scf.usc.edu/~moazeni/GSoC/mysql_2.jpg

===== Source Database Schema Selection =====
http://www-scf.usc.edu/~moazeni/GSoC/mysql_3.jpg

===== Select Objects to be Migrated: =====
http://www-scf.usc.edu/~moazeni/GSoC/mysql_4.jpg

===== Object Mapping Errors =====
http://www-scf.usc.edu/~moazeni/GSoC/mysql_5.jpg

===== Migration Complete =====
http://www-scf.usc.edu/~moazeni/GSoC/mysql_8.jpg