You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@sqoop.apache.org by ja...@apache.org on 2014/12/13 16:56:46 UTC

sqoop git commit: SQOOP-1757: Sqoop2: Document generic jdbc connector

Repository: sqoop
Updated Branches:
  refs/heads/sqoop2 293e9ef63 -> ee097891b


SQOOP-1757: Sqoop2: Document generic jdbc connector

(Abraham Elmahrek via Jarek Jarcec Cecho)


Project: http://git-wip-us.apache.org/repos/asf/sqoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/sqoop/commit/ee097891
Tree: http://git-wip-us.apache.org/repos/asf/sqoop/tree/ee097891
Diff: http://git-wip-us.apache.org/repos/asf/sqoop/diff/ee097891

Branch: refs/heads/sqoop2
Commit: ee097891b727acc6dd0b1f6222d7758e436f71c9
Parents: 293e9ef
Author: Jarek Jarcec Cecho <ja...@apache.org>
Authored: Sat Dec 13 07:55:31 2014 -0800
Committer: Jarek Jarcec Cecho <ja...@apache.org>
Committed: Sat Dec 13 07:55:31 2014 -0800

----------------------------------------------------------------------
 docs/src/site/sphinx/Connectors.rst | 199 +++++++++++++++++++++++++++++++
 docs/src/site/sphinx/index.rst      |   1 +
 2 files changed, 200 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/sqoop/blob/ee097891/docs/src/site/sphinx/Connectors.rst
----------------------------------------------------------------------
diff --git a/docs/src/site/sphinx/Connectors.rst b/docs/src/site/sphinx/Connectors.rst
new file mode 100644
index 0000000..bcc5b43
--- /dev/null
+++ b/docs/src/site/sphinx/Connectors.rst
@@ -0,0 +1,199 @@
+.. Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+
+
+==================
+Sqoop 2 Connectors
+==================
+
+This document describes how to use the built-in connectors. This includes a detailed description of how connectors partition, format their output, extract data, and load data.
+
+.. contents::
+   :depth: 3
+
+++++++++++++++++++++++
+Generic JDBC Connector
+++++++++++++++++++++++
+
+The Generic JDBC Connector can connect to any data source that adheres to the **JDBC 4** specification.
+
+-----
+Usage
+-----
+
+To use the Generic JDBC Connector, create a link for the connector and a job that uses the link.
+
+**Link Configuration**
+++++++++++++++++++++++
+
+Inputs associated with the link configuration include:
+
++-----------------------------+---------+-----------------------------------------------------------------------+------------------------------------------+
+| Input                       | Type    | Description                                                           | Example                                  |
++=============================+=========+=======================================================================+==========================================+
+| JDBC Driver Class           | String  | The full class name of the JDBC driver.                               | com.mysql.jdbc.Driver                    |
+|                             |         | *Required* and accessible by the Sqoop server.                        |                                          |
++-----------------------------+---------+-----------------------------------------------------------------------+------------------------------------------+
+| JDBC Connection String      | String  | The JDBC connection string to use when connecting to the data source. | jdbc:mysql://localhost/test              |
+|                             |         | *Required*. Connectivity upon creation is optional.                   |                                          |
++-----------------------------+---------+-----------------------------------------------------------------------+------------------------------------------+
+| Username                    | String  | The username to provide when connecting to the data source.           | sqoop                                    |
+|                             |         | *Optional*. Connectivity upon creation is optional.                   |                                          |
++-----------------------------+---------+-----------------------------------------------------------------------+------------------------------------------+
+| Password                    | String  | The password to provide when connecting to the data source.           | sqoop                                    |
+|                             |         | *Optional*. Connectivity upon creation is optional.                   |                                          |
++-----------------------------+---------+-----------------------------------------------------------------------+------------------------------------------+
+| JDBC Connection Properties  | Map     | A map of JDBC connection properties to pass to the JDBC driver        | profileSQL=true&useFastDateParsing=false |
+|                             |         | *Optional*.                                                           |                                          |
++-----------------------------+---------------------------------------------------------------------------------+------------------------------------------+
+
+**FROM Job Configuration**
+++++++++++++++++++++++++++
+
+Inputs associated with the Job configuration for the FROM direction include:
+
++-----------------------------+---------+-------------------------------------------------------------------------+---------------------------------------------+
+| Input                       | Type    | Description                                                             | Example                                     |
++=============================+=========+=========================================================================+=============================================+
+| Schema name                 | String  | The schema name the table is part of.                                   | sqoop                                       |
+|                             |         | *Optional*                                                              |                                             |
++-----------------------------+---------+-------------------------------------------------------------------------+---------------------------------------------+
+| Table name                  | String  | The table name to import data from.                                     | test                                        |
+|                             |         | *Optional*. See note below.                                             |                                             |
++-----------------------------+---------+-------------------------------------------------------------------------+---------------------------------------------+
+| Table SQL statement         | String  | The SQL statement used to perform a **free form query**.                | ``SELECT COUNT(*) FROM test ${CONDITIONS}`` |
+|                             |         | *Optional*. See note below.                                             |                                             |
++-----------------------------+---------+-------------------------------------------------------------------------+---------------------------------------------+
+| Table column names          | String  | Columns to extract from the JDBC data source.                           | col1,col2                                   |
+|                             |         | *Optional* Comma separated list of columns.                             |                                             |
++-----------------------------+---------+-------------------------------------------------------------------------+---------------------------------------------+
+| Partition column name       | Map     | The column name used to partition the data transfer process.            | col1                                        |
+|                             |         | *Optional*.  Defaults to primary key of table.                          |                                             |
++-----------------------------+---------+-------------------------------------------------------------------------+---------------------------------------------+
+| Null value allowed for      | Boolean | True or false depending on whether NULL values are allowed in data      | true                                        |
+| the partition column        |         | of the Partition column. *Optional*.                                    |                                             |
++-----------------------------+---------+-------------------------------------------------------------------------+---------------------------------------------+
+| Boundary query              | String  | The query used to define an upper and lower boundary when partitioning. |                                             |
+|                             |         | *Optional*.                                                             |                                             |
++-----------------------------+-----------------------------------------------------------------------------------+---------------------------------------------+
+
+**Notes**
+=========
+
+1. *Table name* and *Table SQL statement* are mutually exclusive. If *Table name* is provided, the *Table SQL statement* should not be provided. If *Table SQL statement* is provided then *Table name* should not be provided.
+2. *Table column names* should be provided only if *Table name* is provided.
+
+**TO Job Configuration**
+++++++++++++++++++++++++
+
+Inputs associated with the Job configuration for the TO direction include:
+
++-----------------------------+---------+-------------------------------------------------------------------------+-------------------------------------------------+
+| Input                       | Type    | Description                                                             | Example                                         |
++=============================+=========+=========================================================================+=================================================+
+| Schema name                 | String  | The schema name the table is part of.                                   | sqoop                                           |
+|                             |         | *Optional*                                                              |                                                 |
++-----------------------------+---------+-------------------------------------------------------------------------+-------------------------------------------------+
+| Table name                  | String  | The table name to import data from.                                     | test                                            |
+|                             |         | *Optional*. See note below.                                             |                                                 |
++-----------------------------+---------+-------------------------------------------------------------------------+-------------------------------------------------+
+| Table SQL statement         | String  | The SQL statement used to perform a **free form query**.                | ``INSERT INTO test (col1, col2) VALUES (?, ?)`` |
+|                             |         | *Optional*. See note below.                                             |                                                 |
++-----------------------------+---------+-------------------------------------------------------------------------+-------------------------------------------------+
+| Table column names          | String  | Columns to insert into the JDBC data source.                            | col1,col2                                       |
+|                             |         | *Optional* Comma separated list of columns.                             |                                                 |
++-----------------------------+---------+-------------------------------------------------------------------------+-------------------------------------------------+
+| Stage table name            | String  | The name of the table used as a *staging table*.                        | staging                                         |
+|                             |         | *Optional*.                                                             |                                                 |
++-----------------------------+---------+-------------------------------------------------------------------------+-------------------------------------------------+
+| Should clear stage table    | Boolean | True or false depending on whether the staging table should be cleared  | true                                            |
+|                             |         | after the data transfer has finished. *Optional*.                       |                                                 |
++-----------------------------+-----------------------------------------------------------------------------------+-------------------------------------------------+
+
+**Notes**
+=========
+
+1. *Table name* and *Table SQL statement* are mutually exclusive. If *Table name* is provided, the *Table SQL statement* should not be provided. If *Table SQL statement* is provided then *Table name* should not be provided.
+2. *Table column names* should be provided only if *Table name* is provided.
+
+-----------
+Partitioner
+-----------
+
+The Generic JDBC Connector partitioner generates conditions to be used by the extractor.
+It varies in how it partitions data transfer based on the partition column data type.
+Though, each strategy roughly takes on the following form:
+::
+
+  (upper boundary - lower boundary) / (max partitions)
+
+By default, the *primary key* will be used to partition the data unless otherwise specified.
+
+The following data types are currently supported:
+
+1. TINYINT
+2. SMALLINT
+3. INTEGER
+4. BIGINT
+5. REAL
+6. FLOAT
+7. DOUBLE
+8. NUMERIC
+9. DECIMAL
+10. BIT
+11. BOOLEAN
+12. DATE
+13. TIME
+14. TIMESTAMP
+15. CHAR
+16. VARCHAR
+17. LONGVARCHAR
+
+---------
+Extractor
+---------
+
+During the *extraction* phase, the JDBC data source is queried using SQL. This SQL will vary based on your configuration.
+
+- If *Table name* is provided, then the SQL statement generated will take on the form ``SELECT * FROM <table name>``.
+- If *Table name* and *Columns* are provided, then the SQL statement generated will take on the form ``SELECT <columns> FROM <table name>``.
+- If *Table SQL statement* is provided, then the provided SQL statement will be used.
+
+The conditions generated by the *partitioner* are appended to the end of the SQL query to query a section of data.
+
+The Generic JDBC connector extracts CSV data usable by the *CSV Intermediate Data Format*.
+
+------
+Loader
+------
+
+During the *loading* phase, the JDBC data source is queried using SQL. This SQL will vary based on your configuration.
+
+- If *Table name* is provided, then the SQL statement generated will take on the form ``INSERT INTO <table name> (col1, col2, ...) VALUES (?,?,..)``.
+- If *Table name* and *Columns* are provided, then the SQL statement generated will take on the form ``INSERT INTO <table name> (<columns>) VALUES (?,?,..)``.
+- If *Table SQL statement* is provided, then the provided SQL statement will be used.
+
+This connector expects to receive CSV data consumable by the *CSV Intermediate Data Format*.
+
+----------
+Destroyers
+----------
+
+The Generic JDBC Connector performs two operations in the destroyer in the TO direction:
+
+1. Copy the contents of the staging table to the desired table.
+2. Clear the staging table.
+
+No operations are performed in the FROM direction.

http://git-wip-us.apache.org/repos/asf/sqoop/blob/ee097891/docs/src/site/sphinx/index.rst
----------------------------------------------------------------------
diff --git a/docs/src/site/sphinx/index.rst b/docs/src/site/sphinx/index.rst
index 8257858..9c95d08 100644
--- a/docs/src/site/sphinx/index.rst
+++ b/docs/src/site/sphinx/index.rst
@@ -47,6 +47,7 @@ If you are excited to start using Sqoop you can follow the links below to get a
 
 - `Sqoop 5 Minute Demo <Sqoop5MinutesDemo.html>`_
 - `Command Line Shell Usage Guide <CommandLineClient.html>`_
+- `Connectors <Connectors.html>`_
 
 Developer Guide
 -----------------