Posted to commits@sqoop.apache.org by ab...@apache.org on 2014/10/10 05:07:25 UTC
[33/50] [abbrv] git commit: SQOOP-1481: SQOOP2: Document the public
apis and end-end design for the SQ2 Connector
SQOOP-1481: SQOOP2: Document the public apis and end-end design for the SQ2 Connector
(Abraham Elmahrek via Jarek Jarcec Cecho)
Project: http://git-wip-us.apache.org/repos/asf/sqoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/sqoop/commit/b6000365
Tree: http://git-wip-us.apache.org/repos/asf/sqoop/tree/b6000365
Diff: http://git-wip-us.apache.org/repos/asf/sqoop/diff/b6000365
Branch: refs/heads/sqoop2
Commit: b6000365736d7b03124417f6b8731d8760011432
Parents: 4c964a9
Author: Jarek Jarcec Cecho <ja...@apache.org>
Authored: Thu Oct 9 08:16:52 2014 -0700
Committer: Abraham Elmahrek <ab...@elmahrek.com>
Committed: Thu Oct 9 18:22:51 2014 -0700
----------------------------------------------------------------------
docs/src/site/sphinx/CommandLineClient.rst | 6 +-
docs/src/site/sphinx/ConnectorDevelopment.rst | 154 ++++++++++---------
docs/src/site/sphinx/RESTAPI.rst | 54 +------
.../sqoop/connector/spi/SqoopConnector.java | 2 +-
4 files changed, 95 insertions(+), 121 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/sqoop/blob/b6000365/docs/src/site/sphinx/CommandLineClient.rst
----------------------------------------------------------------------
diff --git a/docs/src/site/sphinx/CommandLineClient.rst b/docs/src/site/sphinx/CommandLineClient.rst
index fb7f780..7cc6449 100644
--- a/docs/src/site/sphinx/CommandLineClient.rst
+++ b/docs/src/site/sphinx/CommandLineClient.rst
@@ -329,14 +329,14 @@ Create new job object.
+------------------------+------------------------------------------------------------------+
| Argument | Description |
+========================+==================================================================+
-| ``-x``, ``--xid <x>`` | Create new job object for connection with id ``<x>`` |
+| ``-f``, ``--from <x>`` | Create new job object with a FROM connection with id ``<x>`` |
+------------------------+------------------------------------------------------------------+
-| ``-t``, ``--type <t>`` | Create new job object with type ``<t>`` (``import``, ``export``) |
+| ``-t``, ``--to <t>`` | Create new job object with a TO connection with id ``<t>`` |
+------------------------+------------------------------------------------------------------+
Example: ::
- create job --xid 1
+ create job --from 1 --to 2
Update Command
--------------
http://git-wip-us.apache.org/repos/asf/sqoop/blob/b6000365/docs/src/site/sphinx/ConnectorDevelopment.rst
----------------------------------------------------------------------
diff --git a/docs/src/site/sphinx/ConnectorDevelopment.rst b/docs/src/site/sphinx/ConnectorDevelopment.rst
index ae4f721..d700e4c 100644
--- a/docs/src/site/sphinx/ConnectorDevelopment.rst
+++ b/docs/src/site/sphinx/ConnectorDevelopment.rst
@@ -26,17 +26,15 @@ using the code of built-in connector ( ``GenericJdbcConnector`` ) as example.
What is Connector?
++++++++++++++++++
-Connector provides interaction with external databases.
-Connector reads data from databases for import,
-and write data to databases for export.
-Interaction with Hadoop is taken cared by common modules of Sqoop 2 framework.
+The connector provides the facilities to interact with external data sources.
+The connector can read from, or write to, a data source.
When do we add a new connector?
===============================
You add a new connector when you need to extract data from a new data source, or load
data to a new target.
In addition to the connector API, Sqoop 2 also has an engine interface.
-At the moment the only engine is MapReduce,but we may support additional engines in the future.
+At the moment the only engine is MapReduce, but we may support additional engines in the future.
Since many parallel execution engines are capable of reading/writing data
there may be a question of whether support for specific data stores should be done
through a new connector or new engine.
@@ -51,57 +49,73 @@ Connector Implementation
The ``SqoopConnector`` class defines functionality
which must be provided by Connectors.
-Each Connector must extends ``SqoopConnector`` and overrides methods shown below.
+Each Connector must extend ``SqoopConnector`` and override the methods shown below.
::
public abstract String getVersion();
public abstract ResourceBundle getBundle(Locale locale);
- public abstract Class getConnectionConfigurationClass();
- public abstract Class getJobConfigurationClass(MJob.Type jobType);
- public abstract Importer getImporter();
- public abstract Exporter getExporter();
+ public abstract Class getLinkConfigurationClass();
+ public abstract Class getJobConfigurationClass(Direction direction);
+ public abstract From getFrom();
+ public abstract To getTo();
public abstract Validator getValidator();
public abstract MetadataUpgrader getMetadataUpgrader();
-The ``getImporter`` method returns Importer_ instance
-which is a placeholder for the modules needed for import.
+Connectors can optionally override the following methods:
+::
+
+ public List<Direction> getSupportedDirections();
+
-The ``getExporter`` method returns Exporter_ instance
-which is a placeholder for the modules needed for export.
+The ``getFrom`` method returns a From_ instance
+which is a placeholder for the modules needed to read from a data source.
+
+The ``getTo`` method returns a To_ instance
+which is a placeholder for the modules needed to write to a data source.
Methods such as ``getBundle`` , ``getLinkConfigurationClass`` ,
``getJobConfigurationClass`` and ``getValidator``
are concerned with `Connector configurations`_ .
+The ``getSupportedDirections`` method returns a list of directions
+that a connector supports. This should be some subset of:
+::
+
+ public List<Direction> getSupportedDirections() {
+ return Arrays.asList(new Direction[]{
+ Direction.FROM,
+ Direction.TO
+ });
+ }
+
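The supported-directions idea can be sketched outside of Sqoop. In the sketch below, ``Direction`` is a stub standing in for the real Sqoop enum, and the connector is a hypothetical read-only one that advertises FROM support only:

```java
import java.util.Arrays;
import java.util.List;

public class DirectionExample {
    // Stub of Sqoop's Direction enum, for illustration only.
    enum Direction { FROM, TO }

    // A hypothetical read-only connector would advertise FROM support only.
    static List<Direction> getSupportedDirections() {
        return Arrays.asList(Direction.FROM);
    }

    public static void main(String[] args) {
        System.out.println(getSupportedDirections()); // prints [FROM]
    }
}
```

The framework can then refuse to place such a connector on the TO side of a job.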
-Importer
-========
+From
+====
-Connector's ``getImporter`` method returns ``Importer`` instance
-which is a placeholder for the modules needed for import
-such as Partitioner_ and Extractor_ .
-Built-in ``GenericJdbcConnector`` defines ``Importer`` like this.
+The connector's ``getFrom`` method returns a ``From`` instance,
+which is a placeholder for the modules needed for reading
+from a data source, such as Partitioner_ and Extractor_ .
+The built-in ``GenericJdbcConnector`` defines ``From`` like this.
::
- private static final Importer IMPORTER = new Importer(
- GenericJdbcImportInitializer.class,
- GenericJdbcImportPartitioner.class,
- GenericJdbcImportExtractor.class,
- GenericJdbcImportDestroyer.class);
+ private static final From FROM = new From(
+ GenericJdbcFromInitializer.class,
+ GenericJdbcPartitioner.class,
+ GenericJdbcExtractor.class,
+ GenericJdbcFromDestroyer.class);
...
@Override
- public Importer getImporter() {
- return IMPORTER;
+ public From getFrom() {
+ return FROM;
}
Extractor
---------
-Extractor (E for ETL) extracts data from external database and
-writes it to Sqoop framework for import.
+The Extractor (E for ETL) extracts data from an external database.
The Extractor must override the ``extract`` method.
::
@@ -114,7 +128,13 @@ Extractor must overrides ``extract`` method.
The ``extract`` method extracts data from database in some way and
writes it to ``DataWriter`` (provided by context) as `Intermediate representation`_ .
-Extractor must iterates in the ``extract`` method until the data from database exhausts.
+Extractors use the ``DataWriter`` provided by the ``ExtractorContext`` to send records through the
+framework.
+::
+
+ context.getDataWriter().writeArrayRecord(array);
+
+The extractor must iterate through the entire dataset in the ``extract`` method.
::
while (resultSet.next()) {
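The extract loop above can be exercised without a database. The sketch below replaces Sqoop's real ``DataWriter`` and the JDBC ``ResultSet`` with in-memory stand-ins (the class and method names here are illustrative, not the actual Sqoop API):

```java
import java.util.ArrayList;
import java.util.List;

public class ExtractLoopExample {
    // Stand-in for Sqoop's DataWriter (illustration only): collects records in memory.
    static class DataWriter {
        final List<Object[]> written = new ArrayList<>();
        void writeArrayRecord(Object[] array) { written.add(array); }
    }

    // The extract loop from the docs, with an in-memory source replacing the JDBC ResultSet.
    static void extract(DataWriter writer, List<Object[]> source) {
        for (Object[] array : source) {      // corresponds to: while (resultSet.next())
            writer.writeArrayRecord(array);  // context.getDataWriter().writeArrayRecord(array)
        }
    }

    public static void main(String[] args) {
        DataWriter writer = new DataWriter();
        extract(writer, List.of(new Object[]{1, "a"}, new Object[]{2, "b"}));
        System.out.println(writer.written.size()); // prints 2
    }
}
```

The important property is that the extractor drains its partition completely before returning, emitting one record per ``writeArrayRecord`` call.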
@@ -127,9 +147,9 @@ Extractor must iterates in the ``extract`` method until the data from database e
Partitioner
-----------
-Partitioner creates ``Partition`` instances based on configurations.
+The Partitioner creates ``Partition`` instances based on configurations.
The number of ``Partition`` instances is decided
-based on the value users specified as the numbers of ectractors
+based on the value users specified as the number of extractors
in job configuration.
``Partition`` instances are passed to Extractor_ as the argument of ``extract`` method.
@@ -157,35 +177,35 @@ for doing preparation such as adding dependent jar files.
Destroyer is instantiated after the MapReduce job finishes, for cleanup.
-Exporter
-========
+To
+==
-Connector's ``getExporter`` method returns ``Exporter`` instance
-which is a placeholder for the modules needed for export
-such as Loader_ .
-Built-in ``GenericJdbcConnector`` defines ``Exporter`` like this.
+The Connector's ``getTo`` method returns a ``To`` instance
+which is a placeholder for the modules needed for writing
+to a data source such as Loader_ .
+The built-in ``GenericJdbcConnector`` defines ``To`` like this.
::
- private static final Exporter EXPORTER = new Exporter(
- GenericJdbcExportInitializer.class,
- GenericJdbcExportLoader.class,
- GenericJdbcExportDestroyer.class);
+ private static final To TO = new To(
+ GenericJdbcToInitializer.class,
+ GenericJdbcLoader.class,
+ GenericJdbcToDestroyer.class);
...
@Override
- public Exporter getExporter() {
- return EXPORTER;
+ public To getTo() {
+ return TO;
}
Loader
------
-Loader (L for ETL) receives data from Sqoop framework and
-loads it to external database.
+A loader (L for ETL) receives data from the Sqoop framework and
+loads it to an external database.
-Loader must overrides ``load`` method.
+Loaders must override the ``load`` method.
::
public abstract void load(LoaderContext context,
@@ -195,7 +215,7 @@ Loader must overrides ``load`` method.
The ``load`` method reads data from ``DataReader`` (provided by context)
in `Intermediate representation`_ and loads it to database in some way.
-Loader must iterates in the ``load`` method until the data from ``DataReader`` exhausts.
+Loader must iterate in the ``load`` method until the data from ``DataReader`` is exhausted.
::
while ((array = context.getDataReader().readArrayRecord()) != null) {
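The load loop above can likewise be sketched with an in-memory stand-in for Sqoop's ``DataReader``; returning ``null`` from ``readArrayRecord`` signals exhaustion (names are illustrative, not the real Sqoop classes):

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

public class LoadLoopExample {
    // Stand-in for Sqoop's DataReader (illustration only): returns null when exhausted.
    static class DataReader {
        final Queue<Object[]> records = new ArrayDeque<>();
        DataReader(List<Object[]> source) { records.addAll(source); }
        Object[] readArrayRecord() { return records.poll(); }
    }

    // The load loop from the docs: read until the DataReader is exhausted.
    static int load(DataReader reader) {
        int loaded = 0;
        Object[] array;
        while ((array = reader.readArrayRecord()) != null) {
            loaded++; // a real Loader would write the record to the external database here
        }
        return loaded;
    }

    public static void main(String[] args) {
        DataReader reader = new DataReader(List.of(new Object[]{1}, new Object[]{2}));
        System.out.println(load(reader)); // prints 2
    }
}
```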
@@ -218,7 +238,7 @@ Connector Configurations
Connector specifications
========================
-Framework of the Sqoop loads definitions of connectors
+Sqoop loads definitions of connectors
from the file named ``sqoopconnector.properties``
which each connector implementation provides.
::
@@ -231,7 +251,7 @@ which each connector implementation provides.
Configurations
==============
-Implementation of ``SqoopConnector`` overrides methods such as
+Implementations of ``SqoopConnector`` override methods such as
``getLinkConfigurationClass`` and ``getJobConfigurationClass``
returning the configuration class.
::
@@ -242,12 +262,12 @@ returning configuration class.
}
@Override
- public Class getJobConfigurationClass(MJob.Type jobType) {
- switch (jobType) {
- case IMPORT:
- return ImportJobConfiguration.class;
- case EXPORT:
- return ExportJobConfiguration.class;
+ public Class getJobConfigurationClass(Direction direction) {
+ switch (direction) {
+ case FROM:
+ return FromJobConfiguration.class;
+ case TO:
+ return ToJobConfiguration.class;
default:
return null;
}
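The direction-to-configuration dispatch shown in the diff can be reproduced as a standalone sketch, with stub enum and configuration classes in place of the real Sqoop types:

```java
public class JobConfigExample {
    // Stubs of Sqoop's Direction enum and configuration classes, for illustration only.
    enum Direction { FROM, TO }
    static class FromJobConfiguration {}
    static class ToJobConfiguration {}

    // Mirrors the switch in the docs: one configuration class per direction.
    static Class<?> getJobConfigurationClass(Direction direction) {
        switch (direction) {
            case FROM:
                return FromJobConfiguration.class;
            case TO:
                return ToJobConfiguration.class;
            default:
                return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(getJobConfigurationClass(Direction.FROM).getSimpleName()); // prints FromJobConfiguration
    }
}
```

Returning ``null`` for an unsupported direction matches the contract described in the ``SqoopConnector`` javadoc.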
@@ -260,7 +280,7 @@ Annotations such as
are provided for defining configurations of each connector
using these models.
-``ConfigurationClass`` is place holder for ``FormClasses`` .
+``ConfigurationClass`` is a placeholder for ``FormClasses`` .
::
@ConfigurationClass
@@ -323,16 +343,12 @@ Validator validates configurations set by users.
Internal of Sqoop2 MapReduce Job
++++++++++++++++++++++++++++++++
-Sqoop 2 provides common MapReduce modules such as ``SqoopMapper`` and ``SqoopReducer``
-for the both of import and export.
-
-- For import, ``Extractor`` provided by connector extracts data from databases,
- and ``Loader`` provided by Sqoop2 loads data into Hadoop.
+Sqoop 2 provides common MapReduce modules such as ``SqoopMapper`` and ``SqoopReducer``.
-- For export, ``Extractor`` provided by Sqoop2 exracts data from Hadoop,
- and ``Loader`` provided by connector loads data into databases.
+When reading from a data source, the ``Extractor`` provided by the FROM connector extracts data from a database,
+and the ``Loader`` provided by the TO connector loads data into another database.
-The diagram below describes the initialization phase of IMPORT job.
+The diagram below describes the initialization phase of a job.
``SqoopInputFormat`` creates splits using ``Partitioner`` .
::
@@ -351,8 +367,8 @@ The diagram below describes the initialization phase of IMPORT job.
|-------------------------------------------------->|SqoopSplit|
| | | `----+-----'
-The diagram below describes the map phase of IMPORT job.
-``SqoopMapper`` invokes extractor's ``extract`` method.
+The diagram below describes the map phase of a job.
+``SqoopMapper`` invokes the ``extract`` method of the FROM connector's extractor.
::
,-----------.
@@ -378,8 +394,8 @@ The diagram below describes the map phase of IMPORT job.
| | | context.write
| | |-------------------------->
-The diagram below decribes the reduce phase of EXPORT job.
-``OutputFormat`` invokes loader's ``load`` method (via ``SqoopOutputFormatLoadExecutor`` ).
+The diagram below describes the reduce phase of a job.
+``OutputFormat`` invokes the ``load`` method of the TO connector's loader (via ``SqoopOutputFormatLoadExecutor`` ).
::
,-------. ,---------------------.
http://git-wip-us.apache.org/repos/asf/sqoop/blob/b6000365/docs/src/site/sphinx/RESTAPI.rst
----------------------------------------------------------------------
diff --git a/docs/src/site/sphinx/RESTAPI.rst b/docs/src/site/sphinx/RESTAPI.rst
index 04908e8..0117cb6 100644
--- a/docs/src/site/sphinx/RESTAPI.rst
+++ b/docs/src/site/sphinx/RESTAPI.rst
@@ -418,7 +418,7 @@ So far, the resource contains only explanations for fields of forms. For example
"name":"generic-jdbc-connector",
"class":"org.apache.sqoop.connector.jdbc.GenericJdbcConnector",
"job-forms":{
- "IMPORT":[
+ "FROM":[
{
"id":2,
"inputs":[
@@ -475,7 +475,7 @@ So far, the resource contains only explanations for fields of forms. For example
"type":"CONNECTION"
}
],
- "EXPORT":[
+ "TO":[
{
"id":3,
"inputs":[
@@ -624,35 +624,7 @@ the id of the form to track these parameter inputs.
},
"framework-version":"1",
"job-forms":{
- "IMPORT":[
- {
- "id":5,
- "inputs":[
- {
- "id":18,
- "values":"HDFS",
- "name":"output.storageType",
- "type":"ENUM",
- "sensitive":false
- },
- {
- "id":19,
- "values":"TEXT_FILE,SEQUENCE_FILE",
- "name":"output.outputFormat",
- "type":"ENUM",
- "sensitive":false
- },
- {
- "id":20,
- "name":"output.outputDirectory",
- "type":"STRING",
- "size":255,
- "sensitive":false
- }
- ],
- "name":"output",
- "type":"CONNECTION"
- },
+ "FROM":[
{
"id":6,
"inputs":[
@@ -673,23 +645,9 @@ the id of the form to track these parameter inputs.
"type":"CONNECTION"
}
],
- "EXPORT":[
+ "TO":[
{
- "id":7,
- "inputs":[
- {
- "id":23,
- "name":"input.inputDirectory",
- "type":"STRING",
- "size":255,
- "sensitive":false
- }
- ],
- "name":"input",
- "type":"CONNECTION"
- },
- {
- "id":8,
+ "id":1,
"inputs":[
{
"id":24,
@@ -1272,7 +1230,7 @@ Provide the id of the job in the url [jid] part. If you provide ``all`` in the [
}
],
"connector-id":1,
- "type":"IMPORT",
+ "type":"FROM",
"framework":[
{
"id":5,
http://git-wip-us.apache.org/repos/asf/sqoop/blob/b6000365/spi/src/main/java/org/apache/sqoop/connector/spi/SqoopConnector.java
----------------------------------------------------------------------
diff --git a/spi/src/main/java/org/apache/sqoop/connector/spi/SqoopConnector.java b/spi/src/main/java/org/apache/sqoop/connector/spi/SqoopConnector.java
index 849f091..4272470 100644
--- a/spi/src/main/java/org/apache/sqoop/connector/spi/SqoopConnector.java
+++ b/spi/src/main/java/org/apache/sqoop/connector/spi/SqoopConnector.java
@@ -65,7 +65,7 @@ public abstract class SqoopConnector {
/**
* @return the job configuration group class for the given direction, or null if not supported
*/
- public abstract Class getJobConfigurationClass(Direction jobType);
+ public abstract Class getJobConfigurationClass(Direction direction);
/**
* @return a <tt>From</tt> that provides classes for performing import.