Posted to commits@sqoop.apache.org by ja...@apache.org on 2014/06/23 17:34:43 UTC
git commit: SQOOP-1337: Doc refactoring - Consolidate documentation
of --direct
Repository: sqoop
Updated Branches:
refs/heads/trunk d902d2449 -> c320b4fe0
SQOOP-1337: Doc refactoring - Consolidate documentation of --direct
(Gwen Shapira via Jarek Jarcec Cecho)
Project: http://git-wip-us.apache.org/repos/asf/sqoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/sqoop/commit/c320b4fe
Tree: http://git-wip-us.apache.org/repos/asf/sqoop/tree/c320b4fe
Diff: http://git-wip-us.apache.org/repos/asf/sqoop/diff/c320b4fe
Branch: refs/heads/trunk
Commit: c320b4fe03e3ca7e16b30f382c03cef7d047d616
Parents: d902d24
Author: Jarek Jarcec Cecho <ja...@apache.org>
Authored: Mon Jun 23 08:34:03 2014 -0700
Committer: Jarek Jarcec Cecho <ja...@apache.org>
Committed: Mon Jun 23 08:34:03 2014 -0700
----------------------------------------------------------------------
src/docs/user/compatibility.txt | 37 ----------
src/docs/user/connectors.txt | 123 +++++++++++++++++++++++++++----
src/docs/user/export.txt | 16 ++--
src/docs/user/import-all-tables.txt | 3 -
src/docs/user/import.txt | 25 +------
5 files changed, 120 insertions(+), 84 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/sqoop/blob/c320b4fe/src/docs/user/compatibility.txt
----------------------------------------------------------------------
diff --git a/src/docs/user/compatibility.txt b/src/docs/user/compatibility.txt
index 37e07b2..a7344e7 100644
--- a/src/docs/user/compatibility.txt
+++ b/src/docs/user/compatibility.txt
@@ -1,4 +1,3 @@
-
////
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
@@ -127,42 +126,6 @@ Sqoop is currently not supporting import from view in direct mode. Use
JDBC based (non direct) mode in case that you need to import view (simply
omit +--direct+ parameter).
-Direct-mode Transactions
-^^^^^^^^^^^^^^^^^^^^^^^^
-
-For performance, each writer will commit the current transaction
-approximately every 32 MB of exported data. You can control this
-by specifying the following argument _before_ any tool-specific arguments: +-D
-sqoop.mysql.export.checkpoint.bytes=size+, where _size_ is a value in
-bytes. Set _size_ to 0 to disable intermediate checkpoints,
-but individual files being exported will continue to be committed
-independently of one another.
-
-Sometimes you need to export large data with Sqoop to a live MySQL cluster that
-is under a high load serving random queries from the users of your application.
-While data consistency issues during the export can be easily solved with a
-staging table, there is still a problem with the performance impact caused by
-the heavy export.
-
-First off, the resources of MySQL dedicated to the import process can affect
-the performance of the live product, both on the master and on the slaves.
-Second, even if the servers can handle the import with no significant
-performance impact (mysqlimport should be relatively "cheap"), importing big
-tables can cause serious replication lag in the cluster risking data
-inconsistency.
-
-With +-D sqoop.mysql.export.sleep.ms=time+, where _time_ is a value in
-milliseconds, you can let the server relax between checkpoints and the replicas
-catch up by pausing the export process after transferring the number of bytes
-specified in +sqoop.mysql.export.checkpoint.bytes+. Experiment with different
-settings of these two parameters to archieve an export pace that doesn't
-endanger the stability of your MySQL cluster.
-
-IMPORTANT: Note that any arguments to Sqoop that are of the form +-D
-parameter=value+ are Hadoop _generic arguments_ and must appear before
-any tool-specific arguments (for example, +\--connect+, +\--table+, etc).
-Don't forget that these parameters are only supported with the +\--direct+
-flag set.
PostgreSQL
~~~~~~~~~~
http://git-wip-us.apache.org/repos/asf/sqoop/blob/c320b4fe/src/docs/user/connectors.txt
----------------------------------------------------------------------
diff --git a/src/docs/user/connectors.txt b/src/docs/user/connectors.txt
index cf66112..379cbd9 100644
--- a/src/docs/user/connectors.txt
+++ b/src/docs/user/connectors.txt
@@ -1,4 +1,3 @@
-
////
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
@@ -17,7 +16,7 @@
limitations under the License.
////
-
+[[connectors]]
Notes for specific connectors
-----------------------------
@@ -39,6 +38,80 @@ it will update appropriate row instead. As a result, Sqoop is ignoring values sp
in parameter +\--update-key+, however user needs to specify at least one valid column
to turn on update mode itself.
+
+MySQL Direct Connector
+~~~~~~~~~~~~~~~~~~~~~~
+
+MySQL Direct Connector allows faster import and export to/from MySQL using the +mysqldump+ and +mysqlimport+ tools
+instead of SQL selects and inserts.
+
+To use the MySQL Direct Connector, specify the +\--direct+ argument for your import or export job.
+
+Example:
+
+----
+$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
+ --direct
+----
+
+Passing additional parameters to mysqldump:
+
+----
+$ sqoop import --connect jdbc:mysql://server.foo.com/db --table bar \
+ --direct -- --default-character-set=latin1
+----
+
+Requirements
+^^^^^^^^^^^^
+
+Utilities +mysqldump+ and +mysqlimport+ should be present in the shell path of the user running the Sqoop command on
+all nodes. To validate, SSH as this user to all nodes and execute these commands. If you get an error, so will Sqoop.
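+
+One way to check, using +which+ (the hostname is a placeholder):
+
+----
+$ ssh node.example.com 'which mysqldump mysqlimport'
+----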
+
+Limitations
+^^^^^^^^^^^^
+
+* Currently the direct connector does not support import of large object columns (BLOB and CLOB).
+* Importing to HBase and Accumulo is not supported.
+* Use of a staging table when exporting data is not supported.
+* Import of views is not supported.
+
+Direct-mode Transactions
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+For performance, each writer will commit the current transaction
+approximately every 32 MB of exported data. You can control this
+by specifying the following argument _before_ any tool-specific arguments: +-D
+sqoop.mysql.export.checkpoint.bytes=size+, where _size_ is a value in
+bytes. Set _size_ to 0 to disable intermediate checkpoints,
+but individual files being exported will continue to be committed
+independently of one another.
+
+Sometimes you need to export large data with Sqoop to a live MySQL cluster that
+is under a high load serving random queries from the users of your application.
+While data consistency issues during the export can be easily solved with a
+staging table, there is still a problem with the performance impact caused by
+the heavy export.
+
+First off, the resources of MySQL dedicated to the import process can affect
+the performance of the live product, both on the master and on the slaves.
+Second, even if the servers can handle the import with no significant
+performance impact (mysqlimport should be relatively "cheap"), importing big
+tables can cause serious replication lag in the cluster risking data
+inconsistency.
+
+With +-D sqoop.mysql.export.sleep.ms=time+, where _time_ is a value in
+milliseconds, you can let the server relax between checkpoints and the replicas
+catch up by pausing the export process after transferring the number of bytes
+specified in +sqoop.mysql.export.checkpoint.bytes+. Experiment with different
+settings of these two parameters to achieve an export pace that doesn't
+endanger the stability of your MySQL cluster.
+
+IMPORTANT: Note that any arguments to Sqoop that are of the form +-D
+parameter=value+ are Hadoop _generic arguments_ and must appear before
+any tool-specific arguments (for example, +\--connect+, +\--table+, etc).
+Don't forget that these parameters are only supported with the +\--direct+
+flag set.
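+
+For example, a hypothetical export (the connection string, table, and export
+directory are placeholders) that checkpoints every 10 MB and pauses for one
+second between checkpoints:
+
+----
+$ sqoop export -D sqoop.mysql.export.checkpoint.bytes=10485760 \
+    -D sqoop.mysql.export.sleep.ms=1000 \
+    --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
+    --export-dir /results/employees_data --direct
+----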
+
Microsoft SQL Connector
~~~~~~~~~~~~~~~~~~~~~~~
@@ -60,6 +133,7 @@ Argument Description
Schema support
^^^^^^^^^^^^^^
+
If you need to work with tables that are located in non-default schemas, you can
specify schema names via the +\--schema+ argument. Custom schemas are supported for
both import and export jobs. For example:
@@ -98,8 +172,31 @@ Argument Description
Default is "public".
---------------------------------------------------------------------------------
-The direct connector (used when specified +\--direct+ parameter), offers also
-additional extra arguments:
+Schema support
+^^^^^^^^^^^^^^
+
+If you need to work with a table that is located in a schema other than the default one,
+you need to specify the extra argument +\--schema+. Custom schemas are supported for
+both import and export jobs (the optional staging table, however, must be present in the
+same schema as the target table). Example invocation:
+
+----
+$ sqoop import ... --table custom_table -- --schema custom_schema
+----
+
+PostgreSQL Direct Connector
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+PostgreSQL Direct Connector allows faster import and export to/from PostgreSQL using the "COPY" command.
+
+To use the PostgreSQL Direct Connector, specify the +\--direct+ argument for your import or export job.
+
+When importing from PostgreSQL in conjunction with direct mode, you
+can split the import into separate files after
+individual files reach a certain size. This size limit is controlled
+with the +\--direct-split-size+ argument.
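+
+For example, a hypothetical import (the connection string and table are
+placeholders) that starts a new file after roughly every 8 MB:
+
+----
+$ sqoop import --connect jdbc:postgresql://db.foo.com/corp --table EMPLOYEES \
+    --direct --direct-split-size 8388608
+----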
+
+The direct connector also offers additional extra arguments:
.Additional supported PostgreSQL extra arguments in direct mode:
[grid="all"]
@@ -114,19 +211,19 @@ Argument Description
Default is "FALSE".
---------------------------------------------------------------------------------
-Schema support
-^^^^^^^^^^^^^^
+Requirements
+^^^^^^^^^^^^
-If you need to work with table that is located in schema other than default one,
-you need to specify extra argument +\--schema+. Custom schemas are supported for
-both import and export job (optional staging table however must be present in the
-same schema as target table). Example invocation:
+Utility +psql+ should be present in the shell path of the user running the Sqoop command on
+all nodes. To validate, SSH as this user to all nodes and execute this command. If you get an error, so will Sqoop.
-----
-$ sqoop import ... --table custom_table -- --schema custom_schema
-----
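+
+One way to check, using +which+ (the hostname is a placeholder):
+
+----
+$ ssh node.example.com 'which psql'
+----
+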
+Limitations
+^^^^^^^^^^^^
+* Currently the direct connector does not support import of large object columns (BLOB and CLOB).
+* Importing to HBase and Accumulo is not supported.
+* Import of views is not supported.
pg_bulkload connector
~~~~~~~~~~~~~~~~~~~~~
http://git-wip-us.apache.org/repos/asf/sqoop/blob/c320b4fe/src/docs/user/export.txt
----------------------------------------------------------------------
diff --git a/src/docs/user/export.txt b/src/docs/user/export.txt
index 8b9e473..304810a 100644
--- a/src/docs/user/export.txt
+++ b/src/docs/user/export.txt
@@ -1,4 +1,3 @@
-
////
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
@@ -93,13 +92,10 @@ additional load may decrease performance. The +\--num-mappers+ or +-m+
arguments control the number of map tasks, which is the degree of
parallelism used.
-MySQL provides a direct mode for exports as well, using the
-+mysqlimport+ tool. When exporting to MySQL, use the +\--direct+ argument
-to specify this codepath. This may be
-higher-performance than the standard JDBC codepath.
-
-NOTE: When using export in direct mode with MySQL, the MySQL bulk utility
-+mysqlimport+ must be available in the shell path of the task process.
+Some databases provide a direct mode for exports as well. Use the +\--direct+ argument
+to specify this codepath. This may be higher-performance than the standard JDBC codepath.
+Details about the use of direct mode with each specific RDBMS, including installation requirements, available
+options, and limitations, can be found in <<connectors>>.
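+
+For example, a hypothetical direct-mode export (the connection string, table,
+and export directory are placeholders):
+
+----
+$ sqoop export --connect jdbc:mysql://db.example.com/foo --table bar \
+    --export-dir /results/bar_data --direct
+----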
The +\--input-null-string+ and +\--input-null-non-string+ arguments are
optional. If +\--input-null-string+ is not specified, then the string
@@ -127,9 +123,9 @@ If the staging table contains data and the +\--clear-staging-table+ option is
specified, Sqoop will delete all of the data before starting the export job.
NOTE: Support for staging data prior to pushing it into the destination
-table is not available for +--direct+ exports. It is also not available when
+table is not always available for +--direct+ exports. It is also not available when
export is invoked using the +--update-key+ option for updating existing data,
-and when stored procedures are used to insert the data.
+and when stored procedures are used to insert the data. Check the <<connectors>> section to verify whether staging is supported for your database.
Inserts vs. Updates
http://git-wip-us.apache.org/repos/asf/sqoop/blob/c320b4fe/src/docs/user/import-all-tables.txt
----------------------------------------------------------------------
diff --git a/src/docs/user/import-all-tables.txt b/src/docs/user/import-all-tables.txt
index 8c3a4f5..60645f1 100644
--- a/src/docs/user/import-all-tables.txt
+++ b/src/docs/user/import-all-tables.txt
@@ -1,4 +1,3 @@
-
////
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
@@ -49,8 +48,6 @@ Argument Description
+\--as-sequencefile+ Imports data to SequenceFiles
+\--as-textfile+ Imports data as plain text (default)
+\--direct+ Use direct import fast path
-+\--direct-split-size <n>+ Split the input stream every 'n' bytes when\
- importing in direct mode
+\--inline-lob-limit <n>+ Set the maximum size for an inline LOB
+-m,\--num-mappers <n>+ Use 'n' map tasks to import in parallel
+\--warehouse-dir <dir>+ HDFS parent for table destination
http://git-wip-us.apache.org/repos/asf/sqoop/blob/c320b4fe/src/docs/user/import.txt
----------------------------------------------------------------------
diff --git a/src/docs/user/import.txt b/src/docs/user/import.txt
index 7a3fa43..192e97e 100644
--- a/src/docs/user/import.txt
+++ b/src/docs/user/import.txt
@@ -1,4 +1,3 @@
-
////
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
@@ -64,9 +63,7 @@ Argument Description
+\--columns <col,col,col...>+ Columns to import from table
+\--delete-target-dir+ Delete the import target directory\
if it exists
-+\--direct+ Use direct import fast path
-+\--direct-split-size <n>+ Split the input stream every 'n' bytes\
- when importing in direct mode
++\--direct+ Use direct connector if it exists for the database
+\--fetch-size <n>+ Number of entries to read from database\
at once.
+\--inline-lob-limit <n>+ Set the maximum size for an inline LOB
@@ -231,13 +228,10 @@ data movement tools. For example, MySQL provides the +mysqldump+ tool
which can export data from MySQL to other systems very quickly. By
supplying the +\--direct+ argument, you are specifying that Sqoop
should attempt the direct import channel. This channel may be
-higher performance than using JDBC. Currently, direct mode does not
-support imports of large object columns.
+higher performance than using JDBC.
-When importing from PostgreSQL in conjunction with direct mode, you
-can split the import into separate files after
-individual files reach a certain size. This size limit is controlled
-with the +\--direct-split-size+ argument.
+Details about the use of direct mode with each specific RDBMS, including installation requirements, available
+options, and limitations, can be found in <<connectors>>.
By default, Sqoop will import a table named +foo+ to a directory named
+foo+ inside your home directory in HDFS. For example, if your
@@ -280,10 +274,6 @@ data to a temporary directory and then rename the files into the normal
target directory in a manner that does not conflict with existing filenames
in that directory.
-NOTE: When using the direct mode of import, certain database client utilities
-are expected to be present in the shell path of the task process. For MySQL
-the utilities +mysqldump+ and +mysqlimport+ are required, whereas for
-PostgreSQL the utility +psql+ is required.
Controlling transaction isolation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -683,13 +673,6 @@ $ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
-m 8
----
-Enabling the MySQL "direct mode" fast path:
-
-----
-$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
- --direct
-----
-
Storing data in SequenceFiles, and setting the generated class name to
+com.foocorp.Employee+: