Posted to dev@sqoop.apache.org by "Abraham Elmahrek (JIRA)" <ji...@apache.org> on 2015/04/12 02:02:15 UTC

[jira] [Created] (SQOOP-2301) Sqoop2: Document how generic jdbc connector generates sql

Abraham Elmahrek created SQOOP-2301:
---------------------------------------

             Summary: Sqoop2: Document how generic jdbc connector generates sql
                 Key: SQOOP-2301
                 URL: https://issues.apache.org/jira/browse/SQOOP-2301
             Project: Sqoop
          Issue Type: Bug
            Reporter: Abraham Elmahrek
             Fix For: 2.0.0


Much more information can be added to the {{Connectors.rst}} doc. There is considerable overlap between how the SQL is generated for the boundary query and for the general queries. The following information could be added:

# {{table name}} - the name of the table to transfer data from.
# {{columns}} - a list of columns to pull from the database.
# {{query}} - a custom SQL statement for querying the data.
# {{boundary query}} - SQL statement that defines the boundaries (floor and ceiling) of the input splits.
# {{partition column}} - the column used to partition the data into input splits.

*{{table name}}, {{columns}}, and {{query}}*
Sqoop2 generates the SQL automatically when {{table name}} is provided. {{columns}} is intended to be used together with {{table name}}. {{query}} is the alternative to both: if {{query}} is specified, {{table name}} should not be specified.
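The rules above can be sketched as a small validation step. This is a hypothetical Python illustration, not Sqoop2's Java implementation: the {{FromJobConfig}} class, its field names, and the exact shape of the generated statement (the listed columns, or {{*}} when none are given) are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FromJobConfig:
    """Hypothetical container for the options listed above (names are illustrative)."""
    table_name: Optional[str] = None
    columns: List[str] = field(default_factory=list)
    query: Optional[str] = None
    partition_column: Optional[str] = None
    boundary_query: Optional[str] = None


def validate(config: FromJobConfig) -> None:
    """Enforce that exactly one of table_name and query is set."""
    if config.table_name and config.query:
        raise ValueError("specify either table name or query, not both")
    if not config.table_name and not config.query:
        raise ValueError("one of table name or query is required")
    # Assumption: columns only make sense alongside table_name.
    if config.columns and not config.table_name:
        raise ValueError("columns are intended to be used with table name")


def extraction_sql(config: FromJobConfig) -> str:
    """Return the custom query as-is, or generate a SELECT from table_name/columns."""
    validate(config)
    if config.query:
        return config.query
    # Assumption: the generated statement selects the listed columns, or all columns.
    cols = ", ".join(f"`{c}`" for c in config.columns) if config.columns else "*"
    return f"SELECT {cols} FROM `{config.table_name}`"


print(extraction_sql(FromJobConfig(table_name="employees", columns=["id", "name"])))
# SELECT `id`, `name` FROM `employees`
```

Sqoop2's real generated statement likely differs in detail (for example, it also needs somewhere to put each split's range condition); the point is only which inputs drive generation.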

*{{boundary query}}*
If {{boundary query}} is not specified, Sqoop2 generates it. The boundary query determines the floor and ceiling (the minimum and maximum values of the {{partition column}}) from which the input splits are computed; each split is a range of values applied to the {{partition column}}. If {{table name}} is specified, the generated SQL takes the form:

{code}SELECT MIN(`partition column`), MAX(`partition column`) FROM `table name`{code}
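A minimal sketch of the table-name form, with a hypothetical helper name, run against an in-memory SQLite database purely for illustration (SQLite accepts MySQL-style backtick quoting). The returned pair is the floor and ceiling the splits are computed from.

```python
import sqlite3


def table_boundary_query(table_name: str, partition_column: str) -> str:
    """Build the boundary query in the table-name form."""
    return (f"SELECT MIN(`{partition_column}`), MAX(`{partition_column}`) "
            f"FROM `{table_name}`")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO employees (id, name) VALUES (?, ?)",
                 [(3, "a"), (10, "b"), (42, "c")])

sql = table_boundary_query("employees", "id")
floor, ceiling = conn.execute(sql).fetchone()
print(sql)             # SELECT MIN(`id`), MAX(`id`) FROM `employees`
print(floor, ceiling)  # 3 42
```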

If {{query}} is specified, it is wrapped in a subquery and the generated SQL takes the form:

{code}SELECT MIN(`partition column`), MAX(`partition column`) FROM (SELECT ... FROM ... WHERE 1 = 1) SUBQUERY{code}
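The query form can be sketched the same way. This sketch assumes the custom query carries a placeholder, written here as ${CONDITIONS}, which the boundary query replaces with {{1 = 1}} (matching the {{WHERE 1 = 1}} above) and which each split replaces with its own range condition. The even-width integer split is a simple illustration and is not necessarily how Sqoop2's partitioner divides the range.

```python
import sqlite3

# Assumption: "${CONDITIONS}" stands in for the placeholder that each split's
# range predicate replaces; the boundary query neutralises it with 1 = 1.
CONDITIONS_TOKEN = "${CONDITIONS}"


def query_boundary_query(query: str, partition_column: str) -> str:
    """Wrap the custom query as a subquery named SUBQUERY."""
    inner = query.replace(CONDITIONS_TOKEN, "1 = 1")
    return (f"SELECT MIN(`{partition_column}`), MAX(`{partition_column}`) "
            f"FROM ({inner}) SUBQUERY")


def split_conditions(column: str, floor: int, ceiling: int, num_splits: int) -> list:
    """Cut [floor, ceiling] into contiguous integer ranges; only the last is closed above."""
    span = ceiling - floor
    conditions = []
    for i in range(num_splits):
        lo = floor + span * i // num_splits
        hi = floor + span * (i + 1) // num_splits
        upper = "<=" if i == num_splits - 1 else "<"
        conditions.append(f"`{column}` >= {lo} AND `{column}` {upper} {hi}")
    return conditions


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, dept TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [(3, "eng"), (10, "eng"), (17, "ops"), (42, "eng")])

query = "SELECT id, dept FROM employees WHERE ${CONDITIONS}"
floor, ceiling = conn.execute(query_boundary_query(query, "id")).fetchone()

# Each split runs the custom query with the placeholder replaced by its range.
rows_per_split = [
    len(conn.execute(query.replace(CONDITIONS_TOKEN, cond)).fetchall())
    for cond in split_conditions("id", floor, ceiling, 3)
]
print(floor, ceiling, rows_per_split)  # 3 42 [2, 1, 1]
```

Because the ranges are contiguous and only the last one includes its upper bound, every row between the floor and ceiling lands in exactly one split.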

*{{partition column}}*
If {{partition column}} is not specified, it defaults to the primary key of the table.
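A sketch of that default, using SQLite's {{PRAGMA table_info}} to find the primary key for illustration. A JDBC-based connector would more likely consult {{java.sql.DatabaseMetaData.getPrimaryKeys}}; the helper name and the choice to reject tables without a single-column primary key are assumptions of this sketch.

```python
import sqlite3
from typing import Optional


def default_partition_column(conn: sqlite3.Connection, table_name: str,
                             partition_column: Optional[str] = None) -> str:
    """Return the user's partition column, or fall back to the table's primary key."""
    if partition_column:
        return partition_column
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk);
    # pk is the 1-based position within the primary key, or 0 if not part of it.
    rows = conn.execute(f'PRAGMA table_info("{table_name}")').fetchall()
    pk_columns = [row[1] for row in sorted(rows, key=lambda r: r[5]) if row[5] > 0]
    if len(pk_columns) != 1:
        # Assumption: without a single-column primary key the user must choose.
        raise ValueError(f"cannot infer partition column for {table_name!r}; specify one")
    return pk_columns[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE events (name TEXT, payload TEXT)")
print(default_partition_column(conn, "employees"))          # id
print(default_partition_column(conn, "employees", "name"))  # name
```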

*Aliases*
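Aliases can be used in custom SQL, but keep in mind how the generated SQL statements will use the custom query. In particular, the generated boundary query wraps {{query}} in a subquery, so any column referenced from outside it, such as the {{partition column}}, must be given under the name the subquery exposes (its alias), not its underlying column name.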
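For example, because a custom query is wrapped as a subquery in the generated boundary query, an aliased column is visible outside it only under its alias. The table, query, and helper below are hypothetical; SQLite is used only to make the behavior observable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", [(1, "a"), (7, "b")])

# Hypothetical custom query that renames the id column via an alias.
inner = "SELECT e.id AS emp_id, e.name FROM employees e WHERE 1 = 1"


def boundary(partition_column: str):
    """Run a boundary query of the subquery form for the given partition column."""
    sql = (f"SELECT MIN(`{partition_column}`), MAX(`{partition_column}`) "
           f"FROM ({inner}) SUBQUERY")
    return conn.execute(sql).fetchone()


print(boundary("emp_id"))  # (1, 7): the alias is visible outside the subquery
try:
    boundary("id")  # the underlying name is hidden behind the alias
except sqlite3.OperationalError as exc:
    print("fails:", exc)
```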



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)