Posted to user@sqoop.apache.org by "@Sanjiv Singh" <sa...@gmail.com> on 2015/07/17 07:13:46 UTC

Valid column for Sqoop SplitBy ?

Hi,

From the Sqoop docs:

> *--split-by <column-name> : Column of the table used to split work units*
>
> When performing parallel imports, Sqoop needs a criterion by which it can
> split the workload. Sqoop uses a *splitting column* to split the
> workload. By default, Sqoop will identify the primary key column (if
> present) in a table and use it as the splitting column. The low and high
> values for the splitting column are retrieved from the database, and the
> map tasks operate on evenly-sized components of the total range.
>
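
To make the quoted description concrete, here is a minimal Python sketch of
range-based splitting on an integer column. It is only an illustration of the
idea in the paragraph above, not Sqoop's actual code: the function names
split_ranges and rows_per_range are hypothetical, and the sample values are
made up.

```python
def split_ranges(lo, hi, num_splits):
    """Cut the closed interval [lo, hi] into contiguous ranges of near-equal width.

    Returns (start, end, end_inclusive) tuples. Only the last range includes
    its end value, so every value in [lo, hi] falls into exactly one range.
    Hypothetical helper for illustration; integer columns only.
    """
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    if hi < lo:
        raise ValueError("hi must not be smaller than lo")
    width = hi - lo
    # Boundary values spaced evenly over the value range (not over the rows).
    bounds = [lo + width * i // num_splits for i in range(num_splits + 1)]
    ranges = []
    for i in range(num_splits):
        start, end = bounds[i], bounds[i + 1]
        is_last = i == num_splits - 1
        if start == end and not is_last:
            continue  # empty half-open range, nothing to import
        ranges.append((start, end, is_last))
    return ranges


def rows_per_range(values, ranges):
    """Count how many column values each range (i.e. each mapper) would receive."""
    counts = []
    for start, end, end_inclusive in ranges:
        if end_inclusive:
            counts.append(sum(1 for v in values if start <= v <= end))
        else:
            counts.append(sum(1 for v in values if start <= v < end))
    return counts


if __name__ == "__main__":
    # Made-up, skewed column: ten small values and one outlier.
    column = list(range(1, 11)) + [100]
    ranges = split_ranges(min(column), max(column), 4)
    print(ranges)
    print(rows_per_range(column, ranges))
```

The ranges have equal width in terms of column values, not row counts. With a
skewed column like the made-up one above, most rows land in a single range,
so an evenly distributed split-by column matters for balanced mappers.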

I have used NUMBER, VARCHAR, TIMESTAMP, etc. columns for --split-by quite
a few times, and it went well for me.
But recently I was trying to import a table from Oracle to Hive, and it was
failing for a String column. (If you want, I can attach the import command
and logs.)

I also found the following on the web about using a String column for
*split-by*:

1. When a text column is used for splitting, Sqoop displays this warning:
"If your database sorts in a case-insensitive order, this may result in a
partial import or duplicate records."
2. Text splitting is not supported for Unicode character columns.

Please help me understand how --split-by should be used so that the rows are
split evenly among the mappers, without any duplicate or missing records.


Regards
Sanjiv Singh
Mob :  +091 9990-447-339