Posted to user@sqoop.apache.org by Joey Krabacher <jk...@gmail.com> on 2012/09/13 23:14:59 UTC

Import From Multiple Sources In Parallel

Has anyone had any experience with importing data into HDFS from
multiple sources?
For example, a cluster of MySQL databases that have the same table
structure and name, just different data.

Thanks ahead of time.

/* Joey */

Re: Import From Multiple Sources In Parallel

Posted by Chalcy <ch...@gmail.com>.
It depends on how you are importing the data.  You can set up multiple
sqoop jobs going into different Hive partitions of the same Hive table,
with each job using a different HDFS target name.  If all the jobs share
the same target HDFS dir, every job after the first will fail with an
"already exists" exception.
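
A rough sketch of that first approach, assuming Sqoop 1's
--hive-partition-key/--hive-partition-value flags: one job per source
database, each loading its own partition of a shared Hive table.  The
host names, credentials, and the partition column "source" below are
placeholders, not anything from the thread -- substitute your own.  The
function only prints the command (a dry run); pipe its output to sh to
actually launch the jobs.

```shell
# Build a sqoop import command that loads one source database into its
# own partition of a shared Hive table. Everything host/credential
# related here is a made-up placeholder.
hive_cmd() {
  src="$1"
  printf 'sqoop import --connect jdbc:mysql://%s.example.com/mydb --username dbuser -P --table mytablename --hive-import --hive-table mytablename --hive-partition-key source --hive-partition-value %s --target-dir /tmp/sqoop_%s\n' \
    "$src" "$src" "$src"
}

# Print one command per source; each job targets a distinct partition
# and a distinct staging dir, so they can run concurrently.
for src in mysource1 mysource2; do
  hive_cmd "$src"
done
```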

If you are importing only to HDFS, you can use different file names
going into the same directory, like /user/hadoop/mytablename/mysource1, etc.
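
For the plain-HDFS case, a minimal sketch might look like the
following: each source gets its own --target-dir under the table's
directory, which avoids the "already exists" collision.  The JDBC
hosts and credentials are assumptions.  As above, the function prints
the command rather than running it, so you can inspect before
launching.

```shell
# Build a sqoop import command that writes one source database's rows
# to its own subdirectory of the shared table dir on HDFS.
# Hosts/credentials are placeholders.
sqoop_cmd() {
  src="$1"
  printf 'sqoop import --connect jdbc:mysql://%s.example.com/mydb --username dbuser -P --table mytablename --target-dir /user/hadoop/mytablename/%s --num-mappers 4\n' \
    "$src" "$src"
}

# One command per source; distinct target dirs mean the jobs can run
# in parallel without clobbering each other.
for src in mysource1 mysource2; do
  sqoop_cmd "$src"
done
```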

You can test this easily by creating the same table on two different
data sources and trying to import both to HDFS.

Hope this helps,
Chalcy

On Thu, Sep 13, 2012 at 5:14 PM, Joey Krabacher <jk...@gmail.com>wrote:

> Has anyone had any experience with importing data into HDFS using
> multiple sources.
> For example, a cluster of MySQL databases that have the same table
> structure and name, just different data.
>
> Thanks ahead of time.
>
> /* Joey */
>