Posted to user@sqoop.apache.org by Omkar Joshi <Om...@lntinfotech.com> on 2013/06/27 08:45:19 UTC

Import from MySQL to Hive using Sqoop

Hi,

I have to import more than 400 million rows from a MySQL table (which has a composite primary key) into a PARTITIONED Hive table via Sqoop. The table holds two years of data, with a departure-date column ranging from 20120605 to 20140605 and thousands of records per day. I need to partition the data based on the departure date.
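For reference, the partitioned target table I have in mind is roughly the following (table and column names here are illustrative placeholders, not my actual schema):

    -- Hive 0.9 DDL sketch; the real table has many more columns
    CREATE TABLE flights_partitioned (
        booking_id  BIGINT,
        leg_no      INT,
        fare        DOUBLE
    )
    PARTITIONED BY (departure_date STRING);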

The versions:

Apache Hadoop  -  1.0.4
Apache Hive    -  0.9.0
Apache Sqoop   -  sqoop-1.4.2.bin__hadoop-1.0.0

As per my knowledge, there are 3 approaches:
1.         MySQL -> Non-partitioned Hive table -> INSERT from the non-partitioned Hive table into the partitioned Hive table
The current, painful one that I'm following (sketched below)
2.         MySQL -> Partitioned Hive table
I read that support for this was added in later(?) versions of Hive and Sqoop, but I was unable to find an example (my best guess is sketched below)
3.         MySQL -> Non-partitioned Hive table -> ALTER the non-partitioned Hive table to ADD PARTITIONs
The syntax requires each partition to be specified explicitly as a key-value pair - not feasible in the case of millions of records, where one cannot enumerate all the partition key-value pairs (syntax sketched below)
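To make approach 1 concrete, this is roughly what I'm doing today (connection details and all table/column names are placeholders for my actual ones). First the Sqoop import into a non-partitioned staging table:

    # Sqoop 1.4.2: plain import into a non-partitioned Hive staging table
    sqoop import \
        --connect jdbc:mysql://dbhost/flightsdb \
        --username sqoop_user -P \
        --table flight_bookings \
        --split-by booking_id \
        --hive-import \
        --hive-table flights_staging \
        -m 8

then a dynamic-partition INSERT in Hive (the dynamic partition column must come last in the SELECT):

    -- Hive 0.9: copy from the staging table into the partitioned table
    SET hive.exec.dynamic.partition = true;
    SET hive.exec.dynamic.partition.mode = nonstrict;
    SET hive.exec.max.dynamic.partitions = 2000;          -- ~730 daily partitions over two years
    SET hive.exec.max.dynamic.partitions.pernode = 1000;  -- the default of 100 is too low here

    INSERT OVERWRITE TABLE flights_partitioned PARTITION (departure_date)
    SELECT booking_id, leg_no, fare, departure_date
    FROM flights_staging;

This works, but it reads and writes the 400+ million rows twice, which is the pain.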
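For approach 2, the closest thing I could find in Sqoop 1.4.2 is the static --hive-partition-key / --hive-partition-value pair, which, if I understand it correctly, loads only one partition value per invocation:

    # One Sqoop run per departure date; a driver script would have to loop over the dates
    sqoop import \
        --connect jdbc:mysql://dbhost/flightsdb \
        --username sqoop_user -P \
        --table flight_bookings \
        --where "departure_date = 20120605" \
        --hive-import \
        --hive-table flights_partitioned \
        --hive-partition-key departure_date \
        --hive-partition-value 20120605 \
        -m 4

Covering two years this way would mean ~730 invocations, so I'm hoping there is real dynamic-partition support that I have missed.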
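And for approach 3, this is the ADD PARTITION syntax I meant; each partition has to be spelled out explicitly (the dates below are just examples), and the staged data would still have to be moved into per-partition directories:

    -- Hive 0.9: one explicit key-value spec per partition
    ALTER TABLE flights_partitioned
        ADD PARTITION (departure_date = '20120605');
    ALTER TABLE flights_partitioned
        ADD PARTITION (departure_date = '20120606');
    -- ... and so on for every date in the range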

Can anyone provide inputs for approaches 2 and 3?

Regards,
Omkar Joshi
