Posted to user@sqoop.apache.org by Omkar Joshi <Om...@lntinfotech.com> on 2013/06/27 08:45:19 UTC
Import from MySQL to Hive using Sqoop
Hi,
I have to import more than 400 million rows from a MySQL table (which has a composite primary key) into a PARTITIONED Hive table via Sqoop. The table holds two years of data, with a departure-date column ranging from 20120605 to 20140605 and thousands of records per day. I need to partition the data by departure date.
The versions:
Apache Hadoop - 1.0.4
Apache Hive - 0.9.0
Apache Sqoop - sqoop-1.4.2.bin__hadoop-1.0.0
As far as I know, there are three approaches:
1. MySQL -> Non-partitioned Hive table -> INSERT from Non-partitioned Hive table into Partitioned Hive table
This is the painful approach I'm currently following.
2. MySQL -> Partitioned Hive table
I have read that support for this was added in later(?) versions of Hive and Sqoop, but I was unable to find an example.
3. MySQL -> Non-partitioned Hive table -> ALTER Non-partitioned Hive table to add PARTITION
The syntax requires the partitions to be specified as key-value pairs, which is not feasible with millions of records, since one cannot enumerate all the partition key-value pairs by hand.
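For reference, Sqoop 1.4.x does expose --hive-partition-key and --hive-partition-value options for approach 2, but they load into a single static partition per invocation, so two years of daily partitions would still mean one import per day. A minimal sketch, assuming a hypothetical connection string and the hypothetical table names flights (MySQL) and flights_part (Hive):

```
# Hypothetical sketch: import one day's rows into one static Hive partition.
# Sqoop 1.4.x accepts only a single static --hive-partition-value per run,
# so this would need to be repeated for every departure date (~730 runs).
sqoop import \
  --connect jdbc:mysql://dbhost/flightdb \
  --username dbuser -P \
  --table flights \
  --where "departure_date = 20120605" \
  --hive-import \
  --hive-table flights_part \
  --hive-partition-key departure_date \
  --hive-partition-value 20120605
```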
Can anyone provide inputs on approaches 2 and 3?
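One note on the enumeration problem in approach 3: Hive 0.9 supports dynamic partition inserts, which derive the partition values from the data instead of requiring them to be listed. With that, approach 1 collapses to a plain (non-Hive) Sqoop import into a staging table followed by a single INSERT. A sketch, where flights_stage, flights_part, and the column names are hypothetical:

```
# Hypothetical sketch: let Hive create the daily partitions automatically.
# The partition column must be selected last, and the max-partitions limit
# must cover ~730 daily partitions for two years of data.
hive -e "
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.max.dynamic.partitions=2000;
INSERT OVERWRITE TABLE flights_part PARTITION (departure_date)
SELECT col1, col2, departure_date
FROM flights_stage;"
```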
Regards,
Omkar Joshi