Posted to user@sqoop.apache.org by Anthony Smee <sm...@googlemail.com> on 2014/08/27 16:55:13 UTC
Updating data in HDFS
Hi there
I am new to Sqoop and have recently been reading the Apache Sqoop Cookbook,
and I wanted to ask a question. I noticed that section 5 of the cookbook
details how Sqoop can update data in an existing dataset in the RDBMS.
Just to be clear: I am aware of the --hive-import switch, but I have tables
in an RDBMS in which updates reach back many days, across several months, and
my HDFS data is partitioned by several columns, so many partitions would
need to have their files merged.
My question: have you ever considered the same requirement, but for updating
data in Hive tables, i.e. the files in HDFS?
I am just wondering whether this is on the roadmap and, if not, why not.
Thanks
RE: Updating data in HDFS
Posted by "Venkat, Ankam" <An...@centurylink.com>.
Smee,
The Cookbook talks about updating data in the RDBMS when you are EXPORTing data with Sqoop.
If you want to update data in Hive, you need to replace the data. I recently implemented SCD Type 2 (slowly changing dimensions) on Hive tables.
You can refer to my presentation at http://files.meetup.com/1624468/Getting%20Jiggy%20with%20Change%20Data%20Capture%20and%20Slowly%20Changing%20Dimen.pdf which shows a simple example of how to update data in Hive.
Regards,
Venkat
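As an illustration of what "replace the data" can mean for a partitioned table, here is a minimal Python sketch of the idea. It is not taken from the presentation or from Sqoop itself. The column names (id, day, updated_at), the in-memory dictionaries standing in for partition files, and both function names are assumptions made up for this example. The approach: group the changed rows by partition, merge each affected partition so that the newest version of every key wins, and rewrite only those partitions.

```python
from collections import defaultdict


def merge_partition(base_rows, delta_rows, key="id", ts="updated_at"):
    """Return one partition's rows, keeping the newest version of each key.

    'key' and 'ts' are hypothetical column names chosen for this sketch.
    """
    latest = {}
    for row in list(base_rows) + list(delta_rows):
        current = latest.get(row[key])
        # '>=' lets a delta row win a timestamp tie, since deltas come last.
        if current is None or row[ts] >= current[ts]:
            latest[row[key]] = row
    return sorted(latest.values(), key=lambda r: r[key])


def rewrite_changed_partitions(base, delta, partition_col="day"):
    """Merge delta rows into base and report which partitions were replaced.

    base:  dict mapping partition value -> list of rows (stand-in for files)
    delta: list of changed rows pulled from the RDBMS
    """
    by_partition = defaultdict(list)
    for row in delta:
        by_partition[row[partition_col]].append(row)

    result = dict(base)  # untouched partitions are carried over as-is
    for part, rows in by_partition.items():
        result[part] = merge_partition(base.get(part, []), rows)
    return result, sorted(by_partition)


# Hypothetical sample data: two existing partitions, one update, one new row.
base = {
    "2014-08-25": [{"id": 1, "day": "2014-08-25", "amount": 10, "updated_at": 1}],
    "2014-08-26": [{"id": 2, "day": "2014-08-26", "amount": 20, "updated_at": 1}],
}
delta = [
    {"id": 1, "day": "2014-08-25", "amount": 15, "updated_at": 2},
    {"id": 3, "day": "2014-08-27", "amount": 30, "updated_at": 2},
]
merged, touched = rewrite_changed_partitions(base, delta)
```

In this sketch only the partitions listed in touched would need to be overwritten. One limitation to be aware of: if an update changes the value of a partition column, the old copy of the row stays behind in its original partition, so a real job would have to handle that case separately.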