Posted to user@sqoop.apache.org by Anthony Smee <sm...@googlemail.com> on 2014/08/27 16:55:13 UTC

Updating data in HDFS

Hi there

I am new to Sqoop and have recently been reading the Apache Sqoop Cookbook
and wanted to ask a question. I noticed that section 5 of the cookbook
details how Sqoop can update data in an existing dataset in the RDBMS.
Sorry, just to be clear: I am aware of the hive-import switch, but I have
tables in an RDBMS whose data is updated going back numerous days over
months, and my HDFS data is partitioned by numerous columns, meaning lots
of partitions need to have their files merged.

My question: have you ever considered the same requirement, but for updating
data in Hive tables, i.e. the files in HDFS?

Just wondering if this is on the roadmap, and if not why not?

Thanks

RE: Updating data in HDFS

Posted by "Venkat, Ankam" <An...@centurylink.com>.
Smee,

The Cookbook talks about updating data in the RDBMS when you are EXPORTing data with Sqoop.

If you want to update data in Hive, you need to replace the data. I recently implemented SCD (Slowly Changing Dimension) Type 2 on Hive tables.
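
As a rough illustration of what "replace the data" can mean for a partitioned table, here is a minimal Python sketch. It is not taken from the presentation or from Sqoop itself. It assumes each row carries a unique key (`id`), a last-modified value (`updated_at`), and a partition column (`dt`) whose value never changes for a given row; all of these names are hypothetical. Only the partitions that receive changed rows are rebuilt, which in Hive would typically correspond to overwriting each affected partition.

```python
from collections import defaultdict


def merge_partition(base_rows, changed_rows, key="id", ts="updated_at"):
    """Return the rows one partition should hold after applying changes.

    For every key, the row with the newest timestamp is kept.
    On a tie, the row seen later (the changed one) wins.
    """
    latest = {}
    for row in list(base_rows) + list(changed_rows):
        current = latest.get(row[key])
        if current is None or row[ts] >= current[ts]:
            latest[row[key]] = row
    return sorted(latest.values(), key=lambda r: r[key])


def rebuild_touched_partitions(table, changes, partition_col="dt"):
    """Rebuild only the partitions that have incoming changes.

    table:   dict mapping a partition value to its list of rows
    changes: list of new or updated rows pulled from the RDBMS
    Assumes a row's partition value never changes (hypothetical simplification).
    """
    by_partition = defaultdict(list)
    for row in changes:
        by_partition[row[partition_col]].append(row)

    rebuilt = dict(table)  # untouched partitions are carried over as-is
    for part, rows in by_partition.items():
        rebuilt[part] = merge_partition(table.get(part, []), rows)
    return rebuilt
```

With months of late-arriving updates, the point of grouping by partition first is that partitions without any changed rows are never rewritten.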

You can refer to my presentation at http://files.meetup.com/1624468/Getting%20Jiggy%20with%20Change%20Data%20Capture%20and%20Slowly%20Changing%20Dimen.pdf, which shows a simple example of how to update data in Hive.

Regards,
Venkat
