Posted to user@hive.apache.org by Bhupesh Bansal <bh...@groupon.com> on 2011/08/19 23:26:59 UTC

Alter table Set Locations for all partitions

Hey Folks,

I am wondering what is the easiest way to migrate data from one Hadoop/Hive
cluster to another.

I distcp'd all the data to the new cluster, and then copied the metadata
directory to the new cluster.
Hive comes up fine and shows tables etc., but the table locations are still
pointing to the old cluster.

There is one command,
alter table table_name set location new_location

but it doesn't update the locations of existing partitions. Is there a way we can
do it for *ALL* partitions easily?
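
The per-partition form, ALTER TABLE ... PARTITION (...) SET LOCATION, does appear to
exist, so the fallback seems to be scripting a loop over SHOW PARTITIONS. A rough
sketch of that workaround (the table name, warehouse path, and namenode URI below
are made up):

TABLE=page_views                                             # made-up table name
NEWLOC=hdfs://new-namenode:8020/user/hive/warehouse/$TABLE   # assumed warehouse path on the new cluster

hive -S -e "SHOW PARTITIONS $TABLE" | while read part; do
  # turn "dt=2011-08-19/country=us" into "dt='2011-08-19',country='us'"
  spec=$(echo "$part" | sed "s|=|='|g; s|/|',|g; s|\$|'|")
  hive -e "ALTER TABLE $TABLE PARTITION ($spec) SET LOCATION '$NEWLOC/$part';"
done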

Best
Bhupesh

Re: Alter table Set Locations for all partitions

Posted by Bhupesh Bansal <bh...@groupon.com>.
Hey Folks,

I am using the Derby DB currently. I will play with the suggestions in the
thread. Since we already have SET LOCATION for table partitions, I was wondering
whether Hive should add support for wildcards in partition names etc. I think
that would be the best solution for the problem.

Best
Bhupesh


On Sat, Aug 20, 2011 at 10:05 PM, Ayon Sinha <ay...@yahoo.com> wrote:

> Make sure your Hive metadata database is a separate one and the new one has
> the tables pointing to the new cluster. I had this situation ("Hive comes
> up fine and shows tables etc., but the locations are still pointing to the old
> cluster"), so all MR jobs for Hive queries were pulling data over the network
> from the old cluster.
> Another way is to dump the metadata DB for the old cluster and string-replace
> the old cluster name with the new cluster name in the DB dump.
> A third option may be to use Sqoop, telling it to pull data from the
> old Hive cluster via the Hive JDBC driver.
>
> -Ayon
> See My Photos on Flickr <http://www.flickr.com/photos/ayonsinha/>
> Also check out my Blog for answers to commonly asked questions. <http://dailyadvisor.blogspot.com>
>
> ------------------------------
> From: Bhupesh Bansal <bh...@groupon.com>
> To: user@hive.apache.org
> Sent: Friday, August 19, 2011 2:26 PM
> Subject: Alter table Set Locations for all partitions
>
> Hey Folks,
>
> I am wondering what is the easiest way to migrate data from one Hadoop/Hive
> cluster to another.
>
> I distcp'd all the data to the new cluster, and then copied the metadata
> directory to the new cluster.
> Hive comes up fine and shows tables etc., but the table locations are still
> pointing to the old cluster.
>
> There is one command,
> alter table table_name set location new_location
>
> but it doesn't update the locations of existing partitions. Is there a way we can
> do it for *ALL* partitions easily?
>
> Best
> Bhupesh
>
>
>
>
>

Re: Alter table Set Locations for all partitions

Posted by Ayon Sinha <ay...@yahoo.com>.
Make sure your Hive metadata database is a separate one and the new one has the tables pointing to the new cluster. I had this situation ("Hive comes up fine and shows tables etc., but the locations are still pointing to the old cluster"), so all MR jobs for Hive queries were pulling data over the network from the old cluster.
Another way is to dump the metadata DB for the old cluster and string-replace the old cluster name with the new cluster name in the DB dump.
A third option may be to use Sqoop, telling it to pull data from the old Hive cluster via the Hive JDBC driver.
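
A minimal sketch of that dump-and-replace approach, assuming a MySQL metastore and made-up namenode URIs (take a backup first):

mysqldump metastore > metastore-dump.sql
# rewrite every stored URI that mentions the old namenode
sed 's|hdfs://old-namenode:8020|hdfs://new-namenode:8020|g' metastore-dump.sql > metastore-new.sql
# load the rewritten dump into the metastore database that the new cluster's Hive points at
mysql metastore < metastore-new.sql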
 
-Ayon
See My Photos on Flickr
Also check out my Blog for answers to commonly asked questions.



________________________________
From: Bhupesh Bansal <bh...@groupon.com>
To: user@hive.apache.org
Sent: Friday, August 19, 2011 2:26 PM
Subject: Alter table Set Locations for all partitions


Hey Folks, 


I am wondering what is the easiest way to migrate data from one Hadoop/Hive cluster to another.

I distcp'd all the data to the new cluster, and then copied the metadata directory to the new cluster.
Hive comes up fine and shows tables etc., but the table locations are still pointing to the old cluster.

There is one command,
alter table table_name set location new_location

but it doesn't update the locations of existing partitions. Is there a way we can do it for *ALL* partitions easily?

Best
Bhupesh

Re: Alter table Set Locations for all partitions

Posted by Ayon Sinha <ay...@yahoo.com>.
I did have the exact scenario, but without partitioned tables. I took the more hackish but definitive approach: I wrote a script to read "desc formatted <tablename>" for every DB, parse the output, replace the namenode string, and re-fire the create table call on the new cluster. It worked well for me.
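
A rough sketch of that kind of script (namenode URIs are made up; on Hive versions that have SHOW CREATE TABLE, that statement saves parsing the DESCRIBE FORMATTED output by hand):

# regenerate each table's DDL with the namenode rewritten, then replay it on the new cluster
for t in $(hive -S -e 'SHOW TABLES'); do
  hive -S -e "SHOW CREATE TABLE $t" \
    | sed 's|hdfs://old-namenode:8020|hdfs://new-namenode:8020|g' >> recreate_tables.hql
  echo ';' >> recreate_tables.hql
done
# then, on the new cluster: hive -f recreate_tables.hql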
 
-Ayon
See My Photos on Flickr
Also check out my Blog for answers to commonly asked questions.



________________________________
From: Bhupesh Bansal <bh...@groupon.com>
To: user@hive.apache.org
Sent: Friday, August 19, 2011 2:26 PM
Subject: Alter table Set Locations for all partitions


Hey Folks, 


I am wondering what is the easiest way to migrate data from one Hadoop/Hive cluster to another.

I distcp'd all the data to the new cluster, and then copied the metadata directory to the new cluster.
Hive comes up fine and shows tables etc., but the table locations are still pointing to the old cluster.

There is one command,
alter table table_name set location new_location

but it doesn't update the locations of existing partitions. Is there a way we can do it for *ALL* partitions easily?

Best
Bhupesh

Re: Alter table Set Locations for all partitions

Posted by Bhupesh Bansal <bh...@groupon.com>.
Thanks David,

I will take a shot at this today.

Best
Bhupesh

On Sun, Aug 21, 2011 at 6:51 PM, David Burley <da...@geek.net> wrote:

> Bhupesh,
>
> The following worked for us on a MySQL metastore:
>
> mysqldump metastore > metastore-backup-`date +%Y%m%d`.sql
> mysql metastore
> UPDATE SDS SET LOCATION=REPLACE(LOCATION, 'hdfs://hadoop-namenode-1',
> 'hdfs://hadoop-namenode-2');
> UPDATE DBS SET `DESC`=REPLACE(`DESC`, 'hdfs://hadoop-namenode-1',
> 'hdfs://hadoop-namenode-2');
>
> But I think you noted you are using Derby. If I were using Derby, I'd
> switch to MySQL to make the management less painful, and then do the
> above. I provided some info on migrating from Derby to MySQL here:
>
> http://search-hadoop.com/m/J15J0FwAg1/v=plain
>
> If you must run Derby -- I'd think you could use the same tool,
> RazorSQL, to load up the metastore and then tweak the underlying data.
>
> Good Luck!
>
> --David
>
> On Fri, Aug 19, 2011 at 5:26 PM, Bhupesh Bansal <bh...@groupon.com>
> wrote:
> > Hey Folks,
> >
> > I am wondering what is the easiest way to migrate data from one Hadoop/Hive
> > cluster to another.
> >
> > I distcp'd all the data to the new cluster, and then copied the metadata
> > directory to the new cluster.
> > Hive comes up fine and shows tables etc., but the table locations are still
> > pointing to the old cluster.
> >
> > There is one command,
> > alter table table_name set location new_location
> >
> > but it doesn't update the locations of existing partitions. Is there a way we
> > can do it for *ALL* partitions easily?
> >
> > Best
> > Bhupesh
> >
> >
> >
>
>

Re: Alter table Set Locations for all partitions

Posted by David Burley <da...@geek.net>.
Bhupesh,

The following worked for us on a MySQL metastore:

mysqldump metastore > metastore-backup-`date +%Y%m%d`.sql
mysql metastore
UPDATE SDS SET LOCATION=REPLACE(LOCATION, 'hdfs://hadoop-namenode-1',
'hdfs://hadoop-namenode-2');
UPDATE DBS SET `DESC`=REPLACE(`DESC`, 'hdfs://hadoop-namenode-1',
'hdfs://hadoop-namenode-2');
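
A quick sanity check before and after the UPDATEs could be something like this (a sketch, using the same namenode names as above and assuming the metastore database is called 'metastore'):

# should drop to zero once the locations have been rewritten
mysql -e "SELECT COUNT(*) FROM SDS WHERE LOCATION LIKE 'hdfs://hadoop-namenode-1%'" metastore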

But I think you noted you are using Derby. If I were using Derby, I'd
switch to MySQL to make the management less painful, and then do the
above. I provided some info on migrating from Derby to MySQL here:

http://search-hadoop.com/m/J15J0FwAg1/v=plain

If you must run Derby -- I'd think you could use the same tool,
RazorSQL, to load up the metastore and then tweak the underlying data.

Good Luck!

--David

On Fri, Aug 19, 2011 at 5:26 PM, Bhupesh Bansal <bh...@groupon.com> wrote:
> Hey Folks,
>
> I am wondering what is the easiest way to migrate data from one Hadoop/Hive
> cluster to another.
>
> I distcp'd all the data to the new cluster, and then copied the metadata
> directory to the new cluster.
> Hive comes up fine and shows tables etc., but the table locations are still
> pointing to the old cluster.
>
> There is one command,
> alter table table_name set location new_location
>
> but it doesn't update the locations of existing partitions. Is there a way we can
> do it for *ALL* partitions easily?
>
> Best
> Bhupesh
>
>
>


Re: Alter table Set Locations for all partitions

Posted by Florin Diaconeasa <fl...@gmail.com>.
Hello,

Where do you keep your metadata?
If it's a regular RDBMS, you could update the tables directly.

The per-partition locations are stored in the partition-related tables of your metastore database (the PARTITIONS table references each partition's storage location through the SDS table).
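
On a MySQL metastore, for example, something like this shows the per-partition locations (a sketch; PARTITIONS and SDS are the usual metastore tables, and the database name 'metastore' is an assumption):

mysql -e "SELECT p.PART_NAME, s.LOCATION FROM PARTITIONS p JOIN SDS s ON p.SD_ID = s.SD_ID LIMIT 10" metastore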

Florin

On Aug 20, 2011, at 3:52 AM, Aggarwal, Vaibhav wrote:

> You could also specify a fully qualified HDFS path in the create table command.
> It could look like this:
>  
> create external table test(key string )
> row format delimited
> fields terminated by '\000'
> collection items terminated by ' '
> location 'hdfs://new_master_host:port/table_path';
>
> Then you can use the 'insert overwrite' command to move all data to the new table.
> You can use dynamic partitioning to move partitioned tables.
> Please use the following options:
> SET hive.exec.dynamic.partition=true;
> SET hive.exec.dynamic.partition.mode=nonstrict;
>  
> That way you will have the data copied and all the partitions created correctly for you.
>  
> Thanks
> Vaibhav
>  
> From: Bhupesh Bansal [mailto:bhupesh@groupon.com] 
> Sent: Friday, August 19, 2011 2:27 PM
> To: user@hive.apache.org
> Subject: Alter table Set Locations for all partitions
>  
> Hey Folks,
> 
> I am wondering what is the easiest way to migrate data from one Hadoop/Hive cluster to another.
> 
> I distcp'd all the data to the new cluster, and then copied the metadata directory to the new cluster.
> Hive comes up fine and shows tables etc., but the table locations are still pointing to the old cluster.
> 
> There is one command,
> alter table table_name set location new_location
> 
> but it doesn't update the locations of existing partitions. Is there a way we can do it for *ALL* partitions easily?
> 
> Best
> Bhupesh
> 


RE: Alter table Set Locations for all partitions

Posted by "Aggarwal, Vaibhav" <va...@amazon.com>.
You could also specify a fully qualified HDFS path in the create table command.
It could look like this:

create external table test(key string )
row format delimited
fields terminated by '\000'
collection items terminated by ' '
location 'hdfs://new_master_host:port/table_path';

Then you can use the 'insert overwrite' command to move all data to the new table.
You can use dynamic partitioning to move partitioned tables.
Please use the following options:

SET hive.exec.dynamic.partition=true;

SET hive.exec.dynamic.partition.mode=nonstrict;

That way you will have the data copied and all the partitions created correctly for you.
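
For the partitioned case, the copy could look something like the sketch below (assuming the new external table test was declared with a PARTITIONED BY (dt string) clause and the source table has a matching dt column; all names are illustrative):

# with dynamic partitioning, the partition column goes last in the SELECT
hive -e "
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE test PARTITION (dt)
SELECT key, dt FROM old_table;
"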

Thanks
Vaibhav

From: Bhupesh Bansal [mailto:bhupesh@groupon.com]
Sent: Friday, August 19, 2011 2:27 PM
To: user@hive.apache.org
Subject: Alter table Set Locations for all partitions

Hey Folks,

I am wondering what is the easiest way to migrate data from one Hadoop/Hive cluster to another.

I distcp'd all the data to the new cluster, and then copied the metadata directory to the new cluster.
Hive comes up fine and shows tables etc., but the table locations are still pointing to the old cluster.

There is one command,
alter table table_name set location new_location

but it doesn't update the locations of existing partitions. Is there a way we can do it for *ALL* partitions easily?

Best
Bhupesh