Posted to user@hive.apache.org by Kevin Eppinger <ke...@adknowledge.com> on 2009/03/24 20:50:05 UTC

Updating HDFS table locations in metastore_db

Hello all -

First off, let me describe how I'm using Hadoop/Hive. I have a Hadoop cluster running on Amazon EC2 instances, with each node's data stored on an EBS volume. Networked Derby is installed on the Hadoop "master" node, with the Hive metastore database also stored on an EBS volume. I use Hive to connect to Hadoop from a separate server instance. Everything runs fine until I take down the Hadoop cluster and restart it (being able to do so is part of my requirements). After that, Hive complains that the tables are not at the HDFS locations it expects...understandably so, since Hadoop is running on EC2 instances and the master's DNS name is different when the new cluster spins up.

I manually connected to the Derby database using 'ij' (Derby's interactive SQL utility) and found a couple of tables (DBS and SDS) that had references to HDFS locations on the old Hadoop cluster.  My question is: is there a way to tell Hive to update those HDFS locations in its metastore database, or will I have to manually change them in Derby whenever I start up a new Hadoop/EC2 cluster?
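For reference, the manual fix can be scripted. The sketch below uses SQLite as a stand-in for Networked Derby (against Derby you would issue the equivalent UPDATEs through ij or JDBC; note that older Derby versions lack a built-in REPLACE function, so there you may need plain per-row UPDATE statements instead). The column names (DBS.DB_LOCATION_URI and SDS.LOCATION) are from the standard metastore schema, and the hostnames are made up for illustration; verify both against your own metastore_db before running anything like this.

```python
# Sketch: rewrite the old namenode URIs stored in the Hive metastore.
# sqlite3 stands in for Networked Derby here; the hostnames and the
# column names (DBS.DB_LOCATION_URI, SDS.LOCATION) are assumptions to
# check against your own metastore_db.
import sqlite3

OLD = "hdfs://old-master.ec2.internal:9000"
NEW = "hdfs://new-master.ec2.internal:9000"

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Minimal stand-ins for the two metastore tables that embed HDFS URIs.
cur.execute("CREATE TABLE DBS (DB_ID INTEGER, DB_LOCATION_URI TEXT)")
cur.execute("CREATE TABLE SDS (SD_ID INTEGER, LOCATION TEXT)")
cur.execute("INSERT INTO DBS VALUES (1, ?)",
            (OLD + "/user/hive/warehouse",))
cur.execute("INSERT INTO SDS VALUES (1, ?)",
            (OLD + "/user/hive/warehouse/page_views",))

# The actual fix: swap the namenode prefix, leaving the paths intact.
for table, col in (("DBS", "DB_LOCATION_URI"), ("SDS", "LOCATION")):
    cur.execute(f"UPDATE {table} SET {col} = REPLACE({col}, ?, ?)",
                (OLD, NEW))
conn.commit()

print(cur.execute("SELECT DB_LOCATION_URI FROM DBS").fetchone()[0])
# -> hdfs://new-master.ec2.internal:9000/user/hive/warehouse
```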

Thanks for any help,
Kevin

RE: Updating HDFS table locations in metastore_db

Posted by Kevin Eppinger <ke...@adknowledge.com>.
Prasad-

It can be done by using an EC2 elastic IP to identify the Hadoop name node; that way the HDFS locations stay consistent across clusters.  One of my requirements, however, is to not use elastic IPs.  But otherwise that would work.

-kevin

________________________________
From: Prasad Chakka [mailto:prasad@facebook.com]
Sent: Tuesday, March 24, 2009 3:23 PM
To: hive-user@hadoop.apache.org
Subject: Re: Updating HDFS table locations in metastore_db

Hi Kevin,

There is no such facility right now, but it would be good functionality to have. It could be added in HiveMetaStore and exposed via a new HiveQL command.

One workaround is to use a VIP (virtual IP), but I am not sure that can be done in EC2.

Thanks,
Prasad

Re: Updating HDFS table locations in metastore_db

Posted by Prasad Chakka <pr...@facebook.com>.
Hi Kevin,

There is no such facility right now, but it would be good functionality to have. It could be added in HiveMetaStore and exposed via a new HiveQL command.

One workaround is to use a VIP (virtual IP), but I am not sure that can be done in EC2.

Thanks,
Prasad
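For readers of the archive: tooling along the lines Prasad describes was added in later Hive releases. The commands below are from those later versions and were not available at the time of this thread; the hostnames are placeholders.

    # Later Hive releases ship a metastore tool that rewrites stored
    # NameNode locations in bulk (new URI first, old URI second):
    hive --service metatool -updateLocation \
        hdfs://new-master.ec2.internal:9000 \
        hdfs://old-master.ec2.internal:9000

    # Per-table alternative in later HiveQL:
    # ALTER TABLE page_views SET LOCATION
    #   'hdfs://new-master.ec2.internal:9000/user/hive/warehouse/page_views';

Check the documentation of your Hive version before relying on either; exact flags and availability vary by release.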
