Posted to user@hive.apache.org by Tomer Shiran <ts...@maprtech.com> on 2009/11/23 00:11:37 UTC

Hive metastore on MySQL or files

Is it possible to set up Hive with a metastore in MySQL or NFS?

I think that changing the configuration parameters (e.g.,
javax.jdo.option.ConnectionURL) would make it possible to use MySQL, but I
haven't seen any documentation on that. Also, what about using files instead
of a database?

Thanks,
Tomer

Re: Hive metastore on MySQL or files

Posted by Prasad Chakka <pc...@facebook.com>.
Should have been 'There is NO authentication/authorization in Hive yet.'


________________________________
From: Prasad Chakka <pc...@facebook.com>
Reply-To: <hi...@hadoop.apache.org>
Date: Sun, 22 Nov 2009 19:44:45 -0800
To: <hi...@hadoop.apache.org>
Subject: Re: Hive metastore on MySQL or files


 1.  No. You should put a single username/password pair in hive-site.xml, which will be used for all users.
 2.  There is authentication/authorization in Hive yet. See HIVE-78 for more details on how it will be done. For now, everything is readable and writable except for the default database, which can't be deleted.
 3.  The JDBC data store to use will depend greatly on which other RDBMS your organization uses. Facebook's is most probably the biggest metastore, and we use MySQL for administrative purposes. I don't think performance is a big deal yet. You should use whatever you are comfortable with.

Prasad

________________________________
From: Tomer Shiran <ts...@maprtech.com>
Reply-To: <hi...@hadoop.apache.org>
Date: Sun, 22 Nov 2009 17:55:43 -0800
To: <hi...@hadoop.apache.org>
Subject: Re: Hive metastore on MySQL or files

That definitely makes sense. I have a few follow-up questions:


 1.  If I'm using MySQL (i.e., Hive's Multi User Mode), does that mean that each user should have a valid MySQL username and password?
 2.  Is there any authentication built into the Thrift protocols when using Hive's Remote Server option? How are permissions handled in that case?
 3.  Does it make sense to use the Remote Server option with Derby?

Thanks,
Tomer

On Sun, Nov 22, 2009 at 4:58 PM, Carl Steinbach <ca...@cloudera.com> wrote:
Hi Ed,


Anyone have a great reason that MySQL is better than Derby?

Given the negligible effect that metastore performance has on Hive's overall performance, I think manageability is the dominant concern for most people when selecting a metastore datastore. If your organization is already using MySQL/Postgres/etc. and has a person maintaining and managing backups for these systems, it is probably better to piggyback on that effort than to further complicate matters with the addition of another critical infrastructure component.

Carl




--
Tomer Shiran
Director of Product Management | MapR Technologies (www.mapr.com) | 650-804-8657




Re: Hive metastore on MySQL or files

Posted by Prasad Chakka <pc...@facebook.com>.
 1.  No. You should put a single username/password pair in hive-site.xml, which will be used for all users.
 2.  There is no authentication/authorization in Hive yet. See HIVE-78 for more details on how it will be done. For now, everything is readable and writable except for the default database, which can't be deleted.
 3.  The JDBC data store to use will depend greatly on which other RDBMS your organization uses. Facebook's is most probably the biggest metastore, and we use MySQL for administrative purposes. I don't think performance is a big deal yet. You should use whatever you are comfortable with.

Prasad

________________________________
From: Tomer Shiran <ts...@maprtech.com>
Reply-To: <hi...@hadoop.apache.org>
Date: Sun, 22 Nov 2009 17:55:43 -0800
To: <hi...@hadoop.apache.org>
Subject: Re: Hive metastore on MySQL or files

That definitely makes sense. I have a few follow-up questions:


 1.  If I'm using MySQL (i.e., Hive's Multi User Mode), does that mean that each user should have a valid MySQL username and password?
 2.  Is there any authentication built into the Thrift protocols when using Hive's Remote Server option? How are permissions handled in that case?
 3.  Does it make sense to use the Remote Server option with Derby?

Thanks,
Tomer

On Sun, Nov 22, 2009 at 4:58 PM, Carl Steinbach <ca...@cloudera.com> wrote:
Hi Ed,


Anyone have a great reason that MySQL is better than Derby?

Given the negligible effect that metastore performance has on Hive's overall performance, I think manageability is the dominant concern for most people when selecting a metastore datastore. If your organization is already using MySQL/Postgres/etc. and has a person maintaining and managing backups for these systems, it is probably better to piggyback on that effort than to further complicate matters with the addition of another critical infrastructure component.

Carl




--
Tomer Shiran
Director of Product Management | MapR Technologies (www.mapr.com) | 650-804-8657
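
Prasad's first answer (one shared username/password pair in hive-site.xml) can be sketched roughly as follows. This is only an illustration, not a verified setup: the four javax.jdo.option.* property names are the usual metastore connection properties, but the host name, database name, user, password, and the build_hive_site helper are all assumptions made up for the example.

```python
import xml.etree.ElementTree as ET


def build_hive_site(properties):
    """Render a dict of property names/values as hive-site.xml-style text.

    Hypothetical helper for illustration; it is not part of Hive.
    """
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder values (assumptions): replace host, database, user, password.
metastore_properties = {
    "javax.jdo.option.ConnectionURL":
        "jdbc:mysql://metastore-db.example.com:3306/hive_metastore",
    "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
    # One shared credential pair; every Hive user connects with it.
    "javax.jdo.option.ConnectionUserName": "hive",
    "javax.jdo.option.ConnectionPassword": "change-me",
}

hive_site_xml = build_hive_site(metastore_properties)
print(hive_site_xml)
```

Because the credentials live in the shared configuration rather than with each user, individual Hive users do not need their own MySQL accounts in this arrangement.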



Re: Hive metastore on MySQL or files

Posted by Tomer Shiran <ts...@maprtech.com>.
That definitely makes sense. I have a few follow-up questions:


   1. If I'm using MySQL (i.e., Hive's Multi User Mode), does that mean that
   each user should have a valid MySQL username and password?
   2. Is there any authentication built into the Thrift protocols when using
   Hive's Remote Server option? How are permissions handled in that case?
   3. Does it make sense to use the Remote Server option with Derby?

Thanks,
Tomer

On Sun, Nov 22, 2009 at 4:58 PM, Carl Steinbach <ca...@cloudera.com> wrote:

> Hi Ed,
>
>
>  Anyone have a great reason that MySQL is better than Derby?
>>
>
> Given the negligible effect that metastore performance has on Hive's
> overall performance, I think manageability is the dominant concern for most
> people when selecting a metastore datastore. If your organization is already
> using MySQL/Postgres/etc and has a person maintaining and managing backups
> for these systems, it is probably better to piggyback on that effort than to
> further complicate matters with the addition of another critical
> infrastructure component.
>
> Carl
>
>


-- 
Tomer Shiran
Director of Product Management | MapR Technologies (www.mapr.com) |
650-804-8657

Re: Hive metastore on MySQL or files

Posted by Carl Steinbach <ca...@cloudera.com>.
Hi Ed,

 Anyone have a great reason that MySQL is better than Derby?
>

Given the negligible effect that metastore performance has on Hive's overall
performance, I think manageability is the dominant concern for most people
when selecting a metastore datastore. If your organization is already using
MySQL/Postgres/etc and has a person maintaining and managing backups for
these systems, it is probably better to piggyback on that effort than to
further complicate matters with the addition of another critical
infrastructure component.

Carl

Re: Hive metastore on MySQL or files

Posted by Edward Capriolo <ed...@gmail.com>.
On Sun, Nov 22, 2009 at 7:25 PM, Carl Steinbach <ca...@cloudera.com> wrote:
> Hi Tomer,
>
> Generally speaking the Hive Metastore can run on top of any datastore that
> supports JDBC, and many people have used MySQL for this purpose. The
> Metastore Admin page on the Hive wiki has more information about the
> different configuration options:
> http://wiki.apache.org/hadoop/Hive/AdminManual/MetastoreAdmin
>
> Hope this helps.
>
> Carl
>
> On Sun, Nov 22, 2009 at 3:11 PM, Tomer Shiran <ts...@maprtech.com> wrote:
>>
>> Is it possible to set up Hive with a metastore in MySQL or NFS?
>>
>> I think that changing the configuration parameters (e.g.,
>> javax.jdo.option.ConnectionURL) would make it possible to use MySQL, but I
>> haven't seen any documentation on that. Also, what about using files instead
>> of a database?
>>
>> Thanks,
>> Tomer
>
>

I have never heard anyone attest to any performance gains from using
MySQL. The metadata stored through JPOX should be fairly small.
Long-lived Hive queries spend most of their time operating on data in
HDFS, not on metadata in Derby. As a result, I think there would not be
much performance gain from using MySQL, though there may be something to
gain on the management side. That said, Derby also has master/slave and
backup capability.

Anyone have a great reason that MySQL is better than Derby?

Re: Hive metastore on MySQL or files

Posted by Carl Steinbach <ca...@cloudera.com>.
Hi Tomer,

Generally speaking the Hive Metastore can run on top of any datastore that
supports JDBC, and many people have used MySQL for this purpose. The
Metastore Admin page on the Hive wiki has more information about the
different configuration options:
http://wiki.apache.org/hadoop/Hive/AdminManual/MetastoreAdmin

Hope this helps.

Carl

On Sun, Nov 22, 2009 at 3:11 PM, Tomer Shiran <ts...@maprtech.com> wrote:

> Is it possible to set up Hive with a metastore in MySQL or NFS?
>
> I think that changing the configuration parameters (e.g.,
> javax.jdo.option.ConnectionURL) would make it possible to use MySQL, but I
> haven't seen any documentation on that. Also, what about using files instead
> of a database?
>
> Thanks,
> Tomer
>
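
Carl's point that the metastore can sit on any JDBC datastore comes down to which JDBC URL is set in javax.jdo.option.ConnectionURL. As a small, hedged sketch, the following reads a hive-site.xml-style document and reports the JDBC subprotocol it points at. The metastore_backend function, the example host, and the database name are hypothetical; the Derby URL shown is assumed to match the usual embedded default.

```python
import xml.etree.ElementTree as ET


def metastore_backend(hive_site_xml):
    """Return the JDBC subprotocol (e.g. 'derby', 'mysql') from the config.

    Hypothetical helper for illustration; it is not part of Hive.
    """
    root = ET.fromstring(hive_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == "javax.jdo.option.ConnectionURL":
            url = (prop.findtext("value") or "").strip()
            # JDBC URLs have the form jdbc:<subprotocol>:<rest>
            parts = url.split(":", 2)
            if len(parts) == 3 and parts[0] == "jdbc":
                return parts[1]
            raise ValueError("not a JDBC URL: %r" % url)
    return None  # property not set in this document


EMBEDDED_DERBY = """
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
  </property>
</configuration>
"""

# Same document, with the URL swapped for an assumed MySQL instance.
MYSQL = EMBEDDED_DERBY.replace(
    "jdbc:derby:;databaseName=metastore_db;create=true",
    "jdbc:mysql://metastore-db.example.com:3306/hive_metastore",
)

print(metastore_backend(EMBEDDED_DERBY))
print(metastore_backend(MYSQL))
```

Switching datastores in this sketch changes only the URL value; in a real deployment the matching JDBC driver class and credentials would also need to be configured.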