Posted to user@hive.apache.org by Josh Ferguson <jo...@besquared.net> on 2008/12/09 05:21:45 UTC

Metadata in Multiuser DB

Does anyone have an example of how to set up the Hive config file to
keep schema information in something other than Derby, so that multiple
users and machines can access it at once?

Josh

Re: Metadata in Multiuser DB

Posted by Josh Ferguson <jo...@besquared.net>.
Yeah this is perfect, I'll try it out when I get home tonight. Thanks!
Josh


Re: Metadata in Multiuser DB

Posted by Edward Capriolo <ed...@gmail.com>.
MySQL is a popular database system. I prefer Derby on technical grounds
because it has no external dependencies. Because Derby is a Java
application, I can pick up the pieces and move them to a different
directory or a dedicated server. 100% pure Java is a big perk for me.
Also, as of right now, Derby is the target platform.

I have not looked for a benchmark of Derby against MySQL. However,
remember that a Hive query might run for hours, so a few ms of latency
in fetching metadata from Derby is orders of magnitude smaller.

The above is opinion based on my deployment.

Re: Metadata in Multiuser DB

Posted by Josh Ferguson <jo...@besquared.net>.
What does this usually mean?

hive> SHOW TABLES;
FAILED: Error in metadata: MetaException(message:Got exception: javax.jdo.JDOFatalInternalException Error creating transactional connection factory)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
Time taken: 2.228 seconds

Josh



RE: Metadata in Multiuser DB

Posted by Joydeep Sen Sarma <js...@facebook.com>.
We use MySQL as the metadb server.

Prasad can give a more detailed response when he's back, but here are the relevant entries from our hive-default.xml:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://xxx.yyy.facebook.com/hms_during_upgrade?createDatabaseIfNotExist=true</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>xxxxxx</value>
</property>

<property>
  <name>org.jpox.autoCreateSchema</name>
  <value>false</value>
</property>

<property>
  <name>org.jpox.fixedDatastore</name>
  <value>true</value>
</property>

<property>
  <name>hive.metastore.usefilestore</name>
  <value>false</value>
</property>

<property>
  <name>hive.metastore.checkForDefaultDb</name>
  <value>false</value>
</property>

<property>
  <name>hive.metastore.rawstore.impl</name>
  <value>org.apache.hadoop.hive.metastore.ObjectStore</value>
  <description>Name of the class that implements org.apache.hadoop.hive.metastore.rawstore interface. This class is used to store and retrieval of raw metadata objects such as table, database</description>
</property>

<property>
  <name>hive.metastore.local</name>
  <value>true</value>
</property>


- autoCreateSchema will probably have to be true for a first-time deployment.
- Not sure whether usefilestore is required. It is a vestige from older code that says we want to use a DB-backed metastore. No harm in putting it there.
- hive.metastore.local: this is important since we want to connect from the hive cli directly to the metastore (hence 'local') instead of through a thrift server.
- checkForDefaultDb: not entirely sure about this, but it was having some performance impact for us.
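
As a rough sketch of the first note: a first-time deployment might temporarily override the two JPOX entries as shown here, then go back to the values listed above once the metastore tables exist. Only the autoCreateSchema change comes from the note. Flipping fixedDatastore as well is an assumption, on the reasoning that a datastore marked as fixed is not expected to accept schema changes.

```xml
<!-- First-time deployment only: let JPOX create the metastore tables. -->
<property>
  <name>org.jpox.autoCreateSchema</name>
  <value>true</value>
</property>

<!-- Assumption, not stated in the note: unfix the datastore so the
     schema can be created. Restore to true afterwards. -->
<property>
  <name>org.jpox.fixedDatastore</name>
  <value>false</value>
</property>
```

Once SHOW TABLES works against the new database, both values would go back to what is shown in the listing above.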

Hope this helps,

Joydeep
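
For contrast with the hive.metastore.local note, here is a minimal sketch of what a client might set if it went through a thrift metastore server instead of connecting to the database directly. This is not the setup described above. The host name and port are hypothetical placeholders, and hive.metastore.uris as the property carrying the thrift address is an assumption about the Hive version in use.

```xml
<!-- Not local: the cli talks to a metastore server, not to the database. -->
<property>
  <name>hive.metastore.local</name>
  <value>false</value>
</property>

<!-- Hypothetical host and port; replace with the real thrift endpoint. -->
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastore-host.example.com:9083</value>
</property>
```

In that arrangement only the metastore server would need the javax.jdo.option.* entries and the database credentials.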





Re: Metadata in Multiuser DB

Posted by Bill Au <bi...@gmail.com>.
I followed the instructions in the wiki for using Derby in network server
mode and was able to get Hive running in multi-user mode.  I would be
interested in using MySQL instead.  Not sure if I will have time to try to
get that working, so instructions from someone who has already done so would
be very useful.

Bill


Re: Metadata in Multiuser DB

Posted by Edward Capriolo <ed...@gmail.com>.
You have two options:

1) You can start derby in network server mode rather than embedded.
http://wiki.apache.org/hadoop/HiveDerbyServerMode
2) You can also follow the above instructions and replace Derby with
a JPOX-capable database.

If you are going to use another database besides Derby in server mode,
such as MySQL, it would be cool if you added another wiki page with
instructions. I have never tried MySQL as a metastore, but it may have
better performance or be more tunable than Derby.
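
A minimal sketch of what option 1 could look like in the Hive config, assuming a Derby network server is already running and reachable. The host name and database name are hypothetical placeholders, and 1527 is Derby's usual network server port; the wiki page linked above is the reference for the actual steps.

```xml
<!-- Point the metastore at a Derby network server instead of embedded Derby.
     derby-host.example.com and metastore_db are placeholders. -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby://derby-host.example.com:1527/metastore_db;create=true</value>
</property>

<!-- Derby's network client driver rather than the embedded driver. -->
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.ClientDriver</value>
</property>
```

The Derby client jar would also have to be on Hive's classpath for the driver class to load. For option 2, the same two properties would carry the other database's JDBC URL and driver class instead.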