Posted to user@hive.apache.org by Christian Kurz <cr...@gmx.de> on 2011/08/24 16:53:03 UTC

Understanding distributed Hive server and Hive Metastore setup

Greetings,

could somebody confirm/correct my understanding of a fully distributed 
Hive setup, please?

My setup is as follows

    * *Java application using Hive JDBC driver* connects to
    * *hive --service hiveserver*, which connects to
    * *hive --service metastore*, which uses an embedded Derby database
      for metadata storage

Please find more details in the image attached.
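For orientation, here is a minimal sketch of the first hop in this setup (JVM1 talking to `hive --service hiveserver` over JDBC). It assumes the HiveServer-era JDBC driver (`org.apache.hadoop.hive.jdbc.HiveDriver`, `jdbc:hive://` URLs) is on the classpath and that HiveServer listens on its usual port 10000; the host name `hiveserver-host` is hypothetical. A metadata statement such as `SHOW TABLES` is the kind of call that makes HiveServer consult the metastore.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch of "JVM1": a plain JDBC client of `hive --service hiveserver`.
// Assumes the Hive JDBC driver jar is on the classpath; "hiveserver-host"
// is a hypothetical host name.
final class HiveJdbcClientSketch {
    static final String DRIVER = "org.apache.hadoop.hive.jdbc.HiveDriver";

    // Builds a HiveServer (v1) JDBC URL such as jdbc:hive://host:10000/default.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:hive://" + host + ":" + port + "/" + database;
    }

    // Lists tables; this needs a running HiveServer (which in turn needs a metastore).
    static void showTables(String url) throws ClassNotFoundException, SQLException {
        Class.forName(DRIVER); // loads the driver so DriverManager can find it
        try (Connection conn = DriverManager.getConnection(url, "", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        showTables(jdbcUrl("hiveserver-host", 10000, "default"));
    }
}
```

Only `jdbcUrl` runs without a Hive installation; `showTables` needs a reachable HiveServer.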

The thing I find confusing is that JVM2 (Hive Server) starts up a Derby 
database instance. I can see that from the files the JVM has opened.

Does anybody know why the Hive Server needs a Derby instance even 
though hive-site.xml says hive.metastore.local=false?

Any hints are much appreciated.

Thanks,
Christian

btw,
I have not been able to access the picture on the wiki 
<https://cwiki.apache.org/Hive/adminmanual-metastoreadmin.html#AdminManualMetastoreAdmin-MetastoreDeploymentOptionsinPictures>. 
("Not permitted"; even though I have registered on the wiki)



Re: Understanding distributed Hive server and Hive Metastore setup

Posted by Christian Kurz <cr...@gmx.de>.
Ashutosh,

thank you for the explanation. I have changed the setup from embedded 
Derby to stand-alone Derby. Neither the hiveserver nor the metastore 
service opens any Derby files any longer, and things are working fine.

Thanks again for your help,
Christian
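The thread does not show the exact values used for the stand-alone Derby setup. As a sketch, assuming the usual Derby Network Server conventions (client driver `org.apache.derby.jdbc.ClientDriver`, default port 1527) and a hypothetical host `derby-host`, the change amounts to swapping the JDO connection settings in hive-site.xml, shown here as `java.util.Properties`:

```java
import java.util.Properties;

// Sketch: JDO connection settings for an embedded vs. a network (stand-alone)
// Derby metastore database. "derby-host" is a hypothetical host name.
final class DerbyMetastoreSettings {
    // Embedded: "metastore_db" is a directory resolved against the Derby
    // system home (by default the working directory), and only one JVM can
    // boot it at a time.
    static Properties embedded() {
        Properties p = new Properties();
        p.setProperty("javax.jdo.option.ConnectionURL",
                "jdbc:derby:;databaseName=metastore_db;create=true");
        p.setProperty("javax.jdo.option.ConnectionDriverName",
                "org.apache.derby.jdbc.EmbeddedDriver");
        return p;
    }

    // Network server: every Hive process connects to one Derby server over TCP,
    // so no Hive JVM opens the database files itself.
    static Properties networkServer(String host, int port) {
        Properties p = new Properties();
        p.setProperty("javax.jdo.option.ConnectionURL",
                "jdbc:derby://" + host + ":" + port + "/metastore_db;create=true");
        p.setProperty("javax.jdo.option.ConnectionDriverName",
                "org.apache.derby.jdbc.ClientDriver");
        return p;
    }

    public static void main(String[] args) {
        networkServer("derby-host", 1527).list(System.out);
    }
}
```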

On 26.08.2011 01:00, Ashutosh Chauhan wrote:
> Christian,
>
> Looks like it's not possible to do the setup that you are looking for. 
> The problem arises because HiveServer extends HMSHandler directly instead of 
> accessing the Metastore through HiveMetaStoreClient, and because of this the 
> metastore Thrift interface is bypassed entirely. HiveServer will contact 
> MySQL directly and won't go through the external metastore service as you 
> have in your diagram. If you consider this a blocker, please open 
> a JIRA for more discussion.
>
> Hope it helps,
> Ashutosh

Re: Understanding distributed Hive server and Hive Metastore setup

Posted by Ashutosh Chauhan <ha...@apache.org>.
Christian,

Looks like it's not possible to do the setup that you are looking for.
The problem arises because HiveServer extends HMSHandler directly instead of
accessing the Metastore through HiveMetaStoreClient, and because of this the
metastore Thrift interface is bypassed entirely. HiveServer will contact MySQL
directly and won't go through the external metastore service as you have in
your diagram. If you consider this a blocker, please open a JIRA for more
discussion.

Hope it helps,
Ashutosh
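Ashutosh's explanation is about how HiveServer is wired rather than how it is configured. As an illustration only, the sketch below contrasts a server that inherits from an embedded metastore handler with one that delegates to a metastore chosen from configuration. All type names (`MetaStore`, `EmbeddedStoreHandler`, `RemoteMetaStoreClient`, `ServerByInheritance`, `ServerByDelegation`) are hypothetical and do not correspond to Hive's source. Under that assumption, the inheriting server always runs the embedded handler's construction logic, which would fit the observation in this thread that the HiveServer JVM opens metastore_db files even though its queries go through HiveMetaStoreClient.open.

```java
// Hypothetical, simplified types; they only illustrate inheritance vs.
// delegation and are not Hive's real classes.
interface MetaStore {
    String describeBackend();
}

// Stands in for an embedded handler: constructing one is where a real
// implementation would open its local database (e.g. a Derby metastore_db).
class EmbeddedStoreHandler implements MetaStore {
    final boolean localStoreOpened;

    EmbeddedStoreHandler() {
        // Placeholder for "open the local store"; this sketch only records it.
        this.localStoreOpened = true;
    }

    public String describeBackend() {
        return "local database";
    }
}

// Stands in for a Thrift client of a separate metastore service.
class RemoteMetaStoreClient implements MetaStore {
    private final String uri;

    RemoteMetaStoreClient(String uri) {
        this.uri = uri;
    }

    public String describeBackend() {
        return "remote " + uri;
    }
}

// Inheritance: the superclass constructor always runs, so the local store is
// "opened" no matter what hive.metastore.local says.
class ServerByInheritance extends EmbeddedStoreHandler {
}

// Delegation: configuration decides which MetaStore implementation is built.
class ServerByDelegation {
    private final MetaStore store;

    ServerByDelegation(boolean metastoreLocal, String metastoreUri) {
        this.store = metastoreLocal
                ? new EmbeddedStoreHandler()
                : new RemoteMetaStoreClient(metastoreUri);
    }

    String backend() {
        return store.describeBackend();
    }
}
```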

On Wed, Aug 24, 2011 at 23:21, Christian Kurz <cr...@gmx.de> wrote:

> Thanks, Edward and Ashutosh
>
> Ashutosh,
> yes, I do not understand why the service "hiveserver" still uses a Derby
> instance even though it should be talking to the service "metastore". Btw,
> if I run the hiveserver without having started the metastore service, the
> hiveserver complains when I try to let it execute a HiveQL command through
> JDBC:
>
> ...
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Could not connect to meta store using any of the URIs provided)
>         at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:919)
> ...
> (full stacktrace at the end of this post)
>
> which is exactly what I expect and which makes me somewhat confident that I
> have configured things correctly.
>
> The entire issue came up because the hiveserver service did not work when
> started from the same directory from which the metastore service had been
> started. It turned out that this was because both services were trying to
> set up a Derby instance in the current dir and therefore ran into a file
> locking situation. I have worked around this by starting the two services
> from different directories, but I am worried that I may be missing an
> important point in my setup.
>
> When I run "pfiles <pid of hiveserver>" it lists these files for the
> hiveserver service (which should not need a Derby instance, as far as I
> understood):
>       ...tons of jars...
>       /home/hadoop/hive_admin/derby.log
>       /home/hadoop/hive_admin/metastore_db/log/log1.dat
>       /home/hadoop/hive_admin/metastore_db/dbex.lck
>       /home/hadoop/hive_admin/metastore_db/seg0/c191.dat
>       /home/hadoop/hive_admin/metastore_db/seg0/c1a1.dat
>       ...
>       /home/hadoop/hive_admin/metastore_db/seg0/c431.dat
>       /home/hadoop/hive_admin/metastore_db/seg0/c451.dat
>
> Any pointers appreciated. If anybody thinks this is a bug, I can file one.
>
> Thanks,
> Christian
>
>
> full stacktrace:
>
> Hive history file=/tmp/hadoop/hive_job_log_hadoop_201108242305_155100916.txt
> FAILED: Error in semantic analysis: Table not found weblog
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Could not connect to meta store using any of the URIs provided)
>         at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:919)
>         at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:904)
>         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:7074)
>         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6573)
>         at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
>         at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:116)
>         at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.process(ThriftHive.java:699)
>         at org.apache.hadoop.hive.service.ThriftHive$Processor.process(ThriftHive.java:677)
>         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)
> Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided)
>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:183)
>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:151)
>         at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:1855)
>         at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:1865)
>         at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:917)
>         ... 13 more
> FAILED: Error in metadata: MetaException(message:Could not connect to meta store using any of the URIs provided)
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

Re: Understanding distributed Hive server and Hive Metastore setup

Posted by Christian Kurz <cr...@gmx.de>.
Thanks, Edward and Ashutosh

Ashutosh,
yes, I do not understand why the service "hiveserver" still uses a Derby 
instance even though it should be talking to the service "metastore". 
Btw, if I run the hiveserver without having started the metastore 
service, the hiveserver complains when I try to let it execute a HiveQL 
command through JDBC:

...
org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Could not connect to meta store using any of the URIs provided)
        at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:919)
...
(full stacktrace at the end of this post)

which is exactly what I expect and which makes me somewhat confident 
that I have configured things correctly.

The entire issue came up because the hiveserver service did not work 
when started from the same directory from which the metastore service 
had been started. It turned out that this was because both services were 
trying to set up a Derby instance in the current dir and therefore ran 
into a file locking situation. I have worked around this by starting the 
two services from different directories, but I am worried that I may be 
missing an important point in my setup.

When I run "pfiles <pid of hiveserver>" it lists these files for the 
hiveserver service (which should not need a Derby instance, as far as I 
understood):
       ...tons of jars...
       /home/hadoop/hive_admin/derby.log
       /home/hadoop/hive_admin/metastore_db/log/log1.dat
       /home/hadoop/hive_admin/metastore_db/dbex.lck
       /home/hadoop/hive_admin/metastore_db/seg0/c191.dat
       /home/hadoop/hive_admin/metastore_db/seg0/c1a1.dat
       ...
       /home/hadoop/hive_admin/metastore_db/seg0/c431.dat
       /home/hadoop/hive_admin/metastore_db/seg0/c451.dat
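Why two processes started from the same directory collide: with an embedded Derby URL that names a relative database such as `metastore_db`, Derby resolves it against its system home, which falls back to the JVM's working directory (`user.dir`) when `derby.system.home` is not set. A sketch of that resolution, plus a check for the lock files Derby keeps in a booted database directory (such as `db.lck`, and the `dbex.lck` seen in the listing above). The class and method names are hypothetical, and a lock file left behind by a crashed process does not by itself prove another process is still running.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: locates an embedded Derby database directory the way a
// relative database name is resolved, and looks for Derby's lock files there.
final class EmbeddedDerbyLocation {
    // derby.system.home wins if set; otherwise the working directory is used.
    static Path databaseDir(String derbySystemHome, String userDir, String dbName) {
        String home = (derbySystemHome != null) ? derbySystemHome : userDir;
        return Paths.get(home, dbName);
    }

    static Path databaseDirForThisJvm(String dbName) {
        return databaseDir(System.getProperty("derby.system.home"),
                System.getProperty("user.dir"), dbName);
    }

    // True if a Derby lock file is present; a stale file can survive a crash.
    static boolean looksLocked(Path dbDir) {
        return Files.exists(dbDir.resolve("db.lck"))
                || Files.exists(dbDir.resolve("dbex.lck"));
    }

    public static void main(String[] args) {
        Path dir = databaseDirForThisJvm("metastore_db");
        System.out.println(dir + " locked? " + looksLocked(dir));
    }
}
```

Two JVMs started from `/home/hadoop/hive_admin` without `derby.system.home` both resolve to `/home/hadoop/hive_admin/metastore_db`, which matches the clash described above.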

Any pointers appreciated. If anybody thinks this is a bug, I can file one.

Thanks,
Christian


full stacktrace:

Hive history file=/tmp/hadoop/hive_job_log_hadoop_201108242305_155100916.txt
FAILED: Error in semantic analysis: Table not found weblog
org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Could not connect to meta store using any of the URIs provided)
        at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:919)
        at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:904)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:7074)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6573)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
        at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:116)
        at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.process(ThriftHive.java:699)
        at org.apache.hadoop.hive.service.ThriftHive$Processor.process(ThriftHive.java:677)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:183)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:151)
        at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:1855)
        at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:1865)
        at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:917)
        ... 13 more
FAILED: Error in metadata: MetaException(message:Could not connect to meta store using any of the URIs provided)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
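The error above is raised in HiveMetaStoreClient.open once, as the message says, none of the provided metastore URIs could be reached. A hypothetical pre-flight check along those lines can confirm whether at least one `hive.metastore.uris` entry accepts connections before hiveserver is started. It uses plain TCP connects with no Thrift handshake, and the class and method names are illustrative, not Hive's.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reachability check for a comma-separated hive.metastore.uris
// value. Not Hive's code: it only opens a TCP socket to each host:port.
final class MetastoreUriProbe {
    static List<URI> parseUris(String commaSeparated) {
        List<URI> uris = new ArrayList<>();
        for (String part : commaSeparated.split(",")) {
            String trimmed = part.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            URI uri = URI.create(trimmed);
            // Reject anything that is not of the form thrift://host:port.
            if (!"thrift".equals(uri.getScheme()) || uri.getHost() == null || uri.getPort() == -1) {
                throw new IllegalArgumentException("expected thrift://host:port, got " + trimmed);
            }
            uris.add(uri);
        }
        return uris;
    }

    // Returns the first URI that accepts a TCP connection, trying them in order.
    static URI firstReachable(List<URI> uris, int timeoutMillis) {
        for (URI uri : uris) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(uri.getHost(), uri.getPort()), timeoutMillis);
                return uri;
            } catch (IOException e) {
                // Unreachable: move on to the next URI, like a failover loop.
            }
        }
        throw new IllegalStateException("Could not connect to any of the URIs: " + uris);
    }
}
```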



On 25.08.2011 01:29, Ashutosh Chauhan wrote:
> Edward,
>
> Apart from recommended best practices, what Christian is asking is 
> why HiveServer still tries to interact with a local db instance even 
> after setting the config variables. AFAIK it should not. Christian, 
> you found that out by looking at files opened by the HiveServer JVM. Can 
> you provide more info, such as how you found that out and what 
> these files are?
>
> Ashutosh

Re: Understanding distributed Hive server and Hive Metastore setup

Posted by Ashutosh Chauhan <ha...@apache.org>.
Edward,

Apart from recommended best practices, what Christian is asking is why
HiveServer still tries to interact with a local db instance even after
setting the config variables. AFAIK it should not. Christian, you found that
out by looking at files opened by the HiveServer JVM. Can you provide more
info, such as how you found that out and what these files are?

Ashutosh

On Wed, Aug 24, 2011 at 14:20, Edward Capriolo <ed...@gmail.com> wrote:

> Talking about HiveService can confuse things because HiveService is a
> different Thrift interface.
>
> You could be set up like this:
> HiveServiceClient->HiveService->metastore.local=true->derby
> or
> HiveServiceClient->HiveService->metastore.local=false->thrift->hive_metastore
>
> Most people are set up like this:
>
> HiveServiceClient->HiveService->metastore.local=true->mysql
> cli->metastore.local=true->mysql

Re: Understanding distributed Hive server and Hive Metastore setup

Posted by Edward Capriolo <ed...@gmail.com>.
On Wed, Aug 24, 2011 at 3:02 PM, Christian Kurz <cr...@gmx.de> wrote:

>
> Thanks for the quick reply, Edward
>
> I am not sure I got you: My HiveService has been started with hive.metastore.local=false.
> So shouldn't it use thrift instead of its own local Derby instance?
>
> Thanks,
> Christian
>
Talking about HiveService can confuse things because HiveService is a
different Thrift interface.

You could be set up like this:
HiveServiceClient->HiveService->metastore.local=true->derby
or
HiveServiceClient->HiveService->metastore.local=false->thrift->hive_metastore

Most people are set up like this:

HiveServiceClient->HiveService->metastore.local=true->mysql
cli->metastore.local=true->mysql
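Edward's "most people" layout has HiveServer and the CLI each talk to a shared MySQL database directly (local=true), with no separate metastore service. A sketch of the settings each of those processes would then carry, expressed as `java.util.Properties`; the host `dbhost`, database name `metastore` and the credentials are hypothetical, and `com.mysql.jdbc.Driver` is the MySQL Connector/J driver class of that period.

```java
import java.util.Properties;

// Sketch of the direct-to-MySQL (local=true) layout. Every HiveServer and CLI
// host would carry the same values. Host, database and credentials are
// hypothetical placeholders.
final class MySqlMetastoreSettings {
    static Properties directToMySql(String host, String db, String user, String password) {
        Properties p = new Properties();
        // No separate metastore service: talk to the database via JDO.
        p.setProperty("hive.metastore.local", "true");
        p.setProperty("javax.jdo.option.ConnectionURL",
                "jdbc:mysql://" + host + "/" + db + "?createDatabaseIfNotExist=true");
        p.setProperty("javax.jdo.option.ConnectionDriverName", "com.mysql.jdbc.Driver");
        p.setProperty("javax.jdo.option.ConnectionUserName", user);
        p.setProperty("javax.jdo.option.ConnectionPassword", password);
        return p;
    }
}
```

Because all processes share one server-based database, there is no per-directory Derby lock to collide on.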

Re: Understanding distributed Hive server and Hive Metastore setup

Posted by Christian Kurz <cr...@gmx.de>.
Thanks for the quick reply, Edward

I am not sure I got you: My HiveService has been started with hive.metastore.local=false. So shouldn't it use thrift instead of its own local Derby instance?

Thanks,
Christian

On 24.08.2011 at 19:33, Edward Capriolo <ed...@gmail.com> wrote:

> hive.metastore.local is really misnamed.
> 
> local=true means communicating via DataNucleus/JPOX and talking directly to the metastore database.
> 
> local=false means using Thrift, which is essentially a level of indirection.

Re: Understanding distributed Hive server and Hive Metastore setup

Posted by Edward Capriolo <ed...@gmail.com>.
On Wed, Aug 24, 2011 at 10:53 AM, Christian Kurz <cr...@gmx.de> wrote:

>  Greetings,
>
> could somebody confirm/correct my understanding of a fully distributed Hive
> setup, please?
>
> My setup is as follows
>
>    - *Java application using Hive JDBC driver* connects to
>    - *hive --service hiveserver*, which connects to
>    - *hive --service metastore*, which uses an embedded Derby database
>      for metadata storage
>
> Please find more details in the image attached.
>
> The thing I find confusing is that JVM2 (Hive Server) starts up a Derby
> database instance. I can see that from the files the JVM has opened.
>
> Does anybody know why the Hive Server needs a Derby instance even though
> hive-site.xml says hive.metastore.local=false?
>
> Any hints are much appreciated.
>
> Thanks,
> Christian
>
> btw,
> I have not been able to access the picture on the wiki <https://cwiki.apache.org/Hive/adminmanual-metastoreadmin.html#AdminManualMetastoreAdmin-MetastoreDeploymentOptionsinPictures>.
> ("Not permitted"; even though I have registered on the wiki)
>
>
>
hive.metastore.local is really misnamed.

local=true means communicating via DataNucleus/JPOX and talking directly to
the metastore database.

local=false means using Thrift, which is essentially a level of indirection.
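A simplified model of what Edward describes, as a sketch only: it looks at just the two Hive 0.7-era keys discussed in this thread, `hive.metastore.local` and `hive.metastore.uris`, and is not Hive's implementation. Treating a missing `hive.metastore.local` as `true` is an assumption based on that era's defaults, and rejecting local=false without any URIs is a sanity check of this sketch, not documented Hive behaviour.

```java
import java.util.Properties;

// Simplified model of what hive.metastore.local selects; not Hive's code.
final class MetastoreModeSketch {
    enum Mode {
        DIRECT_JDO,    // talk to the metastore database via DataNucleus/JPOX
        REMOTE_THRIFT  // talk to a separate metastore service over Thrift
    }

    static Mode modeFor(Properties conf) {
        // Assumed default: local=true when the key is absent.
        boolean local = Boolean.parseBoolean(conf.getProperty("hive.metastore.local", "true"));
        if (local) {
            return Mode.DIRECT_JDO;
        }
        // The extra level of indirection needs somewhere to point.
        String uris = conf.getProperty("hive.metastore.uris", "").trim();
        if (uris.isEmpty()) {
            throw new IllegalArgumentException(
                    "hive.metastore.local=false but hive.metastore.uris is empty");
        }
        return Mode.REMOTE_THRIFT;
    }
}
```

As the rest of the thread shows, a component may still bypass this choice if it is built on the embedded handler directly.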