Posted to user@hive.apache.org by karthik kottapalli <ka...@gmail.com> on 2011/08/26 01:26:08 UTC

Re:Issues Integrating HBASE with HIVE

I am able to create tables in Hive, but I have a problem integrating
Hive with HBase.

I am following this doc:
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration

My versions are: Hadoop 0.20.2, Hive 0.7.1, HBase 0.20.6

hive> CREATE TABLE hbase_table_1(key int, value string)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
    > TBLPROPERTIES ("hbase.table.name" = "xyz");

console:

    java.lang.NoSuchMethodError: org.apache.hadoop.hbase.client.HBaseAdmin.<init>(Lorg/apache/hadoop/conf/Configuration;)V
        at org.apache.hadoop.hive.hbase.HBaseStorageHandler.getHBaseAdmin(HBaseStorageHandler.java:74)
        at org.apache.hadoop.hive.hbase.HBaseStorageHandler.preCreateTable(HBaseStorageHandler.java:158)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:344)
        at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:470)
        at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3146)
        at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:213)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
    FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.DDLTask

Any idea on how to proceed further or thoughts about the cause of the issue?
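
A note on reading the error: the "(Lorg/apache/hadoop/conf/Configuration;)V" part of the NoSuchMethodError is a JVM method descriptor, and the missing method appears to be an HBaseAdmin constructor. Decoded, it says the HBaseAdmin class loaded at runtime has no such method taking a single org.apache.hadoop.conf.Configuration argument. That usually points to the HBase jar on Hive's classpath being a different version from the one Hive was compiled against, so comparing those two versions would be a reasonable first check. The Python sketch below only decodes descriptors; the function names are made up for illustration and are not part of Hive or HBase.

```python
# Decode a JVM method descriptor such as "(Lorg/apache/hadoop/conf/Configuration;)V"
# into readable parameter and return types. Helper names are hypothetical.

# JVM primitive type codes used in descriptors.
PRIMITIVES = {
    "B": "byte", "C": "char", "D": "double", "F": "float",
    "I": "int", "J": "long", "S": "short", "Z": "boolean", "V": "void",
}


def decode_type(desc, pos):
    """Decode one type starting at desc[pos]; return (name, next_pos)."""
    dims = 0
    while desc[pos] == "[":          # each '[' adds one array dimension
        dims += 1
        pos += 1
    if desc[pos] == "L":             # object type: L<internal/name>;
        end = desc.index(";", pos)
        name = desc[pos + 1:end].replace("/", ".")
        pos = end + 1
    else:                            # single-letter primitive code
        name = PRIMITIVES[desc[pos]]
        pos += 1
    return name + "[]" * dims, pos


def decode_descriptor(desc):
    """Split '(params)ret' into (list_of_param_types, return_type)."""
    if not desc.startswith("("):
        raise ValueError("not a method descriptor: %r" % desc)
    pos, params = 1, []
    while desc[pos] != ")":
        name, pos = decode_type(desc, pos)
        params.append(name)
    ret, _ = decode_type(desc, pos + 1)
    return params, ret


if __name__ == "__main__":
    params, ret = decode_descriptor("(Lorg/apache/hadoop/conf/Configuration;)V")
    print("parameters:", params)
    print("returns:", ret)
```

For the descriptor in the trace this reports one parameter of type org.apache.hadoop.conf.Configuration and a void return, which is how the JVM describes a constructor signature.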


On Thu, Aug 25, 2011 at 6:00 PM, Ashutosh Chauhan <ha...@apache.org> wrote:
> Christian,
> Looks like it's not possible to do the setup that you are looking for.
> The problem arises because HiveServer extends HMSHandler directly instead of
> accessing the metastore through HiveMetaStoreClient, so the metastore
> thrift interface is bypassed entirely. HiveServer will contact mysql
> directly and won't go through an external metastore service as you have in
> your diagram. If you consider this a blocker, please open a jira for more
> discussion.
> Hope it helps,
> Ashutosh
>
> On Wed, Aug 24, 2011 at 23:21, Christian Kurz <cr...@gmx.de> wrote:
>>
>> Thanks, Edward and Ashutosh
>>
>> Ashutosh,
>> yes, I do not understand why the service "hiveserver" still uses a Derby
>> instance even though it should be talking to the service "metastore". Btw,
>> if I run the hiveserver without having started the metastore service, the
>> hiveserver complains when I try to let it execute a HiveQL command through
>> JDBC:
>>
>> ...
>> org.apache.hadoop.hive.ql.metadata.HiveException:
>> MetaException(message:Could not connect to meta store using any of the URIs
>> provided)
>>         at
>> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:919)
>> ...
>> (full stacktrace at the end of this post)
>>
>> which is exactly what I expect and which makes me somewhat confident that
>> I have configured things correctly.
>>
>> The entire issue came up because the hiveserver service did not work
>> when started from the same directory from which the metastore service had
>> been started. It turned out that this was because both services were trying
>> to set up a Derby instance in the current dir and therefore ran into a file
>> locking situation. I have worked around this by starting the two services
>> from different directories, but I am worried that I am missing an
>> important point in my setup.
>>
>> When I run "pfiles <pid of hiveserver>" it lists these files for the
>> hiveserver service (which should not need a Derby instance, as far as I
>> understood):
>>       ...tons of jars...
>>       /home/hadoop/hive_admin/derby.log
>>       /home/hadoop/hive_admin/metastore_db/log/log1.dat
>>       /home/hadoop/hive_admin/metastore_db/dbex.lck
>>       /home/hadoop/hive_admin/metastore_db/seg0/c191.dat
>>       /home/hadoop/hive_admin/metastore_db/seg0/c1a1.dat
>>       ...
>>       /home/hadoop/hive_admin/metastore_db/seg0/c431.dat
>>       /home/hadoop/hive_admin/metastore_db/seg0/c451.dat
>>
>> Any pointers appreciated. If anybody thinks this is a bug, I can file one.
>>
>> Thanks,
>> Christian
>>
>>
>> full stacktrace:
>>
>> Hive history
>> file=/tmp/hadoop/hive_job_log_hadoop_201108242305_155100916.txt
>> FAILED: Error in semantic analysis: Table not found weblog
>> org.apache.hadoop.hive.ql.metadata.HiveException:
>> MetaException(message:Could not connect to meta store using any of the URIs
>> provided)
>>         at
>> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:919)
>>         at
>> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:904)
>>         at
>> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:7074)
>>         at
>> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6573)
>>         at
>> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
>>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340)
>>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
>>         at
>> org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:116)
>>         at
>> org.apache.hadoop.hive.service.ThriftHive$Processor$execute.process(ThriftHive.java:699)
>>         at
>> org.apache.hadoop.hive.service.ThriftHive$Processor.process(ThriftHive.java:677)
>>         at
>> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>>         at
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>         at java.lang.Thread.run(Thread.java:619)
>> Caused by: MetaException(message:Could not connect to meta store using any
>> of the URIs provided)
>>         at
>> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:183)
>>         at
>> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:151)
>>         at
>> org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:1855)
>>         at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:1865)
>>         at
>> org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:917)
>>         ... 13 more
>> FAILED: Error in metadata: MetaException(message:Could not connect to meta
>> store using any of the URIs provided)
>> FAILED: Execution Error, return code 1 from
>> org.apache.hadoop.hive.ql.exec.DDLTask
>>
>>
>>
>> On 25.08.2011 01:29, Ashutosh Chauhan wrote:
>>
>> Edward,
>> Apart from recommended best practices, what Christian is asking is why
>> HiveServer is still trying to interact with a local db instance even after
>> setting the config variables. AFAIK it should not. Christian, you found that
>> out by looking at files opened by the HiveServer jvm. Can you provide more
>> info there, like how you found that out and which files these are?
>> Ashutosh
>>
>> On Wed, Aug 24, 2011 at 14:20, Edward Capriolo <ed...@gmail.com>
>> wrote:
>>>
>>>
>>> On Wed, Aug 24, 2011 at 3:02 PM, Christian Kurz <cr...@gmx.de> wrote:
>>>>
>>>> Thanks for the quick reply, Edward
>>>>
>>>> I am not sure I got you: My HiveService has been started
>>>> with hive.metastore.local=false. So shouldn't it use thrift instead of its
>>>> own local Derby instance?
>>>> Thanks,
>>>> Christian
>>>> On 24.08.2011 at 19:33, Edward Capriolo <ed...@gmail.com> wrote:
>>>>
>>>>
>>>>
>>>> On Wed, Aug 24, 2011 at 10:53 AM, Christian Kurz <cr...@gmx.de> wrote:
>>>>>
>>>>> Greetings,
>>>>>
>>>>> could somebody confirm/correct my understanding of a fully distributed
>>>>> Hive setup, please?
>>>>>
>>>>> My setup is as follows
>>>>>
>>>>> Java application using Hive JDBC driver connects to
>>>>> hive --service hiveserver, which connects to
>>>>> hive --service metastore, which uses an embedded Derby database for
>>>>> metadata storage
>>>>>
>>>>> Please find more details in the image attached.
>>>>>
>>>>> The thing I find confusing is that JVM2 (Hive Server) starts up a Derby
>>>>> database instance. I can see that from the files the JVM has opened.
>>>>>
>>>>> Does anybody know why the Hive Server needs a Derby instance even
>>>>> though hive-site.xml says hive.metastore.local=false?
>>>>>
>>>>> Any hints are much appreciated.
>>>>>
>>>>> Thanks,
>>>>> Christian
>>>>>
>>>>> btw,
>>>>> I have not been able to access the picture on the wiki. ("Not
>>>>> permitted"; even though I have registered on the wiki)
>>>>>
>>>>>
>>>>
>>>> hive.metastore.local is really misnamed.
>>>>
>>>> local=true means communicate using datanucleus/JPOX, talking directly
>>>> to the metastore.
>>>> local=false means use thrift, which is essentially a level of
>>>> indirection.
>>>
>>> Talking about HiveService can confuse things because HiveService is a
>>> different thrift interface.
>>> You could be set up like this:
>>> HiveServiceClient->HiveService->metastore.local=true->derby
>>> or
>>>
>>> HiveServiceClient->HiveService->metastore.local=false->thrift->hive_metastore
>>> Most people are set up like this:
>>> HiveServiceClient->HiveService->metastore.local=true->mysql
>>> cli->metastore.local=true->mysql
>>>
>>
>
>
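
For reference, the routing Edward describes in the quoted thread (local=true talks to the database directly, local=false goes through thrift) can be sketched as a small check over a hive-site.xml. This is a simplified Python illustration, not Hive's own logic: the helper names, the sample host and port, and the assumed default of "true" for hive.metastore.local are assumptions made for the example. Per Ashutosh's reply above, HiveServer in this version does not follow this path, so the sketch only models a client that honours the setting.

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text):
    """Read Hadoop-style <property><name/><value/></property> pairs."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def describe_metastore_path(props):
    """Summarise how a client honouring the setting would reach the metastore."""
    # Assumption: the setting defaults to "true" when it is absent.
    local = props.get("hive.metastore.local", "true").lower() == "true"
    if local:
        url = props.get("javax.jdo.option.ConnectionURL",
                        "<default embedded Derby>")
        return "direct (DataNucleus/JDBC) -> " + url
    uris = props.get("hive.metastore.uris", "<unset>")
    return "thrift -> " + uris


# Hypothetical configuration; the host name and port are placeholders.
SAMPLE = """<configuration>
  <property>
    <name>hive.metastore.local</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(describe_metastore_path(load_properties(SAMPLE)))
```

Running the same check against the configuration each service actually loads is one way to confirm which of the setups listed above a given process is expected to use.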