Posted to user@hive.apache.org by Stuart Smith <st...@yahoo.com> on 2010/11/19 01:46:55 UTC

Using jdbc in embedded mode - Can't find warehouse directory

Hello,

  I'm trying to connect to hive using the JDBC driver in embedded mode. I can load the driver successfully & connect to it via:

hiveConnection = DriverManager.getConnection( "jdbc:hive://", "", "" )

But when I query a table that I know exists - I can query it via a hive command line running on the same machine - I get a "table does not exist" error. When I go ahead and create the table in my java program, and then query it, I get: 

ERROR: hive.log java.io.FileNotFoundException: File file:/user/hive/warehouse/[table_name]
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
...

So it looks like it's trying to use the local filesystem for the warehouse dir. I tried setting the warehouse dir variable in the hive-default.xml file to:

hdfs://user/hive/warehouse/

But I get the same errors.
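
A note on the value above, as a minimal sketch using only java.net.URI from the JDK: in a location of this shape, the segment right after hdfs:// is parsed as the host (the namenode), not as the first directory of the path. The class name WarehouseUriCheck and the host namenode.example.com:8020 are made up for illustration and are not from this thread.

```java
import java.net.URI;

final class WarehouseUriCheck {
    /** Returns the host part of a location, or null if it has none. */
    static String hostOf(String location) {
        return URI.create(location).getHost();
    }

    /** Returns the path part of a location. */
    static String pathOf(String location) {
        return URI.create(location).getPath();
    }

    public static void main(String[] args) {
        // The value tried in the post: "user" lands in the host slot.
        String tried = "hdfs://user/hive/warehouse/";
        System.out.println(hostOf(tried) + " | " + pathOf(tried));

        // A fully qualified form with a hypothetical namenode address.
        String qualified = "hdfs://namenode.example.com:8020/user/hive/warehouse/";
        System.out.println(hostOf(qualified) + " | " + pathOf(qualified));
    }
}
```

So even if the setting had been picked up, it would have pointed at a host called "user" rather than at /user/hive/warehouse on the cluster.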

Any idea what's happening? 

Am I confused about what an embedded hive server can do? I was under the impression that the cli used an embedded hive server and could connect to my hdfs store, but it would seem my java program can't do this.

I guess my next stop is going through the hive cli source code ?

Take care,
  -stu


      

Re: Using jdbc in embedded mode - Can't find warehouse directory [SOLVED]

Posted by Stuart Smith <st...@yahoo.com>.
Hello Shrijeet,

Yup. I already moved it over & it seems to work.
Moving it into hive conf was just a quick hack/test.

(the wiki has you setting hbase quorum vars via hive conf, so I assumed it wasn't *too* hackilicious).

It would be nice to have hadoop in the classpath covered on the wiki though, along with one more piece of information: 

For some odd reason, no mention is made of the hadoop.bin.path variable (a hive conf var). In HiveConf.java, its value is derived from the HADOOP_HOME environment variable (which is mentioned on the wiki).

Unfortunately, setting up env vars gets a little tricky in my tomcat test server running in eclipse :)

So I just set the hadoop.bin.path directly in the hive conf. Much easier than trying to muck around with environment variables.
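
To illustrate why a missing environment variable bites here, a small sketch follows. It assumes the default is formed by appending /bin/hadoop to HADOOP_HOME; that shape is an assumption made for illustration, not something checked against HiveConf.java, and the class and method names are hypothetical.

```java
final class HadoopBinPath {
    /** Models a default of the form HADOOP_HOME + "/bin/hadoop" (assumed shape). */
    static String defaultFrom(String hadoopHome) {
        return hadoopHome + "/bin/hadoop";
    }

    /** Prefers an explicitly configured value over the environment-derived one. */
    static String resolve(String configured, String hadoopHome) {
        return configured != null ? configured : defaultFrom(hadoopHome);
    }

    public static void main(String[] args) {
        // In a container started from an IDE, HADOOP_HOME is often unset (null).
        String hadoopHome = System.getenv("HADOOP_HOME");
        System.out.println("derived default: " + defaultFrom(hadoopHome));

        // Setting the value explicitly, as described above, avoids the dependency.
        System.out.println("explicit value:  " + resolve("/opt/hadoop/bin/hadoop", hadoopHome));
    }
}
```

Under that assumption, an unset HADOOP_HOME yields the string "null/bin/hadoop", which is why setting the property directly is the less fragile route in a servlet container.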

I'd be glad to help with docs if anyone is game.

It could save someone the trouble ( fun ;) ) of grepping through source code, ripping out source from the cli client, creating a harness, tracking down code flow, etc... all to just understand how configuration variables are being used, and how to set them correctly...

At least it's on the mailing list now, I suppose..

Take care,
   -stu

--- On Fri, 11/19/10, Shrijeet Paliwal <sh...@rocketfuel.com> wrote:

> From: Shrijeet Paliwal <sh...@rocketfuel.com>
> Subject: Re: Using jdbc in embedded mode - Can't find warehouse directory [SOLVED]
> To: user@hive.apache.org
> Date: Friday, November 19, 2010, 8:30 PM
> I would say your hadoop configuration file(s) should have been on your
> classpath (core-site.xml in this case). You are not supposed to put
> hadoop parameters into hive conf files.
> 
> -Shrijeet


      

Re: Using jdbc in embedded mode - Can't find warehouse directory [SOLVED]

Posted by Shrijeet Paliwal <sh...@rocketfuel.com>.
I would say your hadoop configuration file(s) should have been on your
classpath (core-site.xml in this case). You are not supposed to put
hadoop parameters into hive conf files.

-Shrijeet
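
A quick way to check this from inside the application is to ask the class loader whether it can see the files at all. The sketch below uses only the JDK; it assumes the configuration files are looked up as classpath resources through the thread's context class loader, and the class name ClasspathCheck is made up for illustration.

```java
import java.net.URL;

final class ClasspathCheck {
    /** Returns the URL of a classpath resource, or null if it is not visible. */
    static URL locate(String resource) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(resource);
    }

    /** Builds a one-line report for a resource name. */
    static String describe(String resource) {
        URL url = locate(resource);
        return url == null ? resource + ": NOT on classpath" : resource + ": " + url;
    }

    public static void main(String[] args) {
        // File names taken from this thread.
        System.out.println(describe("core-site.xml"));
        System.out.println(describe("hive-default.xml"));
    }
}
```

Calling describe("core-site.xml") from the servlet itself (rather than from a shell) matters, because a container such as tomcat builds its own classpath for each web application.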


On Fri, Nov 19, 2010 at 4:57 PM, Stuart Smith <st...@yahoo.com> wrote:
> It was picking up the hive configuration, so I just plopped the fs.default.name property from hdfs-site.xml into the hive configuration.

Re: Using jdbc in embedded mode - Can't find warehouse directory [SOLVED]

Posted by Stuart Smith <st...@yahoo.com>.
Hello,

  Just wanted to let people know I tracked this one down:

It looks like it was not picking up the *hadoop* core-site.xml configuration file. 

- So the variable fs.default.name was never set

- So the warehouse dir became file://[hive.metastore.warehouse.dir] instead of [hdfs location]/[hive.metastore.warehouse.dir]

- So it couldn't find any of the warehouse files.

- So the metastore queries would start to work, but the metastore couldn't find any of the backing files on hdfs.

It was picking up the hive configuration, so I just plopped the fs.default.name property from hdfs-site.xml into the hive configuration.
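
The chain in the bullet points above can be sketched with plain java.net.URI. This is a simplified model of how a scheme-less warehouse directory gets qualified against the default filesystem, not Hadoop's actual code; the method name qualify and the namenode address are hypothetical. When fs.default.name (normally defined in core-site.xml) is not set, Hadoop falls back to the local filesystem, file:///.

```java
import java.net.URI;

final class WarehouseResolution {
    /** Qualifies a warehouse directory against a default filesystem URI. */
    static URI qualify(String defaultFs, String warehouseDir) {
        URI dir = URI.create(warehouseDir);
        if (dir.getScheme() != null) {
            return dir; // already fully qualified, nothing to do
        }
        URI fs = URI.create(defaultFs);
        String authority = fs.getAuthority() == null ? "" : fs.getAuthority();
        return URI.create(fs.getScheme() + "://" + authority + dir.getPath());
    }

    public static void main(String[] args) {
        String warehouse = "/user/hive/warehouse";

        // core-site.xml not found: the default filesystem is the local one.
        System.out.println(qualify("file:///", warehouse));

        // core-site.xml found: the default filesystem is HDFS (hypothetical host).
        System.out.println(qualify("hdfs://namenode.example.com:8020", warehouse));
    }
}
```

The first call produces a file: location like the one in the stack trace; the second produces the HDFS location the cli was using all along.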

Should the jdbc wiki:

http://wiki.apache.org/hadoop/Hive/HiveClient#head-fd2d8ae9e17fdc3d9b7048d088b2c23a53a6857d

Be updated to include this information?

It could be useful to anyone trying to use an embedded server (vs the example given). I would actually think this would apply to the standalone case as well, but I haven't tried it yet.

My particular use case is using the jdbc connector in a java servlet (specifically, a GWT server-side RPC implementation).

As an aside: is the hive jdbc connector thread-safe? 
Assuming I instantiate within the callback method?
(I would think having a class Connection member would not be thread safe?).
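
Nothing in this thread settles whether the driver is thread-safe, so the sketch below simply sidesteps the question: each call opens its own connection and closes it when done, instead of sharing one Connection member across request threads. It uses only java.sql from the JDK; the names PerCallConnection, ConnectionFactory, Work and withConnection are made up for illustration, and the URL is the embedded one from the original post.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

final class PerCallConnection {
    /** Opens a new connection each time it is asked. */
    interface ConnectionFactory {
        Connection open() throws SQLException;
    }

    /** A unit of work that needs a connection for its duration. */
    interface Work<T> {
        T run(Connection connection) throws SQLException;
    }

    /** Runs the work on a fresh connection and always closes it afterwards. */
    static <T> T withConnection(ConnectionFactory factory, Work<T> work) throws SQLException {
        try (Connection connection = factory.open()) {
            return work.run(connection);
        }
    }

    /** Factory for the embedded-mode URL used earlier in this thread. */
    static ConnectionFactory embeddedHive() {
        return () -> DriverManager.getConnection("jdbc:hive://", "", "");
    }

    public static void main(String[] args) {
        try {
            Boolean open = withConnection(embeddedHive(), c -> !c.isClosed());
            System.out.println("connected: " + open);
        } catch (SQLException e) {
            // Expected when no Hive driver is registered on the classpath.
            System.out.println("could not connect: " + e.getMessage());
        }
    }
}
```

In a servlet callback this would be one withConnection call per request. It costs a connection setup each time, but no Connection object is ever visible to two threads at once.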

I'd be happy to help update the wiki & come up with an example, if that would help..

Take care,
  -stu

