Posted to user@spark.apache.org by nikroy16 <ni...@gmail.com> on 2014/07/29 06:51:28 UTC

HiveContext is creating metastore warehouse locally instead of in hdfs

Hi,

Even though hive.metastore.warehouse.dir in hive-site.xml is set to the
default /user/hive/warehouse and the permissions are correct in hdfs,
HiveContext seems to be creating the metastore warehouse locally instead of
in hdfs. After looking into the spark code, I found the following in
HiveContext.scala:

   /**
    * SQLConf and HiveConf contracts: when the hive session is first
    * initialized, params in HiveConf will get picked up by the SQLConf.
    * Additionally, any properties set by set() or a SET command inside
    * hql() or sql() will be set in the SQLConf *as well as* in the HiveConf.
    */
   @transient protected[hive] lazy val hiveconf = new HiveConf(classOf[SessionState])

   @transient protected[hive] lazy val sessionState = {
     val ss = new SessionState(hiveconf)
     set(hiveconf.getAllProperties) // Have SQLConf pick up the initial set of HiveConf.
     ss
   }


It seems as though when a HiveContext is created, it is launched without any
configuration, and hive-site.xml is not used to set properties. It looks like
I can set properties after creation by using the hql() method, but what I am
looking for is for the HiveContext to be initialized according to the
configuration in hive-site.xml at the time of initialization. Any help would
be greatly appreciated!
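
For reference, a quick way to see which value HiveConf actually resolved (a
minimal sketch, assuming a Spark 1.0.x spark-shell where sc is the
SparkContext):

    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    // SET with no value should echo the current setting as a "key=value" string;
    // a file:/ path here would mean hive-site.xml was not picked up
    hiveContext.hql("SET hive.metastore.warehouse.dir").collect().foreach(println)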






Re: HiveContext is creating metastore warehouse locally instead of in hdfs

Posted by chenjie <ch...@gmail.com>.
I used the Spark web UI and could see that the conf directory is in the
CLASSPATH. An abnormal thing is that when I start spark-shell I always get
the following info:
WARN NativeCodeLoader: Unable to load native-hadoop library for your
platform... using builtin-java classes where applicable

At first, I thought it was because the hadoop version was not compatible with
the pre-built spark. My hadoop version is 2.4.1 and the pre-built spark is
built against hadoop 2.2.0. Then I built spark from source against hadoop
2.4.1. However, I still got the info above.

Besides, when I set log4j.rootCategory to DEBUG, I got an exception saying
"HADOOP_HOME or hadoop.home.dir are not set" even though I have set
HADOOP_HOME.
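
One thing that sometimes helps (a sketch under assumptions, with placeholder
paths) is exporting the Hadoop home and the native library path before
starting spark-shell; the native-hadoop warning itself is harmless either
way, since the JVM just falls back to the builtin-java classes:

    export HADOOP_HOME=/opt/hadoop-2.4.1                # hypothetical install path
    export SPARK_LIBRARY_PATH=$HADOOP_HOME/lib/native   # assumption: still honored by the 1.x launch scripts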



alee526 wrote
> Could you enable the HistoryServer and provide the properties and CLASSPATH
> for the spark-shell? Also run the 'env' command to list your environment
> variables?
> 
> By the way, what do the Spark logs say? Enable debug mode to see what's
> going on in spark-shell when it tries to interact with and initialize
> HiveContext.






Re: HiveContext is creating metastore warehouse locally instead of in hdfs

Posted by Andrew Lee <al...@hotmail.com>.
Could you enable the HistoryServer and provide the properties and CLASSPATH for the spark-shell? Also run the 'env' command to list your environment variables?

By the way, what do the Spark logs say? Enable debug mode to see what's going on in spark-shell when it tries to interact with and initialize HiveContext.
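
A minimal sketch of the debug setup: Spark reads conf/log4j.properties when
present, and a template ships alongside it.

    # in $SPARK_HOME/conf, assuming the stock template is there
    cp log4j.properties.template log4j.properties
    # then change the root logger from INFO to DEBUG:
    #   log4j.rootCategory=DEBUG, console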



> On Jul 31, 2014, at 19:09, "chenjie" <ch...@gmail.com> wrote:
> 
> Hi, Yin and Andrew, thank you for your reply.
> When I create a table in the Hive CLI, it works correctly and the table is
> found in hdfs. I forgot to start hiveserver2 before, and I started it today.
> Then I ran the command below:
>    spark-shell --master spark://192.168.40.164:7077  --driver-class-path
> conf/hive-site.xml
> Furthermore, I added the following command:
>    hiveContext.hql("SET
> hive.metastore.warehouse.dir=hdfs://192.168.40.164:8020/user/hive/warehouse")
> But that didn't work for me. I got the same exception as before and found
> the table file in a local directory instead of hdfs.
> 
> 
> Yin Huai-2 wrote
>> Another way is to set "hive.metastore.warehouse.dir" explicitly to the
>> HDFS
>> dir storing Hive tables by using SET command. For example:
>> 
>> hiveContext.hql("SET
>> hive.metastore.warehouse.dir=hdfs://localhost:54310/user/hive/warehouse")
>> 
>> 
>> 
>> 
>> On Thu, Jul 31, 2014 at 8:05 AM, Andrew Lee <alee526@...> wrote:
>> 
>>> Hi All,
>>> 
>>> It has been awhile, but what I did to make it work is to make sure the
>>> following:
>>> 
>>> 1. Hive is working when you run Hive CLI and JDBC via Hiveserver2
>>> 
>>> 2. Make sure you have the hive-site.xml from above Hive configuration.
>>> The
>>> problem here is that you want the hive-site.xml from the Hive metastore.
>>> The one for Hive and HCatalog may be different files. Make sure you check
>>> the xml properties in that file, pick the one that has the warehouse
>>> property configured and the JDO setup.
>>> 
>>> 3. Make sure hive-site.xml from step 2 is included in $SPARK_HOME/conf,
>>> and in your runtime CLASSPATH when you run spark-shell
>>> 
>>> 4. Use the history server to check the runtime CLASSPATH and its order to
>>> ensure hive-site.xml is included.
>>> 
>>> HiveContext should pick up the hive-site.xml and talk to your running
>>> hive
>>> service.
>>> 
>>> Hope these tips help.
>>> 
>>>> On Jul 30, 2014, at 22:47, "chenjie" <chenjie2001@...> wrote:
>>>> 
>>>> Hi, Michael. I have the same problem. My warehouse directory is always
>>>> created locally. I copied the default hive-site.xml into the
>>>> $SPARK_HOME/conf directory on each node. After I executed the code
>>> below,
>>>>   val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>>>>   hiveContext.hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
>>>>   hiveContext.hql("LOAD DATA LOCAL INPATH '/extdisk2/tools/spark/examples/src/main/resources/kv1.txt' INTO TABLE src")
>>>>   hiveContext.hql("FROM src SELECT key, value").collect()
>>>> 
>>>> I got the exception below:
>>>> java.io.FileNotFoundException: File file:/user/hive/warehouse/src/kv1.txt
>>>> does not exist
>>>>   at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:520)
>>>>   at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
>>>>   at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:137)
>>>>   at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
>>>>   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:763)
>>>>   at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:106)
>>>>   at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
>>>>   at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:193)
>>>> 
>>>> At last, I found /user/hive/warehouse/src/kv1.txt was created on the
>>> node
>>>> where I start spark-shell.
>>>> 
>>>> The spark that I used is pre-built spark1.0.1 for hadoop2.
>>>> 
>>>> Thanks in advance.

Re: HiveContext is creating metastore warehouse locally instead of in hdfs

Posted by chenjie <ch...@gmail.com>.
Hi, Yin and Andrew, thank you for your reply.
When I create a table in the Hive CLI, it works correctly and the table is
found in hdfs. I forgot to start hiveserver2 before, and I started it today.
Then I ran the command below:
    spark-shell --master spark://192.168.40.164:7077  --driver-class-path
conf/hive-site.xml
Furthermore, I added the following command:
    hiveContext.hql("SET
hive.metastore.warehouse.dir=hdfs://192.168.40.164:8020/user/hive/warehouse")
But that didn't work for me. I got the same exception as before and found
the table file in a local directory instead of hdfs.
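
One detail worth double-checking here (an assumption from how Java
classpaths work, not something verified against this setup): classpath
entries must be directories or jars, so passing conf/hive-site.xml itself to
--driver-class-path does not put the file at the classpath root. Passing the
directory that contains it should:

    spark-shell --master spark://192.168.40.164:7077 \
      --driver-class-path /path/to/spark/conf    # the directory holding hive-site.xml, not the xml file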


Yin Huai-2 wrote
> Another way is to set "hive.metastore.warehouse.dir" explicitly to the
> HDFS
> dir storing Hive tables by using SET command. For example:
> 
> hiveContext.hql("SET
> hive.metastore.warehouse.dir=hdfs://localhost:54310/user/hive/warehouse")
> 
> 
> 
> 
> On Thu, Jul 31, 2014 at 8:05 AM, Andrew Lee <alee526@...> wrote:
> 
>> Hi All,
>>
>> It has been awhile, but what I did to make it work is to make sure the
>> following:
>>
>> 1. Hive is working when you run Hive CLI and JDBC via Hiveserver2
>>
>> 2. Make sure you have the hive-site.xml from above Hive configuration.
>> The
>> problem here is that you want the hive-site.xml from the Hive metastore.
>> The one for Hive and HCatalog may be different files. Make sure you check
>> the xml properties in that file, pick the one that has the warehouse
>> property configured and the JDO setup.
>>
>> 3. Make sure hive-site.xml from step 2 is included in $SPARK_HOME/conf,
>> and in your runtime CLASSPATH when you run spark-shell
>>
>> 4. Use the history server to check the runtime CLASSPATH and its order to
>> ensure hive-site.xml is included.
>>
>> HiveContext should pick up the hive-site.xml and talk to your running
>> hive
>> service.
>>
>> Hope these tips help.
>>
>> > On Jul 30, 2014, at 22:47, "chenjie" <chenjie2001@...> wrote:
>> >
>> > Hi, Michael. I have the same problem. My warehouse directory is always
>> > created locally. I copied the default hive-site.xml into the
>> > $SPARK_HOME/conf directory on each node. After I executed the code
>> below,
>> >    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>> >    hiveContext.hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
>> >    hiveContext.hql("LOAD DATA LOCAL INPATH '/extdisk2/tools/spark/examples/src/main/resources/kv1.txt' INTO TABLE src")
>> >    hiveContext.hql("FROM src SELECT key, value").collect()
>> >
>> > I got the exception below:
>> > java.io.FileNotFoundException: File file:/user/hive/warehouse/src/kv1.txt
>> > does not exist
>> >    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:520)
>> >    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
>> >    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:137)
>> >    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
>> >    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:763)
>> >    at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:106)
>> >    at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
>> >    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:193)
>> >
>> > At last, I found /user/hive/warehouse/src/kv1.txt was created on the
>> node
>> > where I start spark-shell.
>> >
>> > The spark that I used is pre-built spark1.0.1 for hadoop2.
>> >
>> > Thanks in advance.
>> >
>> >






Re: HiveContext is creating metastore warehouse locally instead of in hdfs

Posted by Yin Huai <yh...@databricks.com>.
Another way is to set "hive.metastore.warehouse.dir" explicitly to the HDFS
dir storing Hive tables by using the SET command. For example:

hiveContext.hql("SET
hive.metastore.warehouse.dir=hdfs://localhost:54310/user/hive/warehouse")
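
A follow-up sketch (the table name src2 is hypothetical): run the SET before
creating the table, then check which location the table actually got:

    hiveContext.hql("SET hive.metastore.warehouse.dir=hdfs://localhost:54310/user/hive/warehouse")
    hiveContext.hql("CREATE TABLE IF NOT EXISTS src2 (key INT, value STRING)")
    // the location field should now point at hdfs://..., not file:/...
    hiveContext.hql("DESCRIBE EXTENDED src2").collect().foreach(println)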




On Thu, Jul 31, 2014 at 8:05 AM, Andrew Lee <al...@hotmail.com> wrote:

> Hi All,
>
> It has been awhile, but what I did to make it work is to make sure the
> following:
>
> 1. Hive is working when you run Hive CLI and JDBC via Hiveserver2
>
> 2. Make sure you have the hive-site.xml from above Hive configuration. The
> problem here is that you want the hive-site.xml from the Hive metastore.
> The one for Hive and HCatalog may be different files. Make sure you check
> the xml properties in that file, pick the one that has the warehouse
> property configured and the JDO setup.
>
> 3. Make sure hive-site.xml from step 2 is included in $SPARK_HOME/conf,
> and in your runtime CLASSPATH when you run spark-shell
>
> 4. Use the history server to check the runtime CLASSPATH and its order to
> ensure hive-site.xml is included.
>
> HiveContext should pick up the hive-site.xml and talk to your running hive
> service.
>
> Hope these tips help.
>
> > On Jul 30, 2014, at 22:47, "chenjie" <ch...@gmail.com> wrote:
> >
> > Hi, Michael. I have the same problem. My warehouse directory is always
> > created locally. I copied the default hive-site.xml into the
> > $SPARK_HOME/conf directory on each node. After I executed the code below,
> >    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
> >    hiveContext.hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
> >    hiveContext.hql("LOAD DATA LOCAL INPATH '/extdisk2/tools/spark/examples/src/main/resources/kv1.txt' INTO TABLE src")
> >    hiveContext.hql("FROM src SELECT key, value").collect()
> >
> > I got the exception below:
> > java.io.FileNotFoundException: File file:/user/hive/warehouse/src/kv1.txt
> > does not exist
> >    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:520)
> >    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
> >    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:137)
> >    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
> >    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:763)
> >    at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:106)
> >    at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
> >    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:193)
> >
> > At last, I found /user/hive/warehouse/src/kv1.txt was created on the node
> > where I start spark-shell.
> >
> > The spark that I used is pre-built spark1.0.1 for hadoop2.
> >
> > Thanks in advance.
> >
> >
> > Michael Armbrust wrote
> >> The warehouse and the metastore directories are two different things. The
> >> metastore holds the schema information about the tables and will by default
> >> be a local directory. With javax.jdo.option.ConnectionURL you can
> >> configure it to be something like mysql. The warehouse directory is the
> >> default location where the actual contents of the tables are stored. What
> >> directory are you seeing created locally?
> >
> >
> >
> >
> >
>

Re: HiveContext is creating metastore warehouse locally instead of in hdfs

Posted by Andrew Lee <al...@hotmail.com>.
Hi All,

It has been awhile, but what I did to make it work is to make sure the following:

1. Hive is working when you run Hive CLI and JDBC via Hiveserver2

2. Make sure you have the hive-site.xml from above Hive configuration. The problem here is that you want the hive-site.xml from the Hive metastore. The one for Hive and HCatalog may be different files. Make sure you check the xml properties in that file, pick the one that has the warehouse property configured and the JDO setup.

3. Make sure hive-site.xml from step 2 is included in $SPARK_HOME/conf, and in your runtime CLASSPATH when you run spark-shell

4. Use the history server to check the runtime CLASSPATH and its order to ensure hive-site.xml is included (see the sketch after these tips).

HiveContext should pick up the hive-site.xml and talk to your running hive service.

Hope these tips help.
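
For step 4, a sketch of turning on the history server (assuming Spark 1.0;
the HDFS log directory is a placeholder):

    # record event logs so the history server can show each application's environment
    echo "spark.eventLog.enabled true" >> $SPARK_HOME/conf/spark-defaults.conf
    echo "spark.eventLog.dir hdfs://namenode:8020/spark-logs" >> $SPARK_HOME/conf/spark-defaults.conf
    # in Spark 1.0 the history server takes the base log directory as an argument
    $SPARK_HOME/sbin/start-history-server.sh hdfs://namenode:8020/spark-logs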

> On Jul 30, 2014, at 22:47, "chenjie" <ch...@gmail.com> wrote:
> 
> Hi, Michael. I have the same problem. My warehouse directory is always
> created locally. I copied the default hive-site.xml into the
> $SPARK_HOME/conf directory on each node. After I executed the code below,
>    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>    hiveContext.hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
>    hiveContext.hql("LOAD DATA LOCAL INPATH '/extdisk2/tools/spark/examples/src/main/resources/kv1.txt' INTO TABLE src")
>    hiveContext.hql("FROM src SELECT key, value").collect()
> 
> I got the exception below:
> java.io.FileNotFoundException: File file:/user/hive/warehouse/src/kv1.txt
> does not exist
>    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:520)
>    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
>    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:137)
>    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
>    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:763)
>    at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:106)
>    at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
>    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:193)
> 
> At last, I found /user/hive/warehouse/src/kv1.txt was created on the node
> where I start spark-shell.
> 
> The spark that I used is pre-built spark1.0.1 for hadoop2.
> 
> Thanks in advance.
> 
> 
> Michael Armbrust wrote
>> The warehouse and the metastore directories are two different things. The
>> metastore holds the schema information about the tables and will by default
>> be a local directory. With javax.jdo.option.ConnectionURL you can
>> configure it to be something like mysql. The warehouse directory is the
>> default location where the actual contents of the tables are stored. What
>> directory are you seeing created locally?
> 
> 
> 
> 
> 

Re: HiveContext is creating metastore warehouse locally instead of in hdfs

Posted by chenjie <ch...@gmail.com>.
Hi, Michael. I have the same problem. My warehouse directory is always
created locally. I copied the default hive-site.xml into the
$SPARK_HOME/conf directory on each node. After I executed the code below,
    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    hiveContext.hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
    hiveContext.hql("LOAD DATA LOCAL INPATH '/extdisk2/tools/spark/examples/src/main/resources/kv1.txt' INTO TABLE src")
    hiveContext.hql("FROM src SELECT key, value").collect()

I got the exception below:
java.io.FileNotFoundException: File file:/user/hive/warehouse/src/kv1.txt
does not exist
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:520)
	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
	at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:137)
	at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:763)
	at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:106)
	at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
	at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:193)

At last, I found /user/hive/warehouse/src/kv1.txt was created on the node
where I start spark-shell.

The spark that I used is pre-built spark1.0.1 for hadoop2.

Thanks in advance.


Michael Armbrust wrote
> The warehouse and the metastore directories are two different things. The
> metastore holds the schema information about the tables and will by default
> be a local directory. With javax.jdo.option.ConnectionURL you can
> configure it to be something like mysql. The warehouse directory is the
> default location where the actual contents of the tables are stored. What
> directory are you seeing created locally?






Re: HiveContext is creating metastore warehouse locally instead of in hdfs

Posted by Michael Armbrust <mi...@databricks.com>.
The warehouse and the metastore directories are two different things. The
metastore holds the schema information about the tables and will by default
be a local directory. With javax.jdo.option.ConnectionURL you can configure
it to be something like mysql. The warehouse directory is the default
location where the actual contents of the tables are stored. What directory
are you seeing created locally?
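
A sketch of the two settings side by side in hive-site.xml (the host names,
ports, and MySQL URL are placeholders, shown only as an example of a
non-default metastore):

    <!-- metastore: where table schemas live (the default is a local Derby db) -->
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://metastore-host:3306/metastore?createDatabaseIfNotExist=true</value>
    </property>
    <!-- warehouse: where table data files live -->
    <property>
      <name>hive.metastore.warehouse.dir</name>
      <value>hdfs://namenode:8020/user/hive/warehouse</value>
    </property>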


On Tue, Jul 29, 2014 at 10:49 AM, nikroy16 <ni...@gmail.com> wrote:

> Thanks for the response... hive-site.xml is in the classpath so that
> doesn't seem to be the issue.
>
>
>

RE: HiveContext is creating metastore warehouse locally instead of in hdfs

Posted by nikroy16 <ni...@gmail.com>.
Thanks for the response... hive-site.xml is in the classpath so that doesn't
seem to be the issue.




RE: HiveContext is creating metastore warehouse locally instead of in hdfs

Posted by "Cheng, Hao" <ha...@intel.com>.
I ran this before; actually the hive-site.xml works this way for me (the trick happens in the new HiveConf(classOf[SessionState])). Can you double-check whether hive-site.xml can be loaded from the classpath? It is supposed to appear in the root of the classpath.
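
A quick way to double-check that from inside spark-shell (a minimal sketch;
it relies only on the standard classloader API):

    // null means hive-site.xml is not at the classpath root,
    // so new HiveConf(...) will fall back to the defaults
    val url = Thread.currentThread.getContextClassLoader.getResource("hive-site.xml")
    println(if (url == null) "hive-site.xml NOT found on classpath" else url)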

-----Original Message-----
From: nikroy16 [mailto:nikroy16@gmail.com] 
Sent: Tuesday, July 29, 2014 12:51 PM
To: user@spark.incubator.apache.org
Subject: HiveContext is creating metastore warehouse locally instead of in hdfs

Hi,

Even though hive.metastore.warehouse.dir in hive-site.xml is set to the default /user/hive/warehouse and the permissions are correct in hdfs, HiveContext seems to be creating the metastore warehouse locally instead of in hdfs. After looking into the spark code, I found the following in HiveContext.scala:

   /**
    * SQLConf and HiveConf contracts: when the hive session is first
    * initialized, params in HiveConf will get picked up by the SQLConf.
    * Additionally, any properties set by set() or a SET command inside
    * hql() or sql() will be set in the SQLConf *as well as* in the HiveConf.
    */
   @transient protected[hive] lazy val hiveconf = new HiveConf(classOf[SessionState])

   @transient protected[hive] lazy val sessionState = {
     val ss = new SessionState(hiveconf)
     set(hiveconf.getAllProperties) // Have SQLConf pick up the initial set of HiveConf.
     ss
   }


It seems as though when a HiveContext is created, it is launched without any configuration, and hive-site.xml is not used to set properties. It looks like I can set properties after creation by using the hql() method, but what I am looking for is for the HiveContext to be initialized according to the configuration in hive-site.xml at the time of initialization. Any help would be greatly appreciated!




