Posted to user@spark.apache.org by Gerard Maas <ge...@gmail.com> on 2016/05/26 09:28:18 UTC

HiveContext standalone => without a Hive metastore

Hi,

I'm helping some folks set up an analytics cluster with Spark.
They want to use the HiveContext to enable the Window functions on
DataFrames(*), but they don't have any Hive installation, nor do they need
one at the moment (unless it's necessary for this feature).

When we try to create a Hive context, we get the following error:

> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sparkContext)

java.lang.RuntimeException: java.lang.RuntimeException: Unable to
instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

       at
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)

Is my HiveContext failing b/c it wants to connect to an unconfigured  Hive
Metastore?

Is there  a way to instantiate a HiveContext for the sake of Window support
without an underlying Hive deployment?

The docs are explicit in saying that this should be the case: [1]

"To use a HiveContext, you do not need to have an existing Hive setup, and
all of the data sources available to a SQLContext are still available.
HiveContext is only packaged separately to avoid including all of Hive’s
dependencies in the default Spark build."

So what is the right way to address this issue? How do I instantiate a
HiveContext with Spark running on an HDFS cluster without Hive deployed?


Thanks a lot!

-Gerard.

(*) The need for a HiveContext to use Window functions is pretty obscure.
The only documentation of this seems to be a runtime exception: "
org.apache.spark.sql.AnalysisException: Could not resolve window function
'max'. Note that, using window functions currently requires a HiveContext;"


[1]
http://spark.apache.org/docs/latest/sql-programming-guide.html#getting-started
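
For reference, this is roughly the kind of code we would like to run once the
HiveContext comes up (Spark 1.6 style; the DataFrame and column names are just
an illustration, not our actual job):

  import org.apache.spark.sql.expressions.Window
  import org.apache.spark.sql.functions._

  // HiveContext instead of SQLContext only because window functions need it in 1.x
  val sqlContext = new org.apache.spark.sql.hive.HiveContext(sparkContext)
  import sqlContext.implicits._

  val df = Seq(("a", 1), ("a", 3), ("b", 2)).toDF("key", "value")

  // max(value) per key -- the kind of query that raises the
  // "window functions currently requires a HiveContext" exception on a plain SQLContext
  val byKey = Window.partitionBy("key")
  df.withColumn("max_value", max($"value").over(byKey)).show()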

Re: HiveContext standalone => without a Hive metastore

Posted by Gerard Maas <ge...@gmail.com>.
Michael, Mich, Silvio,

Thanks!

The shared directory is indeed the issue. We are running the Spark Notebook,
which uses the same dir per server (i.e. for all notebooks), so this
prevents us from running two notebooks using HiveContext at the same time.
I'll look into a proper Hive installation, and I'm glad to know that this
dependency is gone in 2.0.
Looking forward to 2.1 :-)

-kr, Gerard.


On Thu, May 26, 2016 at 10:55 PM, Michael Armbrust <mi...@databricks.com>
wrote:

> You can also just make sure that each user is using their own directory.
> A rough example can be found in TestHive.
>
> Note: in Spark 2.0 there should be no need to use HiveContext unless you
> need to talk to a metastore.
>
> On Thu, May 26, 2016 at 1:36 PM, Mich Talebzadeh <
> mich.talebzadeh@gmail.com> wrote:
>
>> Well make sure than you set up a reasonable RDBMS as metastore. Ours is
>> Oracle but you can get away with others. Check the supported list in
>>
>> hduser@rhes564:: :/usr/lib/hive/scripts/metastore/upgrade> ltr
>> total 40
>> drwxr-xr-x 2 hduser hadoop 4096 Feb 21 23:48 postgres
>> drwxr-xr-x 2 hduser hadoop 4096 Feb 21 23:48 mysql
>> drwxr-xr-x 2 hduser hadoop 4096 Feb 21 23:48 mssql
>> drwxr-xr-x 2 hduser hadoop 4096 Feb 21 23:48 derby
>> drwxr-xr-x 3 hduser hadoop 4096 May 20 18:44 oracle
>>
>> you have few good ones in the list.  In general the base tables (without
>> transactional support) are around 55  (Hive 2) and don't take much space
>> (depending on the volume of tables). I attached a E-R diagram.
>>
>> HTH
>>
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 26 May 2016 at 19:09, Gerard Maas <ge...@gmail.com> wrote:
>>
>>> Thanks a lot for the advice!.
>>>
>>> I found out why the standalone hiveContext would not work:  it was
>>> trying to deploy a derby db and the user had no rights to create the dir
>>> where there db is stored:
>>>
>>> Caused by: java.sql.SQLException: Failed to create database
>>> 'metastore_db', see the next exception for details.
>>>
>>>        at
>>> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown
>>> Source)
>>>
>>>        at
>>> org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown
>>> Source)
>>>
>>>        ... 129 more
>>>
>>> Caused by: java.sql.SQLException: Directory
>>> /usr/share/spark-notebook/metastore_db cannot be created.
>>>
>>>
>>> Now, the new issue is that we can't start more than 1 context at the
>>> same time. I think we will need to setup a proper metastore.
>>>
>>>
>>> -kind regards, Gerard.
>>>
>>>
>>>
>>>
>>> On Thu, May 26, 2016 at 3:06 PM, Mich Talebzadeh <
>>> mich.talebzadeh@gmail.com> wrote:
>>>
>>>> To use HiveContext witch is basically an sql api within Spark without
>>>> proper hive set up does not make sense. It is a super set of Spark
>>>> SQLContext
>>>>
>>>> In addition simple things like registerTempTable may not work.
>>>>
>>>> HTH
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>>
>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>
>>>>
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>>
>>>>
>>>> On 26 May 2016 at 13:01, Silvio Fiorito <si...@granturing.com>
>>>> wrote:
>>>>
>>>>> Hi Gerard,
>>>>>
>>>>>
>>>>>
>>>>> I’ve never had an issue using the HiveContext without a hive-site.xml
>>>>> configured. However, one issue you may have is if multiple users are
>>>>> starting the HiveContext from the same path, they’ll all be trying to store
>>>>> the default Derby metastore in the same location. Also, if you want them to
>>>>> be able to persist permanent table metadata for SparkSQL then you’ll want
>>>>> to set up a true metastore.
>>>>>
>>>>>
>>>>>
>>>>> The other thing it could be is Hive dependency collisions from the
>>>>> classpath, but that shouldn’t be an issue since you said it’s standalone
>>>>> (not a Hadoop distro right?).
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Silvio
>>>>>
>>>>>
>>>>>
>>>>> *From: *Gerard Maas <ge...@gmail.com>
>>>>> *Date: *Thursday, May 26, 2016 at 5:28 AM
>>>>> *To: *spark users <us...@spark.apache.org>
>>>>> *Subject: *HiveContext standalone => without a Hive metastore
>>>>>
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>>
>>>>> I'm helping some folks setting up an analytics cluster with  Spark.
>>>>>
>>>>> They want to use the HiveContext to enable the Window functions on
>>>>> DataFrames(*) but they don't have any Hive installation, nor they need one
>>>>> at the moment (if not necessary for this feature)
>>>>>
>>>>>
>>>>>
>>>>> When we try to create a Hive context, we get the following error:
>>>>>
>>>>>
>>>>>
>>>>> > val sqlContext = new
>>>>> org.apache.spark.sql.hive.HiveContext(sparkContext)
>>>>>
>>>>> java.lang.RuntimeException: java.lang.RuntimeException: Unable to
>>>>> instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
>>>>>
>>>>>        at
>>>>> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
>>>>>
>>>>>
>>>>>
>>>>> Is my HiveContext failing b/c it wants to connect to an unconfigured
>>>>>  Hive Metastore?
>>>>>
>>>>>
>>>>>
>>>>> Is there  a way to instantiate a HiveContext for the sake of Window
>>>>> support without an underlying Hive deployment?
>>>>>
>>>>>
>>>>>
>>>>> The docs are explicit in saying that that is should be the case: [1]
>>>>>
>>>>>
>>>>>
>>>>> "To use a HiveContext, you do not need to have an existing Hive
>>>>> setup, and all of the data sources available to aSQLContext are still
>>>>> available. HiveContext is only packaged separately to avoid including
>>>>> all of Hive’s dependencies in the default Spark build."
>>>>>
>>>>>
>>>>>
>>>>> So what is the right way to address this issue? How to instantiate a
>>>>> HiveContext with spark running on a HDFS cluster without Hive deployed?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Thanks a lot!
>>>>>
>>>>>
>>>>>
>>>>> -Gerard.
>>>>>
>>>>>
>>>>>
>>>>> (*) The need for a HiveContext to use Window functions is pretty
>>>>> obscure. The only documentation of this seems to be a runtime exception: "org.apache.spark.sql.AnalysisException:
>>>>> Could not resolve window function 'max'. Note that, using window functions
>>>>> currently requires a HiveContext;"
>>>>>
>>>>>
>>>>>
>>>>> [1]
>>>>> http://spark.apache.org/docs/latest/sql-programming-guide.html#getting-started
>>>>>
>>>>
>>>>
>>>
>>
>>
>>
>
>

Re: HiveContext standalone => without a Hive metastore

Posted by Michael Armbrust <mi...@databricks.com>.
You can also just make sure that each user is using their own directory.  A
rough example can be found in TestHive.

Note: in Spark 2.0 there should be no need to use HiveContext unless you
need to talk to a metastore.
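
For example, something along these lines (a rough sketch of what TestHive does;
the path is just an example, and the setting has to be applied before the
metastore is first touched):

  import org.apache.spark.sql.hive.HiveContext

  val sqlContext = new HiveContext(sc)
  // Point the embedded Derby metastore at a per-user location instead of the
  // shared ./metastore_db created in the common working directory.
  sqlContext.setConf("javax.jdo.option.ConnectionURL",
    s"jdbc:derby:;databaseName=/tmp/${sys.props("user.name")}/metastore_db;create=true")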

On Thu, May 26, 2016 at 1:36 PM, Mich Talebzadeh <mi...@gmail.com>
wrote:

> Well make sure than you set up a reasonable RDBMS as metastore. Ours is
> Oracle but you can get away with others. Check the supported list in
>
> hduser@rhes564:: :/usr/lib/hive/scripts/metastore/upgrade> ltr
> total 40
> drwxr-xr-x 2 hduser hadoop 4096 Feb 21 23:48 postgres
> drwxr-xr-x 2 hduser hadoop 4096 Feb 21 23:48 mysql
> drwxr-xr-x 2 hduser hadoop 4096 Feb 21 23:48 mssql
> drwxr-xr-x 2 hduser hadoop 4096 Feb 21 23:48 derby
> drwxr-xr-x 3 hduser hadoop 4096 May 20 18:44 oracle
>
> you have few good ones in the list.  In general the base tables (without
> transactional support) are around 55  (Hive 2) and don't take much space
> (depending on the volume of tables). I attached a E-R diagram.
>
> HTH
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 26 May 2016 at 19:09, Gerard Maas <ge...@gmail.com> wrote:
>
>> Thanks a lot for the advice!.
>>
>> I found out why the standalone hiveContext would not work:  it was trying
>> to deploy a derby db and the user had no rights to create the dir where
>> there db is stored:
>>
>> Caused by: java.sql.SQLException: Failed to create database
>> 'metastore_db', see the next exception for details.
>>
>>        at
>> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown
>> Source)
>>
>>        at
>> org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown
>> Source)
>>
>>        ... 129 more
>>
>> Caused by: java.sql.SQLException: Directory
>> /usr/share/spark-notebook/metastore_db cannot be created.
>>
>>
>> Now, the new issue is that we can't start more than 1 context at the same
>> time. I think we will need to setup a proper metastore.
>>
>>
>> -kind regards, Gerard.
>>
>>
>>
>>
>> On Thu, May 26, 2016 at 3:06 PM, Mich Talebzadeh <
>> mich.talebzadeh@gmail.com> wrote:
>>
>>> To use HiveContext witch is basically an sql api within Spark without
>>> proper hive set up does not make sense. It is a super set of Spark
>>> SQLContext
>>>
>>> In addition simple things like registerTempTable may not work.
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>> On 26 May 2016 at 13:01, Silvio Fiorito <si...@granturing.com>
>>> wrote:
>>>
>>>> Hi Gerard,
>>>>
>>>>
>>>>
>>>> I’ve never had an issue using the HiveContext without a hive-site.xml
>>>> configured. However, one issue you may have is if multiple users are
>>>> starting the HiveContext from the same path, they’ll all be trying to store
>>>> the default Derby metastore in the same location. Also, if you want them to
>>>> be able to persist permanent table metadata for SparkSQL then you’ll want
>>>> to set up a true metastore.
>>>>
>>>>
>>>>
>>>> The other thing it could be is Hive dependency collisions from the
>>>> classpath, but that shouldn’t be an issue since you said it’s standalone
>>>> (not a Hadoop distro right?).
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Silvio
>>>>
>>>>
>>>>
>>>> *From: *Gerard Maas <ge...@gmail.com>
>>>> *Date: *Thursday, May 26, 2016 at 5:28 AM
>>>> *To: *spark users <us...@spark.apache.org>
>>>> *Subject: *HiveContext standalone => without a Hive metastore
>>>>
>>>>
>>>>
>>>> Hi,
>>>>
>>>>
>>>>
>>>> I'm helping some folks setting up an analytics cluster with  Spark.
>>>>
>>>> They want to use the HiveContext to enable the Window functions on
>>>> DataFrames(*) but they don't have any Hive installation, nor they need one
>>>> at the moment (if not necessary for this feature)
>>>>
>>>>
>>>>
>>>> When we try to create a Hive context, we get the following error:
>>>>
>>>>
>>>>
>>>> > val sqlContext = new
>>>> org.apache.spark.sql.hive.HiveContext(sparkContext)
>>>>
>>>> java.lang.RuntimeException: java.lang.RuntimeException: Unable to
>>>> instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
>>>>
>>>>        at
>>>> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
>>>>
>>>>
>>>>
>>>> Is my HiveContext failing b/c it wants to connect to an unconfigured
>>>>  Hive Metastore?
>>>>
>>>>
>>>>
>>>> Is there  a way to instantiate a HiveContext for the sake of Window
>>>> support without an underlying Hive deployment?
>>>>
>>>>
>>>>
>>>> The docs are explicit in saying that that is should be the case: [1]
>>>>
>>>>
>>>>
>>>> "To use a HiveContext, you do not need to have an existing Hive setup,
>>>> and all of the data sources available to aSQLContext are still
>>>> available. HiveContext is only packaged separately to avoid including
>>>> all of Hive’s dependencies in the default Spark build."
>>>>
>>>>
>>>>
>>>> So what is the right way to address this issue? How to instantiate a
>>>> HiveContext with spark running on a HDFS cluster without Hive deployed?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Thanks a lot!
>>>>
>>>>
>>>>
>>>> -Gerard.
>>>>
>>>>
>>>>
>>>> (*) The need for a HiveContext to use Window functions is pretty
>>>> obscure. The only documentation of this seems to be a runtime exception: "org.apache.spark.sql.AnalysisException:
>>>> Could not resolve window function 'max'. Note that, using window functions
>>>> currently requires a HiveContext;"
>>>>
>>>>
>>>>
>>>> [1]
>>>> http://spark.apache.org/docs/latest/sql-programming-guide.html#getting-started
>>>>
>>>
>>>
>>
>
>
>

Re: HiveContext standalone => without a Hive metastore

Posted by Michael Segel <ms...@hotmail.com>.
Going from memory… Derby is/was Cloudscape, which IBM acquired from Informix, who had bought the company way back when. (Since IBM released it under Apache licensing, Sun Microsystems took it and created JavaDB…)

I believe that there is a network server function, so you can bring it up either in embedded (stand-alone) mode or in network mode, which allows simultaneous connections (multi-user).

If not, you can always go with MySQL.
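
From the Spark side that would look roughly like this (a sketch only: it
assumes a Derby network server is already running on a reachable host, the
Derby client jar is on the classpath, and the host name below is a placeholder):

  // Use a shared Derby network server instead of the embedded, single-user one.
  // 1527 is Derby's default network port.
  sqlContext.setConf("javax.jdo.option.ConnectionDriverName",
    "org.apache.derby.jdbc.ClientDriver")
  sqlContext.setConf("javax.jdo.option.ConnectionURL",
    "jdbc:derby://metastore-host:1527/metastore_db;create=true")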

HTH

> On May 26, 2016, at 1:36 PM, Mich Talebzadeh <mi...@gmail.com> wrote:
> 
> Well make sure than you set up a reasonable RDBMS as metastore. Ours is Oracle but you can get away with others. Check the supported list in
> 
> hduser@rhes564:: :/usr/lib/hive/scripts/metastore/upgrade> ltr
> total 40
> drwxr-xr-x 2 hduser hadoop 4096 Feb 21 23:48 postgres
> drwxr-xr-x 2 hduser hadoop 4096 Feb 21 23:48 mysql
> drwxr-xr-x 2 hduser hadoop 4096 Feb 21 23:48 mssql
> drwxr-xr-x 2 hduser hadoop 4096 Feb 21 23:48 derby
> drwxr-xr-x 3 hduser hadoop 4096 May 20 18:44 oracle
> 
> you have few good ones in the list.  In general the base tables (without transactional support) are around 55  (Hive 2) and don't take much space (depending on the volume of tables). I attached a E-R diagram.
> 
> HTH
> 
> 
> 
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>
>  
> http://talebzadehmich.wordpress.com <http://talebzadehmich.wordpress.com/>
>  
> 
> On 26 May 2016 at 19:09, Gerard Maas <gerard.maas@gmail.com <ma...@gmail.com>> wrote:
> Thanks a lot for the advice!. 
> 
> I found out why the standalone hiveContext would not work:  it was trying to deploy a derby db and the user had no rights to create the dir where there db is stored:
> 
> Caused by: java.sql.SQLException: Failed to create database 'metastore_db', see the next exception for details.
> 
>        at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
> 
>        at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
> 
>        ... 129 more
> 
> Caused by: java.sql.SQLException: Directory /usr/share/spark-notebook/metastore_db cannot be created.
> 
> 
> 
> Now, the new issue is that we can't start more than 1 context at the same time. I think we will need to setup a proper metastore.
> 
> 
> 
> -kind regards, Gerard.
> 
> 
> 
> 
> 
> On Thu, May 26, 2016 at 3:06 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com <ma...@gmail.com>> wrote:
> To use HiveContext witch is basically an sql api within Spark without proper hive set up does not make sense. It is a super set of Spark SQLContext
> 
> In addition simple things like registerTempTable may not work.
> 
> HTH
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>
>  
> http://talebzadehmich.wordpress.com <http://talebzadehmich.wordpress.com/>
>  
> 
> On 26 May 2016 at 13:01, Silvio Fiorito <silvio.fiorito@granturing.com <ma...@granturing.com>> wrote:
> Hi Gerard,
> 
>  
> 
> I’ve never had an issue using the HiveContext without a hive-site.xml configured. However, one issue you may have is if multiple users are starting the HiveContext from the same path, they’ll all be trying to store the default Derby metastore in the same location. Also, if you want them to be able to persist permanent table metadata for SparkSQL then you’ll want to set up a true metastore.
> 
>  
> 
> The other thing it could be is Hive dependency collisions from the classpath, but that shouldn’t be an issue since you said it’s standalone (not a Hadoop distro right?).
> 
>  
> 
> Thanks,
> 
> Silvio
> 
>  
> 
> From: Gerard Maas <gerard.maas@gmail.com <ma...@gmail.com>>
> Date: Thursday, May 26, 2016 at 5:28 AM
> To: spark users <user@spark.apache.org <ma...@spark.apache.org>>
> Subject: HiveContext standalone => without a Hive metastore
> 
>  
> 
> Hi,
> 
>  
> 
> I'm helping some folks setting up an analytics cluster with  Spark.
> 
> They want to use the HiveContext to enable the Window functions on DataFrames(*) but they don't have any Hive installation, nor they need one at the moment (if not necessary for this feature)
> 
>  
> 
> When we try to create a Hive context, we get the following error:
> 
>  
> 
> > val sqlContext = new org.apache.spark.sql.hive.HiveContext(sparkContext)
> 
> java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
> 
>        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
> 
>  
> 
> Is my HiveContext failing b/c it wants to connect to an unconfigured  Hive Metastore?
> 
>  
> 
> Is there  a way to instantiate a HiveContext for the sake of Window support without an underlying Hive deployment?
> 
>  
> 
> The docs are explicit in saying that that is should be the case: [1]
> 
>  
> 
> "To use a HiveContext, you do not need to have an existing Hive setup, and all of the data sources available to aSQLContext are still available. HiveContext is only packaged separately to avoid including all of Hive’s dependencies in the default Spark build."
> 
>  
> 
> So what is the right way to address this issue? How to instantiate a HiveContext with spark running on a HDFS cluster without Hive deployed?
> 
>  
> 
>  
> 
> Thanks a lot!
> 
>  
> 
> -Gerard.
> 
>  
> 
> (*) The need for a HiveContext to use Window functions is pretty obscure. The only documentation of this seems to be a runtime exception: "org.apache.spark.sql.AnalysisException: Could not resolve window function 'max'. Note that, using window functions currently requires a HiveContext;"  
> 
>  
> 
> [1] http://spark.apache.org/docs/latest/sql-programming-guide.html#getting-started <http://spark.apache.org/docs/latest/sql-programming-guide.html#getting-started>
> 
> 
> <Hive2_base_tables.pdf>


Re: HiveContext standalone => without a Hive metastore

Posted by Mich Talebzadeh <mi...@gmail.com>.
Well, make sure that you set up a reasonable RDBMS as the metastore. Ours is
Oracle, but you can get away with others. Check the supported list in:

hduser@rhes564:: :/usr/lib/hive/scripts/metastore/upgrade> ltr
total 40
drwxr-xr-x 2 hduser hadoop 4096 Feb 21 23:48 postgres
drwxr-xr-x 2 hduser hadoop 4096 Feb 21 23:48 mysql
drwxr-xr-x 2 hduser hadoop 4096 Feb 21 23:48 mssql
drwxr-xr-x 2 hduser hadoop 4096 Feb 21 23:48 derby
drwxr-xr-x 3 hduser hadoop 4096 May 20 18:44 oracle

You have a few good ones in the list. In general the base tables (without
transactional support) number around 55 (Hive 2) and don't take much space
(depending on the volume of tables). I attached an E-R diagram.
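
For illustration only (host, database name and credentials below are
placeholders), the relevant hive-site.xml entries for, say, a MySQL-backed
metastore look roughly like this:

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://metastore-host:3306/hive_metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepassword</value>
  </property>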

HTH




Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 26 May 2016 at 19:09, Gerard Maas <ge...@gmail.com> wrote:

> Thanks a lot for the advice!.
>
> I found out why the standalone hiveContext would not work:  it was trying
> to deploy a derby db and the user had no rights to create the dir where
> there db is stored:
>
> Caused by: java.sql.SQLException: Failed to create database
> 'metastore_db', see the next exception for details.
>
>        at
> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown
> Source)
>
>        at
> org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown
> Source)
>
>        ... 129 more
>
> Caused by: java.sql.SQLException: Directory
> /usr/share/spark-notebook/metastore_db cannot be created.
>
>
> Now, the new issue is that we can't start more than 1 context at the same
> time. I think we will need to setup a proper metastore.
>
>
> -kind regards, Gerard.
>
>
>
>
> On Thu, May 26, 2016 at 3:06 PM, Mich Talebzadeh <
> mich.talebzadeh@gmail.com> wrote:
>
>> To use HiveContext witch is basically an sql api within Spark without
>> proper hive set up does not make sense. It is a super set of Spark
>> SQLContext
>>
>> In addition simple things like registerTempTable may not work.
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 26 May 2016 at 13:01, Silvio Fiorito <si...@granturing.com>
>> wrote:
>>
>>> Hi Gerard,
>>>
>>>
>>>
>>> I’ve never had an issue using the HiveContext without a hive-site.xml
>>> configured. However, one issue you may have is if multiple users are
>>> starting the HiveContext from the same path, they’ll all be trying to store
>>> the default Derby metastore in the same location. Also, if you want them to
>>> be able to persist permanent table metadata for SparkSQL then you’ll want
>>> to set up a true metastore.
>>>
>>>
>>>
>>> The other thing it could be is Hive dependency collisions from the
>>> classpath, but that shouldn’t be an issue since you said it’s standalone
>>> (not a Hadoop distro right?).
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Silvio
>>>
>>>
>>>
>>> *From: *Gerard Maas <ge...@gmail.com>
>>> *Date: *Thursday, May 26, 2016 at 5:28 AM
>>> *To: *spark users <us...@spark.apache.org>
>>> *Subject: *HiveContext standalone => without a Hive metastore
>>>
>>>
>>>
>>> Hi,
>>>
>>>
>>>
>>> I'm helping some folks setting up an analytics cluster with  Spark.
>>>
>>> They want to use the HiveContext to enable the Window functions on
>>> DataFrames(*) but they don't have any Hive installation, nor they need one
>>> at the moment (if not necessary for this feature)
>>>
>>>
>>>
>>> When we try to create a Hive context, we get the following error:
>>>
>>>
>>>
>>> > val sqlContext = new
>>> org.apache.spark.sql.hive.HiveContext(sparkContext)
>>>
>>> java.lang.RuntimeException: java.lang.RuntimeException: Unable to
>>> instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
>>>
>>>        at
>>> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
>>>
>>>
>>>
>>> Is my HiveContext failing b/c it wants to connect to an unconfigured
>>>  Hive Metastore?
>>>
>>>
>>>
>>> Is there  a way to instantiate a HiveContext for the sake of Window
>>> support without an underlying Hive deployment?
>>>
>>>
>>>
>>> The docs are explicit in saying that that is should be the case: [1]
>>>
>>>
>>>
>>> "To use a HiveContext, you do not need to have an existing Hive setup,
>>> and all of the data sources available to aSQLContext are still
>>> available. HiveContext is only packaged separately to avoid including
>>> all of Hive’s dependencies in the default Spark build."
>>>
>>>
>>>
>>> So what is the right way to address this issue? How to instantiate a
>>> HiveContext with spark running on a HDFS cluster without Hive deployed?
>>>
>>>
>>>
>>>
>>>
>>> Thanks a lot!
>>>
>>>
>>>
>>> -Gerard.
>>>
>>>
>>>
>>> (*) The need for a HiveContext to use Window functions is pretty
>>> obscure. The only documentation of this seems to be a runtime exception: "org.apache.spark.sql.AnalysisException:
>>> Could not resolve window function 'max'. Note that, using window functions
>>> currently requires a HiveContext;"
>>>
>>>
>>>
>>> [1]
>>> http://spark.apache.org/docs/latest/sql-programming-guide.html#getting-started
>>>
>>
>>
>

Re: HiveContext standalone => without a Hive metastore

Posted by Gerard Maas <ge...@gmail.com>.
Thanks a lot for the advice!

I found out why the standalone HiveContext would not work: it was trying
to deploy a Derby db, and the user had no rights to create the dir where
the db is stored:

Caused by: java.sql.SQLException: Failed to create database 'metastore_db',
see the next exception for details.

       at
org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown
Source)

       at
org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown
Source)

       ... 129 more

Caused by: java.sql.SQLException: Directory
/usr/share/spark-notebook/metastore_db cannot be created.


Now, the new issue is that we can't start more than one context at the same
time. I think we will need to set up a proper metastore.


-kind regards, Gerard.




On Thu, May 26, 2016 at 3:06 PM, Mich Talebzadeh <mi...@gmail.com>
wrote:

> To use HiveContext witch is basically an sql api within Spark without
> proper hive set up does not make sense. It is a super set of Spark
> SQLContext
>
> In addition simple things like registerTempTable may not work.
>
> HTH
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 26 May 2016 at 13:01, Silvio Fiorito <si...@granturing.com>
> wrote:
>
>> Hi Gerard,
>>
>>
>>
>> I’ve never had an issue using the HiveContext without a hive-site.xml
>> configured. However, one issue you may have is if multiple users are
>> starting the HiveContext from the same path, they’ll all be trying to store
>> the default Derby metastore in the same location. Also, if you want them to
>> be able to persist permanent table metadata for SparkSQL then you’ll want
>> to set up a true metastore.
>>
>>
>>
>> The other thing it could be is Hive dependency collisions from the
>> classpath, but that shouldn’t be an issue since you said it’s standalone
>> (not a Hadoop distro right?).
>>
>>
>>
>> Thanks,
>>
>> Silvio
>>
>>
>>
>> *From: *Gerard Maas <ge...@gmail.com>
>> *Date: *Thursday, May 26, 2016 at 5:28 AM
>> *To: *spark users <us...@spark.apache.org>
>> *Subject: *HiveContext standalone => without a Hive metastore
>>
>>
>>
>> Hi,
>>
>>
>>
>> I'm helping some folks setting up an analytics cluster with  Spark.
>>
>> They want to use the HiveContext to enable the Window functions on
>> DataFrames(*) but they don't have any Hive installation, nor they need one
>> at the moment (if not necessary for this feature)
>>
>>
>>
>> When we try to create a Hive context, we get the following error:
>>
>>
>>
>> > val sqlContext = new org.apache.spark.sql.hive.HiveContext(sparkContext)
>>
>> java.lang.RuntimeException: java.lang.RuntimeException: Unable to
>> instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
>>
>>        at
>> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
>>
>>
>>
>> Is my HiveContext failing b/c it wants to connect to an unconfigured
>>  Hive Metastore?
>>
>>
>>
>> Is there  a way to instantiate a HiveContext for the sake of Window
>> support without an underlying Hive deployment?
>>
>>
>>
>> The docs are explicit in saying that that is should be the case: [1]
>>
>>
>>
>> "To use a HiveContext, you do not need to have an existing Hive setup,
>> and all of the data sources available to aSQLContext are still
>> available. HiveContext is only packaged separately to avoid including
>> all of Hive’s dependencies in the default Spark build."
>>
>>
>>
>> So what is the right way to address this issue? How to instantiate a
>> HiveContext with spark running on a HDFS cluster without Hive deployed?
>>
>>
>>
>>
>>
>> Thanks a lot!
>>
>>
>>
>> -Gerard.
>>
>>
>>
>> (*) The need for a HiveContext to use Window functions is pretty obscure.
>> The only documentation of this seems to be a runtime exception: "org.apache.spark.sql.AnalysisException:
>> Could not resolve window function 'max'. Note that, using window functions
>> currently requires a HiveContext;"
>>
>>
>>
>> [1]
>> http://spark.apache.org/docs/latest/sql-programming-guide.html#getting-started
>>
>
>

Re: HiveContext standalone => without a Hive metastore

Posted by Mich Talebzadeh <mi...@gmail.com>.
To use HiveContext, which is basically an SQL API within Spark, without a
proper Hive setup does not make sense. It is a superset of Spark's
SQLContext.

In addition simple things like registerTempTable may not work.

HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 26 May 2016 at 13:01, Silvio Fiorito <si...@granturing.com>
wrote:

> Hi Gerard,
>
>
>
> I’ve never had an issue using the HiveContext without a hive-site.xml
> configured. However, one issue you may have is if multiple users are
> starting the HiveContext from the same path, they’ll all be trying to store
> the default Derby metastore in the same location. Also, if you want them to
> be able to persist permanent table metadata for SparkSQL then you’ll want
> to set up a true metastore.
>
>
>
> The other thing it could be is Hive dependency collisions from the
> classpath, but that shouldn’t be an issue since you said it’s standalone
> (not a Hadoop distro right?).
>
>
>
> Thanks,
>
> Silvio
>
>
>
> *From: *Gerard Maas <ge...@gmail.com>
> *Date: *Thursday, May 26, 2016 at 5:28 AM
> *To: *spark users <us...@spark.apache.org>
> *Subject: *HiveContext standalone => without a Hive metastore
>
>
>
> Hi,
>
>
>
> I'm helping some folks setting up an analytics cluster with  Spark.
>
> They want to use the HiveContext to enable the Window functions on
> DataFrames(*) but they don't have any Hive installation, nor they need one
> at the moment (if not necessary for this feature)
>
>
>
> When we try to create a Hive context, we get the following error:
>
>
>
> > val sqlContext = new org.apache.spark.sql.hive.HiveContext(sparkContext)
>
> java.lang.RuntimeException: java.lang.RuntimeException: Unable to
> instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
>
>        at
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
>
>
>
> Is my HiveContext failing b/c it wants to connect to an unconfigured  Hive
> Metastore?
>
>
>
> Is there  a way to instantiate a HiveContext for the sake of Window
> support without an underlying Hive deployment?
>
>
>
> The docs are explicit in saying that that is should be the case: [1]
>
>
>
> "To use a HiveContext, you do not need to have an existing Hive setup,
> and all of the data sources available to aSQLContext are still available.
> HiveContext is only packaged separately to avoid including all of Hive’s
> dependencies in the default Spark build."
>
>
>
> So what is the right way to address this issue? How to instantiate a
> HiveContext with spark running on a HDFS cluster without Hive deployed?
>
>
>
>
>
> Thanks a lot!
>
>
>
> -Gerard.
>
>
>
> (*) The need for a HiveContext to use Window functions is pretty obscure.
> The only documentation of this seems to be a runtime exception: "org.apache.spark.sql.AnalysisException:
> Could not resolve window function 'max'. Note that, using window functions
> currently requires a HiveContext;"
>
>
>
> [1]
> http://spark.apache.org/docs/latest/sql-programming-guide.html#getting-started
>

Re: HiveContext standalone => without a Hive metastore

Posted by Silvio Fiorito <si...@granturing.com>.
Hi Gerard,

I’ve never had an issue using the HiveContext without a hive-site.xml configured. However, one issue you may have is if multiple users are starting the HiveContext from the same path, they’ll all be trying to store the default Derby metastore in the same location. Also, if you want them to be able to persist permanent table metadata for SparkSQL then you’ll want to set up a true metastore.

The other thing it could be is Hive dependency collisions from the classpath, but that shouldn’t be an issue since you said it’s standalone (not a Hadoop distro right?).
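
For example (just a sketch; df and the table name are made up), persisting a
table like this really only pays off with a shared metastore -- with the
default embedded Derby one it's only visible from the same working directory:

  // Registers "events" in the metastore so later sessions can query it by name.
  df.write.saveAsTable("events")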

Thanks,
Silvio

From: Gerard Maas <ge...@gmail.com>
Date: Thursday, May 26, 2016 at 5:28 AM
To: spark users <us...@spark.apache.org>
Subject: HiveContext standalone => without a Hive metastore

Hi,

I'm helping some folks setting up an analytics cluster with  Spark.
They want to use the HiveContext to enable the Window functions on DataFrames(*) but they don't have any Hive installation, nor they need one at the moment (if not necessary for this feature)

When we try to create a Hive context, we get the following error:

> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sparkContext)
java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
       at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)

Is my HiveContext failing b/c it wants to connect to an unconfigured  Hive Metastore?

Is there  a way to instantiate a HiveContext for the sake of Window support without an underlying Hive deployment?

The docs are explicit in saying that that is should be the case: [1]

"To use a HiveContext, you do not need to have an existing Hive setup, and all of the data sources available to aSQLContext are still available. HiveContext is only packaged separately to avoid including all of Hive’s dependencies in the default Spark build."

So what is the right way to address this issue? How to instantiate a HiveContext with spark running on a HDFS cluster without Hive deployed?


Thanks a lot!

-Gerard.

(*) The need for a HiveContext to use Window functions is pretty obscure. The only documentation of this seems to be a runtime exception: "org.apache.spark.sql.AnalysisException: Could not resolve window function 'max'. Note that, using window functions currently requires a HiveContext;"

[1] http://spark.apache.org/docs/latest/sql-programming-guide.html#getting-started

Re: HiveContext standalone => without a Hive metastore

Posted by Mich Talebzadeh <mi...@gmail.com>.
Hi Gerard,

I am not sure that the so-called independence will work. I gather you want to
use HiveContext for your SQL queries, and SQLContext only provides a subset of
HiveContext's functionality.

try this:

  import org.apache.spark.SparkContext
  import org.apache.spark.sql.hive.HiveContext

  val sc = new SparkContext(conf)  // conf: your existing SparkConf
  // Create a sqlContext based on HiveContext
  val sqlContext = new HiveContext(sc)


However, it will take 3 minutes to set up Hive, and all you need is to add a
softlink from $SPARK_HOME/conf to hive-site.xml:

hive-site.xml -> /usr/lib/hive/conf/hive-site.xml
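
i.e. created with something like (paths as on our host; adjust to yours):

  ln -s /usr/lib/hive/conf/hive-site.xml $SPARK_HOME/conf/hive-site.xml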

The fact that it is not working shows that the statement in the docs may not
be valid.

HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 26 May 2016 at 10:28, Gerard Maas <ge...@gmail.com> wrote:

> Hi,
>
> I'm helping some folks setting up an analytics cluster with  Spark.
> They want to use the HiveContext to enable the Window functions on
> DataFrames(*) but they don't have any Hive installation, nor they need one
> at the moment (if not necessary for this feature)
>
> When we try to create a Hive context, we get the following error:
>
> > val sqlContext = new org.apache.spark.sql.hive.HiveContext(sparkContext)
>
> java.lang.RuntimeException: java.lang.RuntimeException: Unable to
> instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
>
>        at
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
>
> Is my HiveContext failing b/c it wants to connect to an unconfigured  Hive
> Metastore?
>
> Is there  a way to instantiate a HiveContext for the sake of Window
> support without an underlying Hive deployment?
>
> The docs are explicit in saying that that is should be the case: [1]
>
> "To use a HiveContext, you do not need to have an existing Hive setup,
> and all of the data sources available to aSQLContext are still available.
> HiveContext is only packaged separately to avoid including all of Hive’s
> dependencies in the default Spark build."
>
> So what is the right way to address this issue? How to instantiate a
> HiveContext with spark running on a HDFS cluster without Hive deployed?
>
>
> Thanks a lot!
>
> -Gerard.
>
> (*) The need for a HiveContext to use Window functions is pretty obscure.
> The only documentation of this seems to be a runtime exception: "
> org.apache.spark.sql.AnalysisException: Could not resolve window function
> 'max'. Note that, using window functions currently requires a HiveContext;"
>
>
> [1]
> http://spark.apache.org/docs/latest/sql-programming-guide.html#getting-started
>